[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 07/17] IOMMU/x86: restrict IO-APIC mappings for PV Dom0


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Wed, 8 Sep 2021 11:44:00 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=4/2nMHtwUkwOJeNn+c/1PSvVLkll4mq/pLnUB8brkF0=; b=TK4azVDcKoNqk8E2/AF/Nqek1lbq6kfowYhzQaSDjciL7rGFCp8jaZDCmWpuF5hA25QCGlNiZL8qEm3pYXc2EK581BKcp4bhFGPyxQHX+9rHNCLaXJ7jFS3QfB9X0quKEenvL2ZZmjDmf/4EmEjdeTumUIdj/G2WdXN+mgwyuAm0mBONrRj4kJXXVtBWCeFxhjp2LqqXl+tncTkKMs4GwYm7rkOWzMB51QFJMVZFnjb3oKyROu91xmrebvmZIO1iX6IvNIggbvheChJSJmAwmyOEC4joASdjndhHN4+MJxMhtv0w2HH4sSewcG9RLDMO+iqPhkC9SozLgWnRpaGByQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=YJzrp2WC8FcsRVS8SkRMENAVVyAVepLuQEYpi0uJq/6kBn/A58CkoIJG3FU2Ih1QGci+m2N7d9E39kblkCVpAhbIwc78e1V6odyOSR8mwNrPzko7DGr0Fpo3rM9lnBNQqp30e2sS9XT2iwQiHuh1wPXeX6cz2XGaWhlElchc1xMW707wazzOT0PqfZlqME9YcFfJWiJVKKJ2GYfuho4LVpoIzOreS8sft4RHzhDoxp2lbyPwe8ExAj4zeQqKC03oFuu0guFfG+Uz+Ir55gEUGPF5yEKeDE0pxKvN2AqpVu2jnwo7cFOHEGC4aU4+W9UvQFh8rzay2sovj90Zh1F1MA==
  • Authentication-results: lists.xenproject.org; dkim=none (message not signed) header.d=none;lists.xenproject.org; dmarc=none action=none header.from=suse.com;
  • Cc: Paul Durrant <paul@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Wed, 08 Sep 2021 09:44:17 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 07.09.2021 19:13, Andrew Cooper wrote:
> On 26/08/2021 13:55, Jan Beulich wrote:
>> On 26.08.2021 13:57, Andrew Cooper wrote:
>>> On 24/08/2021 15:21, Jan Beulich wrote:
>>>> While already the case for PVH, there's no reason to treat PV
>>>> differently here, though of course the addresses get taken from another
>>>> source in this case. Except that, to match CPU side mappings, by default
>>>> we permit r/o ones. This then also means we now deal consistently with
>>>> IO-APICs whose MMIO is or is not covered by E820 reserved regions.
>>>>
>>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>>> Why do we give PV dom0 a mapping of the IO-APIC?  Having thought about
>>> it, it cannot possibly be usable.
>>>
>>> IO-APICs use a indirect window, and Xen doesn't perform any
>>> write-emulation (that I can see), so the guest can't even read the data
>>> register and work out which register it represents.  It also can't do an
>>> atomic 64bit read across the index and data registers, as that is
>>> explicitly undefined behaviour in the IO-APIC spec.
>>>
>>> On the other hand, we do have PHYSDEVOP_apic_{read,write} which, despite
>>> the name, is for the IO-APIC not the LAPIC, and I bet these hypercalls
>>> where introduced upon discovering that a read-only mapping of the
>>> IO-APIC it totally useless.
>>>
>>> I think we can safely not expose the IO-APICs to PV dom0 at all, which
>>> simplifies the memory handling further.
>> The reason we do expose it r/o is that some ACPI implementations access
>> (read and write) some RTEs from AML. If we don't allow Dom0 to map the
>> IO-APIC, Dom0 will typically fail to boot on such systems.
> 
> I think more details are needed.

How do you expect to collect the necessary info without having an affected
system to test? I see close to zero chance to locate the old reports (and
possible discussion) via web search.

> If AML is reading the RTEs, it's is also writing to the index register.

Quite likely, yes. Albeit this being broken to a fair degree in the
first place, ...

> Irrespective of Xen, doing this behind the back of the OS cannot work
> safely, because at a minimum the ACPI interpreter would need to take the
> ioapic lock, and I see no evidence of workarounds like this in Linux.

... as you indicate you think (as much as I do), leaves room for the
actual accesses to also be flawed (and hence meaningless in the first
place). I do recall looking at the disassembled AML back at the time,
but I do not recall any details for sure. What I seem to vaguely recall
is that their whole purpose was to set the mask bit in an RTE (I think
to work around the dual routing issue, and I assume in turn to work
around missing workarounds in certain OSes). For that the current
approach as well as the alternative one you suggest below would be
equally "good enough".

> In Xen, we appear to swallow writes to mmio_ro ranges which is rude, and
> causes the associated reads to read garbage.  This is Xen-induced memory
> corruption inside dom0.
> 
> 
> I don't think any of this is legitimate behaviour.  ACPI needs disabling
> on such systems, or interlocking suitably, and its not just IO-APICs
> which are problematic - any windowed I/O (e.g. RTC) as well as any other
> device with read side effects.

I don't think disabling ACPI on such systems would be a viable option.
Things tend to not work very well that way ... Plus iirc these issues
weren't exclusively on some random no-name systems, but in at least one
of the cases on ones of a pretty large vendor.

> I think we should seriously consider not mapping the IO-APIC at all. 

We can easily do so on the IOMMU side, if you agree to have CPU and
IOMMU mappings diverge for this case. Since the behavior is PV-
specific anyway, there are also no concerns as to differing behavior
with vs without shared page tables (on PVH).

> That said, I would be surprised if Linux is cleanly avoiding the
> IO-APIC, so a slightly less bad alternative is to redirect to an "MMIO"
> frame of ~0's which we still write-discard on top of.
> 
> That at least makes the Xen-induced memory corruption behave
> deterministically.

It would work more deterministically, yes, but without such a system
available to test we wouldn't know whether that approach would actually
make things work (sufficiently). Whereas for the current approach we do
know (from the testing done back at the time).

Jan




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.