[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] AMD/IOMMU: revert "amd/iommu: assign iommu devices to Xen"



>>> On 04.06.19 at 18:20, <roger.pau@xxxxxxxxxx> wrote:
> On Tue, Jun 04, 2019 at 07:02:06AM -0600, Jan Beulich wrote:
>> >>> On 04.06.19 at 10:48, <roger.pau@xxxxxxxxxx> wrote:
>> > On Mon, Jun 03, 2019 at 07:00:25AM -0600, Jan Beulich wrote:
>> >> This reverts commit b6bd02b7a877f9fac2de69e64d8245d56f92ab25. The change
>> >> was redundant with amd_iommu_detect_one_acpi() already calling
>> >> pci_ro_device().
>> >> 
>> >> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>> > 
>> > I think this needs to be squashed together with your `AMD/IOMMU: don't
>> > "add" IOMMUs` patch, or else PVH dom0 will break because
>> > update_paging_mode will find devices not behind an IOMMU assigned to
>> > dom0, thus returning an error and crashing dom0.
>> 
>> I've taken another look (while correcting the other patch, now sent
>> as v2): update_paging_mode() iterates over all devices the domain
>> owns. The IOMMU ones, having been subject of an early
>> pci_ro_device(), should never end up on Dom0's list. And looking at
>> the code I also can't - for now at least - see how they could get
>> moved there. In fact I've verified that they take the "override"
>> path in _setup_hwdom_pci_devices().
> 
> As you realized this commit was indeed papering over an existing issue
> elsewhere. When I did this patch I didn't realize
> amd_iommu_detect_one_acpi was calling pci_ro_device.
> 
> The issue is that when pci_ro_device gets called by
> amd_iommu_detect_one_acpi dom_xen has not been created yet, so
> pdev->domain ends up being NULL.

Well, that's being fixed by "adjust system domain creation (and call it
earlier on x86)" (note that it's "special" rather than "system" in the
posted version). pdev->domain remaining to be NULL really is the
smaller of the problems; accessing dom_xen->arch.pdev_list is the
worse part here.

One thing is curious though: So far I thought I would have screwed
up things by having amd_iommu_detect_one_acpi() called earlier,
as mentioned in that patch's description. Your change that I'd like
to revert predates that though by several months, so I'm getting
the impression the issue has been older. As a result the range of
versions to backport this to may have to grow.

> On a tangential note, there's also a dereference of dom_xen in
> _pci_hide_device which doesn't trigger a page fault. Do we have
> something mapped at linear address 0 on purpose?

Yes, during early (legacy) boot. That's how the initial page tables
get constructed. And I did notice it as an actual crash because the
newer box boots from EFI, where there's no mapping at linear
address 0. Hence that patch mentioned above.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.