
Re: i915 dma faults on Xen



On Wed, Oct 21, 2020 at 5:58 AM Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
>
> On Fri, Oct 16, 2020 at 12:23:22PM -0400, Jason Andryuk wrote:
> >
> > The RMRRs are:
> > (XEN) [VT-D]Host address width 39
> > (XEN) [VT-D]found ACPI_DMAR_DRHD:
> > (XEN) [VT-D]  dmaru->address = fed90000
> > (XEN) [VT-D]drhd->address = fed90000 iommu->reg = ffff82c00021d000
> > (XEN) [VT-D]cap = 1c0000c40660462 ecap = 19e2ff0505e
> > (XEN) [VT-D] endpoint: 0000:00:02.0
> > (XEN) [VT-D]found ACPI_DMAR_DRHD:
> > (XEN) [VT-D]  dmaru->address = fed91000
> > (XEN) [VT-D]drhd->address = fed91000 iommu->reg = ffff82c00021f000
> > (XEN) [VT-D]cap = d2008c40660462 ecap = f050da
> > (XEN) [VT-D] IOAPIC: 0000:00:1e.7
> > (XEN) [VT-D] MSI HPET: 0000:00:1e.6
> > (XEN) [VT-D]  flags: INCLUDE_ALL
> > (XEN) [VT-D]found ACPI_DMAR_RMRR:
> > (XEN) [VT-D] endpoint: 0000:00:14.0
> > (XEN) [VT-D]dmar.c:615:   RMRR region: base_addr 78863000 end_addr 78882fff
> > (XEN) [VT-D]found ACPI_DMAR_RMRR:
> > (XEN) [VT-D] endpoint: 0000:00:02.0
> > (XEN) [VT-D]dmar.c:615:   RMRR region: base_addr 7d000000 end_addr 7f7fffff
> > (XEN) [VT-D]found ACPI_DMAR_RMRR:
> > (XEN) [VT-D] endpoint: 0000:00:16.7
> > (XEN) [VT-D]dmar.c:581:  Non-existent device (0000:00:16.7) is
> > reported in RMRR (78907000, 78986fff)'s scope!
> > (XEN) [VT-D]dmar.c:596:   Ignore the RMRR (78907000, 78986fff) due to
>
> This is also part of a reserved region, so should be added to the
> iommu page tables anyway regardless of this message.

I wonder if this is for the Intel AMT PCI device?  I assumed it is
disabled, but I actually can't find it listed in the BIOS
configuration to verify.

> > devices under its scope are not PCI discoverable!
> >
> > > > I agree.
> > > >
> > > > Can you paste the memory map as printed by Xen when booting, and the
> > > > command line you are using to boot Xen?
> > >
> > > So this is OpenXT, and it's booting EFI -> xen -> tboot -> xen
> > >
> > > There's the memory map
> > > (XEN) TBOOT RAM map:
> > > (XEN)  0000000000000000 - 0000000000060000 (usable)
> > > (XEN)  0000000000060000 - 0000000000068000 (reserved)
> > > (XEN)  0000000000068000 - 000000000009e000 (usable)
> > > (XEN)  000000000009e000 - 000000000009f000 (reserved)
> > > (XEN)  000000000009f000 - 00000000000a0000 (usable)
> > > (XEN)  00000000000a0000 - 0000000000100000 (reserved)
> > > (XEN)  0000000000100000 - 0000000040000000 (usable)
> > > (XEN)  0000000040000000 - 0000000040400000 (reserved)
> > > (XEN)  0000000040400000 - 000000007024b000 (usable)
> > > (XEN)  000000007024b000 - 000000007024c000 (ACPI NVS)
> > > (XEN)  000000007024c000 - 000000007024d000 (reserved)
> > > (XEN)  000000007024d000 - 0000000077f19000 (usable)
> > > (XEN)  0000000077f19000 - 0000000078987000 (reserved)
> > > (XEN)  0000000078987000 - 0000000078a04000 (ACPI data)
> > > (XEN)  0000000078a04000 - 0000000078ea3000 (ACPI NVS)
> > > (XEN)  0000000078ea3000 - 000000007acff000 (reserved)
> > > (XEN)  000000007acff000 - 000000007ad00000 (usable)
> > > (XEN)  000000007ad00000 - 000000007f800000 (reserved)
> > > (XEN)  00000000f0000000 - 00000000f8000000 (reserved)
> > > (XEN)  00000000fe000000 - 00000000fe011000 (reserved)
> > > (XEN)  00000000fec00000 - 00000000fec01000 (reserved)
> > > (XEN)  00000000fee00000 - 00000000fee01000 (reserved)
> > > (XEN)  00000000ff000000 - 0000000100000000 (reserved)
> > > (XEN)  0000000100000000 - 000000047c800000 (usable)
> > > (XEN) EFI memory map:
> > > (XEN)  0000000000000-000000009dfff type=7 attr=000000000000000f
> > > (XEN)  000000009e000-000000009efff type=0 attr=000000000000000f
> > > (XEN)  000000009f000-000000009ffff type=3 attr=000000000000000f
> > > (XEN)  0000000100000-000003fffffff type=7 attr=000000000000000f
> > > (XEN)  0000040000000-00000403fffff type=0 attr=000000000000000f
> > > (XEN)  0000040400000-000005e359fff type=7 attr=000000000000000f
> > > (XEN)  000005e35a000-000005e399fff type=4 attr=000000000000000f
> > > (XEN)  000005e39a000-000006a47dfff type=7 attr=000000000000000f
> > > (XEN)  000006a47e000-000006c3eefff type=2 attr=000000000000000f
> > > (XEN)  000006c3ef000-000006d5eefff type=1 attr=000000000000000f
> > > (XEN)  000006d5ef000-000006d86cfff type=2 attr=000000000000000f
> > > (XEN)  000006d86d000-000006d978fff type=1 attr=000000000000000f
> > > (XEN)  000006d979000-000006dc7afff type=4 attr=000000000000000f
> > > (XEN)  000006dc7b000-000006dc98fff type=3 attr=000000000000000f
> > > (XEN)  000006dc99000-000006dcc7fff type=4 attr=000000000000000f
> > > (XEN)  000006dcc8000-000006dccdfff type=3 attr=000000000000000f
> > > (XEN)  000006dcce000-00000701a5fff type=4 attr=000000000000000f
> > > (XEN)  00000701a6000-00000701c8fff type=3 attr=000000000000000f
> > > (XEN)  00000701c9000-00000701edfff type=4 attr=000000000000000f
> > > (XEN)  00000701ee000-0000070204fff type=3 attr=000000000000000f
> > > (XEN)  0000070205000-000007022cfff type=4 attr=000000000000000f
> > > (XEN)  000007022d000-000007024afff type=3 attr=000000000000000f
> > > (XEN)  000007024b000-000007024bfff type=10 attr=000000000000000f
> > > (XEN)  000007024c000-000007024cfff type=6 attr=800000000000000f
> > > (XEN)  000007024d000-000007024dfff type=4 attr=000000000000000f
> > > (XEN)  000007024e000-0000070282fff type=3 attr=000000000000000f
> > > (XEN)  0000070283000-00000702c3fff type=4 attr=000000000000000f
> > > (XEN)  00000702c4000-00000702c8fff type=3 attr=000000000000000f
> > > (XEN)  00000702c9000-00000702defff type=4 attr=000000000000000f
> > > (XEN)  00000702df000-0000070307fff type=3 attr=000000000000000f
> > > (XEN)  0000070308000-0000070317fff type=4 attr=000000000000000f
> > > (XEN)  0000070318000-0000070319fff type=3 attr=000000000000000f
> > > (XEN)  000007031a000-0000070331fff type=4 attr=000000000000000f
> > > (XEN)  0000070332000-0000070349fff type=3 attr=000000000000000f
> > > (XEN)  000007034a000-0000070356fff type=2 attr=000000000000000f
> > > (XEN)  0000070357000-0000070357fff type=7 attr=000000000000000f
> > > (XEN)  0000070358000-0000070358fff type=2 attr=000000000000000f
> > > (XEN)  0000070359000-0000076f3efff type=4 attr=000000000000000f
> > > (XEN)  0000076f3f000-00000772affff type=7 attr=000000000000000f
> > > (XEN)  00000772b0000-0000077f18fff type=3 attr=000000000000000f
> > > (XEN)  0000077f19000-0000078986fff type=0 attr=000000000000000f
> > > (XEN)  0000078987000-0000078a03fff type=9 attr=000000000000000f
> > > (XEN)  0000078a04000-0000078ea2fff type=10 attr=000000000000000f
> > > (XEN)  0000078ea3000-000007ab22fff type=6 attr=800000000000000f
> > > (XEN)  000007ab23000-000007acfefff type=5 attr=800000000000000f
> > > (XEN)  000007acff000-000007acfffff type=4 attr=000000000000000f
> > > (XEN)  0000100000000-000047c7fffff type=7 attr=000000000000000f
> > > (XEN)  00000000a0000-00000000fffff type=0 attr=0000000000000000
> > > (XEN)  000007ad00000-000007adfffff type=0 attr=070000000000000f
> > > (XEN)  000007ae00000-000007f7fffff type=0 attr=0000000000000000
> > > (XEN)  00000f0000000-00000f7ffffff type=11 attr=800000000000100d
> > > (XEN)  00000fe000000-00000fe010fff type=11 attr=8000000000000001
> > > (XEN)  00000fec00000-00000fec00fff type=11 attr=8000000000000001
> > > (XEN)  00000fee00000-00000fee00fff type=11 attr=8000000000000001
> > > (XEN)  00000ff000000-00000ffffffff type=11 attr=800000000000100d
> > >
> > > Command line
> > > console=com1 dom0_mem=min:420M,max:420M,420M efi=no-rs,attr=uc
> > > com1=115200,8n1,pci mbi-video vga=current flask=enforcing loglvl=debug
> > > guest_loglvl=debug smt=0 ucode=-1 bootscrub=1
> > > argo=yes,mac-permissive=1 iommu=force,igfx
> > >
> > > iommu=force,igfx was to force igfx back on.  I added a dmi quirk to
> > > set no-igfx on this platform as a temporary workaround.
>
> I assume setting no-igfx fixed the issue and the card works fine in
> that case?

Yes, it seems to work.  The internal and 2 external monitors are
displaying and seem okay.  If I unplug the dock with those 2 displays,
then go plug in a different dock with a different monitor, I've seen
(unclear how often) the i915 report errors with configuring its
"pipe" and the built-in display (eDP) is black.  But it may recover
sometimes?

> > > > Have you tried adding dom0-iommu=map-inclusive to the Xen command
> > > > line?
> >
> > Still seeing faults with dom0-iommu=map-inclusive.  At a different
> > address this time:
> > Oct 16 15:58:05.110768 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read]
> > Request device [0000:00:02.0] fault addr ea0c4f000, iommu reg = ffff
>
> That's also past the end of RAM.
>
> > 82c00021d000
> > Oct 16 15:58:05.110774 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 -
> > PTE Read access is not set
> > Oct 16 15:58:05.110777 VM hypervisor: (XEN) print_vtd_entries: iommu
> > #0 dev 0000:00:02.0 gmfn ea0c4f
> > Oct 16 15:58:05.110780 VM hypervisor: (XEN)     root_entry[00] = 46e129001
> > Oct 16 15:58:05.110782 VM hypervisor: (XEN)     context[10] = 2_46e128001
> > Oct 16 15:58:05.110785 VM hypervisor: (XEN)     l4[000] = 46e11b003
> > Oct 16 15:58:05.110787 VM hypervisor: (XEN)     l3[03a] = 0
> > Oct 16 15:58:05.110789 VM hypervisor: (XEN)     l3[03a] not present
> >
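As a cross-check on the walk above, the table indices can be derived from the fault address directly. This is a minimal sketch that assumes 4 KiB pages and 9 index bits per level (the usual layout for a 4-level table, which the l4 entry in the dump suggests); the helper name is made up for illustration.

```python
PAGE_SHIFT = 12   # assumed 4 KiB pages
LEVEL_BITS = 9    # assumed 512 entries per table


def table_indices(addr, levels=4):
    """Return the per-level table indices, top level first."""
    return [
        (addr >> (PAGE_SHIFT + LEVEL_BITS * (lvl - 1))) & ((1 << LEVEL_BITS) - 1)
        for lvl in range(levels, 0, -1)
    ]


fault_addr = 0xea0c4f000           # from the DMAR fault line
gmfn = fault_addr >> PAGE_SHIFT    # frame number as printed by print_vtd_entries
l4, l3, l2, l1 = table_indices(fault_addr)

print(f"gmfn {gmfn:x}  l4[{l4:03x}]  l3[{l3:03x}]")
```

Under those assumptions the result lines up with the log: gmfn ea0c4f, l4[000], and l3[03a], the entry reported as not present.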
> > In the previous posting, the two faulting addresses repeated in pairs.
> > Here it is only this one address repeating.
> >
> > I plugged and unplugged and a different address was repeating with a
> > few other random addresses with 1 or 2 faults.  Here is uniq -c output
> > of the address and count pulled from the logs:
> > 0x1ce9d6b000 2007
> > 0x31b50d5000 1
> > 0x1ce9d6b000 882
> > 0x707741000 1
> > 0x1ce9d6b000 1114
> > 0x20d2099000 1
> > 0x1ce9d6b000 3489
> > 0xeb98eb000 1
> > 0x1ce9d6b000 2430
> > 0xeb98eb000 1
> > 0x1ce9d6b000 1300
> > 0x22f20bb000 1
> > 0x1ce9d6b000 269
> > 0x22f20bb000 1
> > 0x1ce9d6b000 5091
> > 0x6c99ec9000 1
> > 0x1ce9d6b000 29
> > 0xeb98eb000 1
> > 0x1ce9d6b000 4599
> > 0x6c99ec9000 1
> > 0x1ce9d6b000 1989
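For anyone wanting to reproduce a tally like the one above, here is a rough sketch that counts consecutive runs of the same fault address, in the spirit of uniq -c. The regular expression is modeled on the "fault addr" lines quoted earlier; the sample lines and the function name are constructed for illustration and are not real log data.

```python
import re
from itertools import groupby

# Matches the hex address in lines like
# "... Request device [0000:00:02.0] fault addr ea0c4f000, iommu reg = ..."
FAULT_RE = re.compile(r"fault addr ([0-9a-f]+)")


def fault_runs(lines):
    """Return (address, count) for each run of consecutive identical faults."""
    addrs = []
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            addrs.append(int(m.group(1), 16))
    return [(addr, len(list(group))) for addr, group in groupby(addrs)]


# Illustrative input only, built to resemble the hypervisor log format.
prefix = "(XEN) [VT-D]DMAR:[DMA Read] Request device [0000:00:02.0] fault addr "
sample = (
    [prefix + "1ce9d6b000, iommu reg = ffff82c00021d000"] * 3
    + ["(XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set"]
    + [prefix + "eb98eb000, iommu reg = ffff82c00021d000"]
    + [prefix + "1ce9d6b000, iommu reg = ffff82c00021d000"] * 2
)

for addr, count in fault_runs(sample):
    print(f"{addr:#x} {count}")
```

Lines that do not contain a fault address are skipped rather than breaking a run, so the interleaved "reason 06" lines do not split the counts.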
>
> Hm, it's hard to tell what's going on. In my limited experience with
> IOMMU faults on broken systems, there's a small range that initially
> triggers those, and then the device goes wonky and starts accessing a
> whole load of invalid addresses.
>
> You could try adding those manually using the rmrr Xen command line
> option [0], maybe you can figure out which range(s) are missing?

They seem to change, so it's hard to know.  Would there be harm in
adding one to cover from the end of RAM (0x04,7c80,0000) to
(0xff,ffff,ffff)?  Maybe that would just quiet the pointless faults
while leaving the IOMMU enabled?
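To put numbers on that range, here is a small sketch of the arithmetic. It assumes 4 KiB pages and that the rmrr option wants page frame numbers rather than byte addresses (worth confirming against the xen-command-line documentation); the fault list is just the set of addresses seen in this thread.

```python
PAGE_SHIFT = 12                # assumed 4 KiB pages
END_OF_RAM = 0x47c800000       # end of the last usable range in the RAM map
PROPOSED_END = 0xffffffffff    # proposed upper bound
HOST_ADDR_WIDTH = 39           # "Host address width 39" from the DMAR dump

# Fault addresses reported so far in this thread.
faults = [
    0xea0c4f000, 0x1ce9d6b000, 0x31b50d5000, 0x707741000,
    0x20d2099000, 0xeb98eb000, 0x22f20bb000, 0x6c99ec9000,
]

start_pfn = END_OF_RAM >> PAGE_SHIFT
end_pfn = PROPOSED_END >> PAGE_SHIFT
pages = end_pfn - start_pfn + 1

max_host_addr = (1 << HOST_ADDR_WIDTH) - 1
all_past_ram = all(a >= END_OF_RAM for a in faults)
all_covered = all(END_OF_RAM <= a <= PROPOSED_END for a in faults)
all_within_width = all(a <= max_host_addr for a in faults)

print(f"pfn range {start_pfn:x}-{end_pfn:x} ({pages} pages)")
print(f"all faults past end of RAM: {all_past_ram}")
print(f"all faults inside proposed range: {all_covered}")
print(f"all faults within {HOST_ADDR_WIDTH}-bit width: {all_within_width}")
print(f"proposed end exceeds {HOST_ADDR_WIDTH}-bit limit: {PROPOSED_END > max_host_addr}")
```

One thing the arithmetic points out: 0xff,ffff,ffff is a 40-bit address, while the reported host address width is 39, so 0x7f,ffff,ffff may be the more natural upper bound. All of the faults listed so far fall below that limit as well.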

Thanks for taking a look.

Regards,
Jason



 

