
Re: [Xen-devel] Bug: Limitation of <=2GB RAM in domU persists with 4.3.0



On 07/25/2013 10:48 PM, Gordan Bobic wrote:
On 07/25/2013 08:18 PM, George Dunlap wrote:
On Wed, Jul 24, 2013 at 11:15 PM, Gordan Bobic <gordan@xxxxxxxxxx> wrote:
Attached are the logs (loglvl=all) and configs for 2GB (working) and 8GB
(screen corruption + domU crash + sometimes dom0 crashing with it).

I can see in the xl-dmesg log in 8GB case that there is memory remapping
going on to allow for the lowmem MMIO hole, but it doesn't seem to help.

There's a possibility that it actually has nothing to do with
relocation, but rather with bugs in your hardware.

That wouldn't surprise me at all, unfortunately. :(

Can you try:
* Set the guest memory to 3600
* Boot the guest, and check to make sure that xl dmesg shows it does
*not* relocate memory?
* Report whether it crashes?

xl dmesg from booting a Linux domU with 3600MB is attached.
The crash is never immediate; both Linux and Windows boot fine. But when
a large 3D application like a game loads, frame buffer corruption is
immediately visible, and the domU will typically lock up some seconds
later. Infrequently, it will take the host down with it.

If it's a bug in the hardware, I would expect to see that memory was
not relocated, but that the system will lock up anyway.

That is indeed what seems to happen - the memory map looks OK, with no
overlaps between the PCI memory/ROM ranges and the usable or reserved
e820 regions.

Can you also do lspci -vvv in dom0 before assigning the device and
attach the output?

I have attached it, but not before assigning - I'll need to reboot for
that. Do you expect there to be a difference in mapping in dom0 before
and after assigning the device to domU?

The hardware bug we've seen is this: In order for the IOMMU to work
properly, *all* DMA transactions must be passed up to the root bridge
so the IOMMU can translate the addresses from guest address to host
address.  Unfortunately, an awful lot of bridges will not do this
properly, which means that the address is not translated properly,
which means that if a *guest* memory address overlaps a *host*
MMIO range, badness ensues.
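
To make that concrete, here is a toy Python sketch (the addresses are
made up purely for illustration, not taken from your logs):

# Toy model of the failure mode: a device DMAs to a guest-physical
# address; a correct bridge lets the IOMMU remap it to host RAM, while
# a buggy bridge forwards it upstream untranslated.
guest_dma_target = 0xb9000000                  # guest RAM page the device writes to
host_mmio_bar    = (0xb8000000, 0xc0000000)    # a *host* BAR at the same numeric range

untranslated = guest_dma_target                # what a buggy bridge emits
hits_host_mmio = host_mmio_bar[0] <= untranslated < host_mmio_bar[1]
print(hits_host_mmio)                          # True -> the write scribbles on host MMIO

With a working bridge, the IOMMU would have rewritten that address to
the host page actually backing the guest RAM, and the BAR would never
be touched.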

Hmm, looking at xl dmesg vs dom0 lspci, that does appear to be the case:

xl dmesg:
(XEN) HVM24: E820 table:
(XEN) HVM24:  [00]: 00000000:00000000 - 00000000:0009e000: RAM
(XEN) HVM24:  [01]: 00000000:0009e000 - 00000000:000a0000: RESERVED
(XEN) HVM24:  HOLE: 00000000:000a0000 - 00000000:000e0000
(XEN) HVM24:  [02]: 00000000:000e0000 - 00000000:00100000: RESERVED
(XEN) HVM24:  [03]: 00000000:00100000 - 00000000:e0000000: RAM
(XEN) HVM24:  HOLE: 00000000:e0000000 - 00000000:fc000000
(XEN) HVM24:  [04]: 00000000:fc000000 - 00000001:00000000: RESERVED
(XEN) HVM24:  [05]: 00000001:00000000 - 00000001:00800000: RAM

lspci:
08:00.0 VGA compatible controller: nVidia Corporation GF100
         Region 0: Memory at f8000000 (32-bit, non-prefetchable) [disabled] [size=32M]
         Region 1: Memory at b8000000 (64-bit, prefetchable) [disabled] [size=128M]
         Region 3: Memory at b4000000 (64-bit, prefetchable) [disabled] [size=64M]

Unless I'm reading this wrong, it means that physical GPU region 0 is in
the domU MMIO hole, and GPU regions 1 and 3 are in the domU RAM area.

b4000000 = 2880MB
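
To double-check that reading, a throwaway Python sketch (addresses
copied from the xl dmesg and lspci output above; this isn't any Xen
tool, just interval arithmetic):

guest_ram = [                       # guest E820 RAM entries (start, end)
    (0x00000000, 0x0009e000),
    (0x00100000, 0xe0000000),
    (0x100000000, 0x100800000),
]

host_bars = {                       # host BARs of the GF100 (start, size)
    "Region 0": (0xf8000000,  32 * 2**20),
    "Region 1": (0xb8000000, 128 * 2**20),
    "Region 3": (0xb4000000,  64 * 2**20),
}

for name, (bar_start, size) in host_bars.items():
    bar_end = bar_start + size
    for ram_start, ram_end in guest_ram:
        if bar_start < ram_end and ram_start < bar_end:
            print("%s (%#x-%#x) overlaps guest RAM %#x-%#x"
                  % (name, bar_start, bar_end, ram_start, ram_end))

That reports Regions 1 and 3 overlapping the 0x00100000-0xe0000000 RAM
range, while Region 0 sits in the guest MMIO hole.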

Correction - my other GPU has a BAR mapped lower, at 0xa8000000, which is
2688MB. So I upped my guest memory to 2688MB, and lo and behold, that
doesn't crash and games work just fine without the frame buffer getting
corrupted.
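
For reference, the arithmetic (plain Python, nothing Xen-specific):

>>> 0xa8000000 // 2**20
2688

So with the guest capped at 2688MB, all of its low RAM sits below the
lowest host BAR and the untranslated writes have nothing to collide with.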

Now, if I am understanding the basic nature of the problem correctly, this _could_ be worked around by ensuring that vBAR = pBAR, since in that case there is no room for the mis-mapped memory writes to land. Is that correct?

I guess I could test this easily enough by applying the vBAR = pBAR hack.

Gordan



 

