
Re: [Xen-devel] Bug: Limitation of <=2GB RAM in domU persists with 4.3.0



On 07/25/2013 08:18 PM, George Dunlap wrote:
On Wed, Jul 24, 2013 at 11:15 PM, Gordan Bobic <gordan@xxxxxxxxxx> wrote:
Attached are the logs (loglvl=all) and configs for 2GB (working) and 8GB
(screen corruption + domU crash + sometimes dom0 crashing with it).

I can see in the xl-dmesg log in the 8GB case that there is memory remapping
going on to allow for the lowmem MMIO hole, but it doesn't seem to help.

There's a possibility that it actually has nothing to do with
relocation, but rather with bugs in your hardware.

That wouldn't surprise me at all, unfortunately. :(

Can you try:
* Set the guest memory to 3600
* Boot the guest, and check to make sure that xl dmesg shows it does
*not* relocate memory?
* Report whether it crashes?
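
For concreteness, only the memory= line needs changing for this test. A minimal guest config sketch (the name and vcpus values here are made up; the PCI BDF is the GF100 from this thread):

  # Minimal HVM guest config sketch for the 3600MB test
  builder = "hvm"
  name    = "gpu-test"      # hypothetical guest name
  memory  = 3600            # guest RAM in MB, per the suggestion above
  vcpus   = 2               # assumed
  pci     = [ "08:00.0" ]   # the GF100 from the lspci output below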

xl dmesg from booting a Linux domU with 3600MB is attached.
The crash is never immediate; both Linux and Windows boot fine. But when a large 3D application such as a game loads, frame buffer corruption is immediately visible, and the domU will typically lock up a few seconds later. Infrequently, it will take the host down with it.

If it's a bug in the hardware, I would expect to see that memory was
not relocated, but that the system will lock up anyway.

That is indeed what seems to happen - the memory map looks OK with no overlaps between PCI memory and ROM ranges and the usable or reserved e820 regions.

Can you also do lspci -vvv in dom0 before assigning the device and
attach the output?

I have attached it, but it was not taken before assigning - I'll need to reboot for that. Do you expect the dom0 mapping to differ before and after assigning the device to domU?

The hardware bug we've seen is this: In order for the IOMMU to work
properly, *all* DMA transactions must be passed up to the root bridge
so the IOMMU can translate the addresses from guest address to host
address.  Unfortunately, an awful lot of bridges will not do this
properly, which means that the address is not translated properly,
which means that if a *guest* memory address overlaps a *host*
MMIO range, badness ensues.
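
To make that concrete, here is a small overlap check (Python; the ranges are copied from the xl dmesg and lspci output quoted below, so this is an illustration rather than anything from the thread's attachments):

  # Sketch: check whether guest-physical RAM ranges overlap host MMIO BARs.
  # If a passthrough device DMAs to a guest address in the overlap and the
  # bridge skips IOMMU translation, the write lands in host MMIO instead.

  guest_ram = [  # (start, end) from the guest e820, RAM entries only
      (0x00000000, 0x0009e000),
      (0x00100000, 0xe0000000),
      (0x100000000, 0x100800000),
  ]

  host_bars = [  # (name, start, size) from dom0 lspci for the GF100 at 08:00.0
      ("BAR0", 0xf8000000, 32 << 20),
      ("BAR1", 0xb8000000, 128 << 20),
      ("BAR3", 0xb4000000, 64 << 20),
  ]

  for name, bar_start, size in host_bars:
      bar_end = bar_start + size
      for ram_start, ram_end in guest_ram:
          if ram_start < bar_end and bar_start < ram_end:  # interval overlap
              print("%s [%x-%x) overlaps guest RAM [%x-%x)"
                    % (name, bar_start, bar_end, ram_start, ram_end))

Run against these ranges, it flags BAR1 and BAR3 as overlapping guest RAM, and BAR0 as clear, which matches the analysis below.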

Hmm, looking at xl dmesg vs dom0 lspci, that does appear to be the case:

xl dmesg:
(XEN) HVM24: E820 table:
(XEN) HVM24:  [00]: 00000000:00000000 - 00000000:0009e000: RAM
(XEN) HVM24:  [01]: 00000000:0009e000 - 00000000:000a0000: RESERVED
(XEN) HVM24:  HOLE: 00000000:000a0000 - 00000000:000e0000
(XEN) HVM24:  [02]: 00000000:000e0000 - 00000000:00100000: RESERVED
(XEN) HVM24:  [03]: 00000000:00100000 - 00000000:e0000000: RAM
(XEN) HVM24:  HOLE: 00000000:e0000000 - 00000000:fc000000
(XEN) HVM24:  [04]: 00000000:fc000000 - 00000001:00000000: RESERVED
(XEN) HVM24:  [05]: 00000001:00000000 - 00000001:00800000: RAM

lspci:
08:00.0 VGA compatible controller: nVidia Corporation GF100
Region 0: Memory at f8000000 (32-bit, non-prefetchable) [disabled] [size=32M]
Region 1: Memory at b8000000 (64-bit, prefetchable) [disabled] [size=128M]
Region 3: Memory at b4000000 (64-bit, prefetchable) [disabled] [size=64M]

Unless I'm reading this wrong, it means that physical GPU region 0 falls in the domU MMIO hole, while GPU regions 1 and 3 are in the domU RAM area.

b4000000 = 2880MB

So in theory, that might mean that I should be able to get away with up to 2880MB of RAM for domU without encountering frame buffer corruption and the crash. I will test this shortly.
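
The arithmetic behind that figure, for anyone checking (Python):

  # 0xb4000000 is the lowest host BAR; guest low RAM tops out at 0xe0000000
  print(0xb4000000 // (1 << 20))   # 2880 MB - where guest RAM first hits a host BAR
  print(0xe0000000 // (1 << 20))   # 3584 MB - top of guest low RAM at memory=3600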

There's nothing we can do about this in
Xen other than make the guest MMIO hole the same size as the host MMIO
hole.

Not sure I follow. Do you mean make it so that pBAR = vBAR?
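
For reference, xl releases newer than the 4.3.0 discussed here grew an mmio_hole= option for exactly this; a config sketch, assuming such an xl and qemu-xen as the device model:

  # Sketch: size the guest MMIO hole to cover the host's (needs a newer
  # xl than 4.3.0; mmio_hole= is only honoured with qemu-xen)
  device_model_version = "qemu-xen"
  # Host MMIO below 4GB starts at 0xb4000000 (2880MB), so:
  mmio_hole = 1216            # 4096 - 2880 MB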

Gordan

Attachment: xl-dmesg5
Description: Text document

Attachment: dom0-lspci
Description: Text document
