
Re: [Xen-devel] HVM support for e820_host (Was: Bug: Limitation of <=2GB RAM in domU persists with 4.3.0)



On Fri, 6 Sep 2013 10:32:23 -0400, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:

>>On the face of it, that's actually fine - my PCI IOMEM mappings show
>>the lowest mapping (according to lspci -vvv) starts at a8000000,
>
><surprise>

Indeed - on the host, the hole is 1GB-4GB, but there is no IOMEM
mapped between 1024MB and 2688MB, which is why I can get away with a
domU memory allocation of up to 2688MB.

When you say 'IOMEM' you mean /proc/iomem output?

I mean what lspci shows WRT where PCI device memory regions
are mapped.

>>explain what is actually going wrong and why the crash is still
>>occurring - unless some other piece of hardware is having its domU
>>IOMEM mapped somewhere in the range f3df4000-fec8b000 and that is
>>causing a memory overwrite.
>>
>>I am just not seeing any obvious memory stomp at the moment...
>
>Neither am I.

I may have pasted the wrong domU e820. I have a sneaking suspicion
that the map above was from a domU with 2688MB of RAM assigned,
which is why there is no domU RAM in the map above a7800000. I'll
re-check when I'm in front of that machine again.

Are you OK with the plan to _only_ copy the holes from the host E820
to the hvmloader E820 (rough sketch of what I mean below)? I think
this would be sufficient and not cause any undue problems. The only
things that would need to change are:
1) Enlarge the domU hole
2) Do something with the top reserved block, starting at
RESERVED_MEMBASE=0xFC000000. What is this actually for? It
overlaps with the host memory hole, which extends all the way up
to 0xfee00000. If it must be where it is, this could be
problematic. What to do in this case?
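
To make the intent concrete, here is a minimal sketch of the
hole-copying step, assuming a sorted host E820 in Xen's
struct e820entry layout. e820_collect_holes() is a made-up name for
illustration, not an existing hvmloader/libxl function:

#include <stdint.h>

struct e820entry {
    uint64_t addr;   /* start of memory segment */
    uint64_t size;   /* size of memory segment  */
    uint32_t type;   /* type of memory segment  */
};

/*
 * Hypothetical helper: walk a sorted host E820 and record every gap
 * (hole) between consecutive entries below 4GB, so that matching
 * holes can be punched into the E820 that hvmloader constructs for
 * the guest.  Returns the number of holes found.
 */
static unsigned int e820_collect_holes(const struct e820entry *host,
                                       unsigned int nr,
                                       struct e820entry *holes,
                                       unsigned int max_holes)
{
    unsigned int i, n = 0;
    const uint64_t limit = 1ULL << 32;   /* only sub-4GB holes matter here */

    for ( i = 0; i + 1 < nr && n < max_holes; i++ )
    {
        uint64_t end  = host[i].addr + host[i].size;
        uint64_t next = host[i + 1].addr;

        if ( end >= limit )
            break;
        if ( next > end )                /* gap between entries == hole */
        {
            holes[n].addr = end;
            holes[n].size = (next < limit ? next : limit) - end;
            holes[n].type = 0;           /* i.e. simply absent from the map */
            n++;
        }
    }
    return n;
}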

I would do a git log or git annotate to find it. I recall
some patches to move that - but I can't recall the details.

Will do. But what could this possibly be for?

So would it perhaps be neater, easier, more consistent and
more debuggable to just make the hvmloader put in a hole
between 0x40000000-0xffffffff (the whole 3GB) by default?
Or is that deemed to be too crippling for 32-bit non-PAE
domUs (and are there enough of these around to matter?)?

Correct. Also it would wreak havoc when migrating to other
hvmloaders which have a different layout.

Two points that might just be worth pointing out here:

1) domUs with e820_host set aren't migratable anyway
(including PV ones for which e820_host is currently
implemented)

2) All of this is conditional on e820_host=1 being set
in the config. Since legacy hosts won't have this set
anyway (it isn't implemented, and won't be until
this patch set is completed), surely any notion of
backward compatibility for HVM domUs with e820_host=1
set is null and void.

Thus, as a first-pass solution that would work in
most cases where this option is useful in the first
place, setting the low RAM limit to the beginning of
the first memory hole above 0x100000 (1MB) should be
OK.

Leave anything after that unmapped (that seems to
be what shows up as "HOLE" on the dumps) all the
way up to RESERVED_MEMBASE.
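
In other words, something along these lines - purely illustrative,
the function name is made up and the host map is assumed to be
sorted:

#include <stdint.h>

#define ONE_MB            0x100000ULL
#define RESERVED_MEMBASE  0xFC000000ULL

struct e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/*
 * Hypothetical helper: return the address at which guest low RAM
 * should stop, i.e. the start of the first hole in the (sorted) host
 * E820 above 1MB.  Everything from that point up to RESERVED_MEMBASE
 * would then be left unmapped ("HOLE") in the guest.
 */
static uint64_t guest_lowmem_limit(const struct e820entry *host,
                                   unsigned int nr)
{
    unsigned int i;

    for ( i = 0; i + 1 < nr; i++ )
    {
        uint64_t end  = host[i].addr + host[i].size;
        uint64_t next = host[i + 1].addr;

        if ( end >= ONE_MB && next > end )
            return end;          /* start of the first hole above 1MB */
    }

    /* No hole found below the reserved area: fall back to its base. */
    return RESERVED_MEMBASE;
}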

That would only leave the question of what it is
(if anything) that uses the memory between
RESERVED_MEMBASE and 0xffffffff (4GB) and under
which circumstances. This could be somewhat important
because 0xfec8a000 -> +4KB on my machine is actually
the Intel I/O APIC. If it is reserved and nothing uses
it, no problem, it can stay as is. If SeaBIOS or similar
is known to write to it under some circumstances, that
could easily be quite crashtastic.

Caveat - this alone wouldn't cover any other weirdness, such as
the odd memory hole 0x3f7e0000-0x3f7e7000 on my hardware. Was
this what you were thinking about when asking whether my domUs
work OK with 1GB of RAM, since that is just under the 1GB
limit?

So there are some issues with the i915 IGD having to have a 'flush
page' - mainly some non-RAM region via which they can tell the IGD
to flush its pages. It had to be non-RAM, and somehow via magic
IGD registers you can program the physical address into the card -
so the card has it remapped to itself.

Usually it is some gap (aka hole) that ends up having to be
faithfully reproduced in the guest. But you are using
nvidia and are not playing those nasty tricks.

Merely a different set of nasty tricks instead. :)
But yes, on the whole, I agree. I will try to get the holes
as similar as possible for a "production" level patch.

To clarify, I am not suggesting just hard-coding a 3GB memory
hole - I am suggesting defaulting to at least that and then
mapping in any additional memory holes as well. My reasoning
behind this suggestion is that it would make things more
consistent between different (possibly dissimilar) hosts.

Potentially. The other option, when thinking about migration
and PCI, is to interrogate _all_ of the hosts that will be involved
in the migration and construct an E820 that covers all the
right regions. Then use that for the guests, and then you
can unplug/plug the PCI devices without much trouble.
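
For illustration only: the merge could be as crude as treating an
address as guest RAM only if no host involved has a hole there,
i.e. the guest hole set becomes the union of the hosts' holes. A toy
sketch, assuming the per-host hole lists have already been extracted
(the names are made up):

#include <stdint.h>
#include <stdbool.h>

struct hole {
    uint64_t start, end;        /* [start, end) */
};

/*
 * Hypothetical helper: given one hole list per host, decide whether
 * an address may be populated as guest RAM.  RAM is only allowed
 * where *none* of the hosts has a hole, so the guest's holes end up
 * being the union of all the hosts' holes.
 */
static bool addr_usable_as_ram(uint64_t addr,
                               const struct hole *const *host_holes,
                               const unsigned int *nr_holes,
                               unsigned int nr_hosts)
{
    unsigned int h, i;

    for ( h = 0; h < nr_hosts; h++ )
        for ( i = 0; i < nr_holes[h]; i++ )
            if ( addr >= host_holes[h][i].start &&
                 addr <  host_holes[h][i].end )
                return false;   /* some host has a hole here */

    return true;
}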

That's possibly a step too far at this point.

That is where the e820_host=1 parameter can be used and
also some extra code to slurp up an XML of the E820 could be
implemented.

The 3GB HOLE could do it, but what if the host has some
odd layout where the HOLE is above 4GB? Then we are back at
remapping.

Such a host would also only work with devices that require _only_
64-bit BARs. But such devices do exist (e.g. ATI GPUs).

Gordan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

