
Re: [PATCH] x86/pod: Do not fragment PoD memory allocations



On 25.01.2021 18:46, Elliott Mitchell wrote:
> On Mon, Jan 25, 2021 at 10:56:25AM +0100, Jan Beulich wrote:
>> On 24.01.2021 05:47, Elliott Mitchell wrote:
>>>
>>> ---
>>> Changes in v2:
>>> - Include the obvious removal of the goto target.  Always realize you're
>>>   at the wrong place when you press "send".
>>
>> Please could you also label the submission then accordingly? I
>> got puzzled by two identically titled messages side by side,
>> until I noticed the difference.
> 
> Sorry about that.  Would you have preferred a third message mentioning
> this mistake?

No. But I'd have expected v2 to have its subject start with
"[PATCH v2] ...", making it clear that one can skip looking at
the one labeled just "[PATCH] ...".
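
(Fwiw, `git format-patch -v2` will produce exactly that subject
prefix.)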

>>> I'm not including a separate cover message since this is a single hunk.
>>> This really needs some checking in `xl`.  If one has a domain which
>>> sometimes gets started on different hosts and is sometimes modified with
>>> slightly differing settings, one can run into trouble.
>>>
>>> In this case the particular domain is most often used PV/PVH, but
>>> every so often it is used as a template for HVM.  Starting it
>>> HVM will trigger PoD mode.  If it is started on a machine with less
>>> memory than others, PoD may well exhaust all memory and then trigger a
>>> panic.
>>>
>>> `xl` should likely fail HVM domain creation when the maximum memory
>>> exceeds available memory (never mind total memory).
>>
>> I don't think so, no - it's the purpose of PoD to allow starting
>> a guest despite there not being enough memory available to
>> satisfy its "max", as such guests are expected to balloon down
>> immediately, rather than triggering an oom condition.
> 
> Even Qemu/OVMF is expected to handle ballooning for an *HVM* domain?

No idea how qemu comes into play here. Any preboot environment
aware of possibly running under Xen is of course expected to
tolerate running with maxmem > memory (or clearly document its
inability, in which case it may not be suitable for certain
use cases). For example, I don't see why a preboot environment
would need to touch all of the memory in a VM, except maybe
for the purpose of zeroing it (which PoD can deal with fine).
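
To illustrate why mere zeroing is harmless (just a rough, standalone
sketch, not the actual p2m-pod.c code - the function name and the
local PAGE_SIZE stand-in are made up here): reclaiming such a page
boils down to a cheap "is it still all zero" scan, e.g.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define PAGE_SIZE 4096 /* stand-in for Xen's PAGE_SIZE */

    /* A page whose contents are still entirely zero can be handed
     * back to the pool instead of staying allocated to the guest. */
    bool page_is_all_zero(const void *page)
    {
        const uint64_t *p = page;
        size_t i;

        for ( i = 0; i < PAGE_SIZE / sizeof(*p); i++ )
            if ( p[i] )
                return false;

        return true;
    }

A preboot environment which merely zeroes memory therefore doesn't
defeat PoD, unlike one writing non-zero patterns everywhere.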

>>> For example try a domain with the following settings:
>>>
>>> memory = 8192
>>> maxmem = 2147483648
>>>
>>> If type is PV or PVH, it will likely boot successfully.  Change type to
>>> HVM and unless your hardware budget is impressive, Xen will soon panic.
>>
>> Xen will panic? That would need fixing if so. Also I'd consider
>> an excessively high maxmem (compared to memory) a configuration
>> error. From experiments long, long ago I seem to
>> recall that a factor beyond 32 is almost never going to lead to
>> anything good, irrespective of guest type. (But as said, badness
>> here should be restricted to the guest; Xen itself should limp
>> on fine.)
> 
> I'll confess I haven't confirmed the panic is in Xen itself.  The
> problem is that when this gets triggered, by the time the situation
> is clear and I can get to the console the computer is already
> restarting, so no error message has been observed.

If the panic isn't in Xen itself, why would the computer be
restarting?

> This is most certainly a configuration error.  The problem is that
> there is only a very small delta between a perfectly valid
> configuration and one which reliably triggers a panic.
> 
> The memory:maxmem ratio isn't the problem.  My example had a maxmem of
> 2147483648 since that is enough to exceed the memory of sub-$100K
> computers.  The crucial features are maxmem >= machine memory,
> memory < free memory (thus potentially bootable PV/PVH) and type = "hvm".
> 
> When was the last time you tried running a Xen machine with near zero
> free memory?  Perhaps in the past Xen kept the promise of never
> panicking on memory exhaustion, but it feels like this hasn't held
> for some time.

Such bugs need fixing. Which first of all requires properly
pointing them out. A PoD guest misbehaving when there's not
enough memory to fill its pages (i.e. before it manages to
balloon down) is expected behavior. If you can't guarantee the
guest ballooning down quickly enough, don't configure it to
use PoD. A PoD guest causing a Xen crash, otoh, is very likely
even a security issue. Which needs to be treated as such: It
needs fixing, not avoiding by "curing" one of perhaps many
possible sources.
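
To spell out the "don't use PoD" alternative with your earlier
example: PoD only gets involved for an HVM guest created with
memory < maxmem, so a configuration which isn't meant to balloon
down would simply keep the two equal, e.g.

memory = 8192
maxmem = 8192

at the price of the guest then not being able to grow beyond that.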

As an aside - if the PoD code had proper 1GB page support,
would you then propose to only allocate in 1GB chunks? And if
there was a 512GB page feature in hardware, in 512GB chunks
(leaving aside the fact that scanning 512GB of memory for being
all zero would simply take too long with today's computers)?
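
(Back of the envelope, assuming something like 20 GB/s of sustained
sequential read bandwidth: confirming 512GB to be all zero would take
on the order of 512 / 20 ≈ 25s, and that for every single such
allocation.)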

Jan