[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Xen x86 host memory limit issues

On 24/08/15 12:47, Jan Beulich wrote:
>>>> On 24.08.15 at 12:36, <andrew.cooper3@xxxxxxxxxx> wrote:
>> The infrastructure around xenheap_max_mfn() is supposed to cause all
>> xenheap page allocations to fall within the Xen direct mapped region,
>> but experimentally it doesn't work correctly.
>> In all cases I have seen, the bad xenheap allocations have been from
>> calls which contain numa information in the memflags, which leads me to
>> suspect it is an interaction issue of numa hinting information and
>> xenheap_bits.  At a guess I suspect alloc_heap_pages() doesn't correctly
>> override the numa hint when both a numa hint and zone limit are
>> provided, but I have not investigated this yet.
> But you're in the ideal position to do so. As said previously on the same
> topic, looking just at the code I can't see what's wrong, even when
> taking into account the experimentally observed behavior.

It is high on (but not top of) my todo list, as we currently have the
workaround in place.

From discussions at the Summit, I know that Oracle, SUSE and Citrix all
have machines large enough to reproduce the issue.  This information is
provided at the request of Elena and Konrad (who, it turns out, I forgot
to CC on the original message.  Sorry!)

>> Fixing that bug will be a useful step, as it will allow Xen to function
>> with host RAM above the direct map limit, but it is still not an optimal
>> solution, as it prevents getting numa-local xenheap memory.
>> Long term, it would be optimal to segment the direct map region by numa
>> node so there are equal quantities of xenheap memory available from each
>> numa node.
> Yes, albeit I'm suspecting there to arise (at least theoretical) issues
> on systems with many nodes - the per-node ranges directly mapped
> may become unreasonably small (and we may risk exhausting node
> 0's memory due to not NUMA-tagged allocation requests).

There are a number of allocation constraints.  Off the top of my head:

* DMA pools for dom0 (mitigated in certain circumstances by PVIOMMU)
* <128GB for 32bit PV domheap pages
* <4GB for some 32bit PV L3 pages

Some of this can be avoided by allocating the directmap from the upper
RAM in each numa node.  Exhaustion of node 0 can be mitigated by striping
allocations without a numa hint across nodes, or by allocating from the
node with the most free space remaining.

There should actually be very few allocations which can't have a numa
hint provided.  All allocations for anything hardware related should be
on the local node, and everything else should be allocations on behalf of
a domain, which itself has numa information.

As an orthogonal task, we should see whether it is possible to nab any
virtual address space back from 64bit PV guests, or whether it is
irreparably fixed at its current value.


Xen-devel mailing list