Xen project Mailing List

Re: [Xen-devel] Xen x86 host memory limit issues

From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Mon, 24 Aug 2015 13:31:47 +0100

Cc: Elena Ufimtseva <elena.ufimtseva@xxxxxxxxxx>, Juergen Gross <JGross@xxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxxxxx>, Tim Deegan <tim@xxxxxxx>, Xen-devel List <xen-devel@xxxxxxxxxxxxx>

Delivery-date: Mon, 24 Aug 2015 12:31:57 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 24/08/15 12:47, Jan Beulich wrote:
>>>> On 24.08.15 at 12:36, <andrew.cooper3@xxxxxxxxxx> wrote:
>> The infrastructure around xenheap_max_mfn() is supposed to cause all
>> xenheap page allocations to fall within the Xen direct mapped region,
>> but experimentally doesn't work correctly.
>>
>> In all cases I have seen, the bad xenheap allocations have been from
>> calls which contain numa information in the memflags, which leads me to
>> suspect it is an interaction issue of numa hinting information and
>> xenheap_bits. At a guess I suspect alloc_heap_pages() doesn't correctly
>> override the numa hint when both a numa hint and zone limit are
>> provided, but I have not investigated this yet.
> But you're in the ideal position to do so. As said previously on the same
> topic, looking just at the code I can't see what's wrong, even when
> taking into account the experimentally observed behavior.

It is high on (but not top of) my todo list, as we currently have the
workaround in place.

From discussions at the Summit, I know that Oracle, SUSE and Citrix all
have machines large enough to reproduce the issue. This information is
provided at the request of Elena and Konrad (who it turns out I forgot
to CC on the original message. Sorry!)

>
>> Fixing that bug will be a useful step, as it will allow Xen to function
>> with host ram above the direct map limit, but is still not an optimal
>> solution as it prevents getting numa-local xenheap memory.
>>
>> Longterm it would be optimal to segment the direct map region by numa
>> node so there are equal quantities of xenheap memory available from each
>> numa node.
> Yes, albeit I'm suspecting there to arise (at least theoretical) issues
> on systems with many nodes - the per-node ranges directly mapped
> may become unreasonably small (and we may risk exhausting node
> 0's memory due to not NUMA-tagged allocation requests).

There are a number of allocation constraints.
Off the top of my head:
 * DMA pools for dom0 (mitigated in certain circumstances by PVIOMMU)
 * <128GB for 32bit PV domheap pages
 * <4GB for some 32bit PV L3 pages

Some of this can be avoided by allocating directmap from the upper ram
in the numa nodes.

Exhaustion of node 0 can be mitigated by striping allocations without a
numa hint, or allocating from the node with most free space remaining.

There should actually be very few allocations which can't have a numa
hint provided. All allocations for anything hardware related should be
on the local node, and everything else should be allocations on behalf
of a domain, which itself has numa information.

As an orthogonal task, we should see whether it is possible to nab any
virtual address space back from 64bit PV guests, or whether it is
irreparably fixed at its current value.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
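
To make the node-selection ideas in the message above concrete, here is a
minimal toy sketch in C. It models two points from the discussion: a NUMA
hint should only be honoured if the hinted node actually has free memory
inside the direct map region, and requests that cannot use the hint fall
back to the node with the most eligible free space. Everything here
(struct toy_node, toy_pick_node(), NO_NODE_HINT, the two free-page
counters) is a hypothetical name invented for illustration; it is not
Xen's alloc_heap_pages() logic or its real data structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker for "caller supplied no NUMA hint". */
#define NO_NODE_HINT (-1)

/* Hypothetical per-node bookkeeping; NOT Xen's real structures. */
struct toy_node {
    uint64_t free_below_limit; /* free pages inside the direct map region */
    uint64_t free_above_limit; /* free pages beyond the direct map limit */
};

/* Free pages on a node that are usable for this kind of request. */
static uint64_t eligible_free(const struct toy_node *n, int need_directmap)
{
    return need_directmap ? n->free_below_limit
                          : n->free_below_limit + n->free_above_limit;
}

/*
 * Pick a node for an allocation.
 *
 * need_directmap != 0 models a xenheap-style request that must land
 * inside the direct map; the address limit takes priority over the hint.
 * Returns the chosen node index, or -1 if no node has eligible memory.
 */
int toy_pick_node(const struct toy_node *nodes, size_t nr_nodes,
                  int hint, int need_directmap)
{
    int best = -1;
    uint64_t best_free = 0;
    size_t i;

    /* Honour the hint only if the hinted node can satisfy the limit. */
    if (hint >= 0 && (size_t)hint < nr_nodes &&
        eligible_free(&nodes[hint], need_directmap) > 0)
        return hint;

    /* Otherwise fall back to the node with the most eligible free pages. */
    for (i = 0; i < nr_nodes; i++) {
        uint64_t f = eligible_free(&nodes[i], need_directmap);

        if (f > best_free) {
            best_free = f;
            best = (int)i;
        }
    }

    return best;
}
```

In this toy model a direct-map request hinted at a node whose memory lies
entirely above the limit is redirected to another node instead of
returning an out-of-range page, and unhinted requests are spread towards
whichever node has the most room rather than always draining node 0.
Striping unhinted requests round-robin across nodes would be an
alternative fallback policy.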
