xen-devel

RE: [Xen-devel] [PATCH] 0/7 xen: Add basic NUMA support

To: "Ryan Harper" <ryanh@xxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: RE: [Xen-devel] [PATCH] 0/7 xen: Add basic NUMA support
From: "Ian Pratt" <m+Ian.Pratt@xxxxxxxxxxxx>
Date: Sat, 17 Dec 2005 01:28:01 -0000
Cc: Ryan Grimm <grimm@xxxxxxxxxx>
Delivery-date: Sat, 17 Dec 2005 01:30:18 +0000
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AcYClM4getVFYoK2TW6hazyu3g3RxgAEqToA
Thread-topic: [Xen-devel] [PATCH] 0/7 xen: Add basic NUMA support
 
> The patchset will add basic NUMA support to Xen (hypervisor 
> only).  

I think we need a lot more discussion on this -- your approach differs
from what we've previously discussed on the list. We need a session at
the Jan summit.

> We borrowed from Linux support for NUMA SRAT table 
> parsing, discontiguous memory tracking (mem chunks), and cpu 
> support (node_to_cpumask etc).
>
> The hypervisor parses the SRAT tables and constructs mappings 
> for each node such as node to cpu mappings and memory range 
> to node mappings.

Having the code for parsing the SRAT table is clearly a good thing.
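
For concreteness, I'd expect the parsed state to end up looking
something like this (names borrowed from Linux's NUMA code;
illustrative, not necessarily what the patch uses):

  /* Illustrative SRAT-derived state, modelled on Linux's NUMA
   * code rather than on the actual patchset. */
  #define MAX_NUMNODES 16
  #define MAX_CHUNKS   32

  struct node_memory_chunk {
      unsigned long start_pfn;  /* first page frame of the chunk */
      unsigned long end_pfn;    /* one past the last frame       */
      int nid;                  /* node owning this range        */
  };

  static struct node_memory_chunk mem_chunks[MAX_CHUNKS];
  static int nr_chunks;
  static cpumask_t node_to_cpumask[MAX_NUMNODES];
  static int cpu_to_node[NR_CPUS];

  /* Map a page frame back to its node via the chunk table. */
  static int pfn_to_node(unsigned long pfn)
  {
      int i;
      for (i = 0; i < nr_chunks; i++)
          if (pfn >= mem_chunks[i].start_pfn &&
              pfn <  mem_chunks[i].end_pfn)
              return mem_chunks[i].nid;
      return -1; /* pfn not described by the SRAT */
  }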

> Using this information, we also modified the page allocator 
> to provide a simple NUMA-aware API.  The modified allocator 
> will attempt to find pages local to the cpu where possible, 
> but will fall back on using memory that is of the requested 
> size rather than fragmenting larger contiguous chunks to find 
> local pages.  We expect to tune this algorithm in the future 
> after further study.

Personally, I think we should have separate buddy allocators for each of
the zones; much simpler and faster in the common case.
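
That is, something along these lines, where each node gets its own
independent buddy heap (node_heap[] and buddy_alloc() are made-up
names for the sketch, not existing Xen interfaces; cpu_to_node[] is
as sketched above):

  extern struct buddy_heap node_heap[];
  extern int nr_nodes;   /* nodes found in the SRAT */

  struct pfn_info *numa_alloc_pages(int cpu, unsigned int order)
  {
      int local = cpu_to_node[cpu], i, node;
      struct pfn_info *pg;

      /* Fast path: the requesting CPU's own node. */
      pg = buddy_alloc(&node_heap[local], order);
      if (pg != NULL)
          return pg;

      /* Fall back on remote memory of the requested order rather
       * than splitting larger contiguous local chunks. */
      for (i = 1; i < nr_nodes; i++) {
          node = (local + i) % nr_nodes;
          pg = buddy_alloc(&node_heap[node], order);
          if (pg != NULL)
              return pg;
      }
      return NULL;
  }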
 
> We also modified Xen's increase_reservation memory op to 
> balance memory distribution across the vcpus in use by a 
> domain.  Relying on previous patches which have already been 
> committed to xen-unstable, a guest can be constructed such 
> that its entire memory is contained within a specific NUMA node.

This makes sense for 1-vcpu guests, but for multi-vcpu guests this needs
way more discussion. How do we expose the (potentially dynamic) mapping
of vcpus to nodes? How do we expose the different memory zones to
guests? How does Linux make use of this information? This is a can of
worms, definitely phase 2.
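
My reading of the striping is roughly the following (the field and
helper names are invented for the sketch, not what the patch
necessarily calls them):

  /* Sketch of striping an increase_reservation across the nodes
   * hosting a domain's vcpus.  vcpu_to_node() and
   * alloc_domheap_pages_on_node() are invented helper names. */
  static int balanced_increase(struct domain *d,
                               unsigned long nr_extents,
                               unsigned int order)
  {
      unsigned long i;

      for (i = 0; i < nr_extents; i++) {
          /* Rotate through the vcpus so each node hosting one
           * gets a proportional share of the new pages. */
          int node = vcpu_to_node(d, i % d->nr_vcpus);
          if (alloc_domheap_pages_on_node(d, node, order) == NULL)
              return -ENOMEM;
      }
      return 0;
  }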

> We've added a keyhandler for exposing some of the 
> NUMA-related information and statistics that pertain to the 
> hypervisor.
> 
> We export NUMA system information via the physinfo hypercall. 
>  This information provides cpu/memory topology and 
> configuration information gleaned from the SRAT tables to 
> userspace applications.  Currently, xend doesn't leverage any 
> of the information automatically but we intend to do so in the future.

Yep, useful.
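
Presumably the extra topology rides in the structure returned by the
physinfo op, roughly like this (the NUMA fields shown are
illustrative, not a committed ABI):

  typedef struct {
      uint32_t threads_per_core;
      uint32_t cores_per_socket;
      uint32_t sockets_per_node;
      uint32_t nr_nodes;                      /* nodes found in the SRAT */
      uint32_t node_to_cpu[MAX_NUMNODES];     /* cpu bitmap per node     */
      uint64_t node_to_memory[MAX_NUMNODES];  /* bytes of RAM per node   */
      unsigned long total_pages;
      unsigned long free_pages;
  } physinfo_t;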

> We've integrated NUMA information into xentrace so we can
> track various points such as page allocator hits and misses
> as well as other information.  In the process of implementing
> the trace, we also fixed some incorrect assumptions about the
> symmetry of NUMA systems w.r.t. the sockets_per_node value.
> Details are available in a later email with the patch.

Nice.
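
Presumably the hit/miss accounting reduces to something like this at
the allocation site (the event IDs are placeholders I've made up;
pfn_to_node() is as sketched above):

  /* Placeholder event IDs for the sketch. */
  #define TRC_NUMA_ALLOC_HIT   0x00090001
  #define TRC_NUMA_ALLOC_MISS  0x00090002

  /* After a successful allocation, record whether the page is
   * local to the requesting CPU's node. */
  if (pfn_to_node(page_to_pfn(pg)) == cpu_to_node[cpu])
      TRACE_2D(TRC_NUMA_ALLOC_HIT,  cpu, order);
  else
      TRACE_2D(TRC_NUMA_ALLOC_MISS, cpu, order);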

> These patches have been tested on several IBM NUMA and 
> non-NUMA systems:
> 
> NUMA-aware systems: 
> IBM Dual Opteron:  2 Node,  2 CPU,  4GB 
> IBM x445        :  4 Node, 32 CPU, 32GB 
> IBM x460        :  1 Node,  8 CPU, 16GB
> IBM x460        :  2 Node, 32 CPU, 32GB


If only we had an x445 so we could work on these patches :)

Ian

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
