Xen project Mailing List

[Xen-devel] [PATCH 3 of 3 v5/leftover] About rationale, usage and (some small bits of) API

To: xen-devel <xen-devel@xxxxxxxxxxxxx>, Dario Faggioli <raistlin@xxxxxxxx>

From: Dario Faggioli <raistlin@xxxxxxxx>

Date: Mon, 16 Jul 2012 19:05:41 +0200

Cc: Andre Przywara <andre.przywara@xxxxxxx>, Ian Campbell <Ian.Campbell@xxxxxxxxxx>, Stefano Stabellini <Stefano.Stabellini@xxxxxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxxxxx>, Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>

Delivery-date: Mon, 16 Jul 2012 17:08:28 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
Acked-by: Ian Campbell <ian.campbell@xxxxxxxxxx>

---
Changes from v3:
 * typos and rewording of some sentences, as suggested during review.

Changes from v1:
 * API documentation moved close to the actual functions.

diff --git a/docs/misc/xl-numa-placement.markdown b/docs/misc/xl-numa-placement.markdown
new file mode 100644
--- /dev/null
+++ b/docs/misc/xl-numa-placement.markdown
@@ -0,0 +1,89 @@
+# Guest Automatic NUMA Placement in libxl and xl #
+
+## Rationale ##
+
+NUMA means that the memory access time of a program running on a CPU depends on
+the relative distance between that CPU and that memory. In fact, most NUMA
+systems are built in such a way that each processor has its own local memory,
+on which it can operate very fast. On the other hand, reading and writing data
+from and to remote memory (that is, memory local to some other processor) is
+considerably more complex and slower. On these machines, a NUMA node is usually
+defined as a set of processor cores (typically a physical CPU package) and the
+memory directly attached to that set of cores.
+
+The Xen hypervisor deals with Non-Uniform Memory Access (NUMA) machines by
+assigning to each domain a "node affinity", i.e., a set of NUMA nodes of the
+host from which it gets its memory allocated.
+
+NUMA awareness becomes very important as soon as many domains start running
+memory-intensive workloads on a shared host. In fact, the cost of accessing non
+node-local memory locations is very high, and the performance degradation is
+likely to be noticeable.
+
+## Guest Placement in xl ##
+
+If using xl for creating and managing guests, it is very easy to ask for
+either manual or automatic placement of them across the host's NUMA nodes.
+
+Note that xm/xend does the very same thing, the only differences residing in
+the details of the heuristics adopted for the placement (see below).
+
+### Manual Guest Placement with xl ###
+
+Thanks to the "cpus=" option, it is possible to specify where a domain should
+be created and scheduled, directly in its config file. This affects NUMA
+placement and memory accesses, as the hypervisor constructs the node affinity
+of a VM based directly on its CPU affinity when it is created.
+
+This is very simple and effective, but requires the user/system administrator
+to explicitly specify affinities for each and every domain, or Xen won't be
+able to guarantee the locality for their memory accesses.
+
+It is also possible to deal with NUMA by partitioning the system using cpupools
+(available in the upcoming release of Xen, 4.2). Again, this could be "The
+Right Answer" for many needs and occasions, but has to be carefully
+considered and set up manually.
+
+### Automatic Guest Placement with xl ###
+
+If no "cpus=" option is specified in the config file, libxl tries to figure out
+on its own on which node(s) the domain could fit best. It is worth noting
+that optimally fitting a set of VMs on the NUMA nodes of a host is an
+incarnation of the Bin Packing Problem. In fact, the various VMs with different
+memory sizes are the items to be packed, and the host nodes are the bins. That
+problem is known to be NP-hard, so we will be using some heuristics.
+
+The first thing to do is find a node, or even a set of nodes, that has
+enough free memory and enough physical CPUs to accommodate the new
+domain. The idea is to find a spot for the domain with at least as much free
+memory as it has configured, and as many pCPUs as it has vCPUs. After that,
+the actual decision on which solution to go for happens according to the
+following heuristics:
+
+ * candidates involving fewer nodes come first. In case two (or more)
+   candidates span the same number of nodes,
+ * the amount of free memory and the number of domains assigned to the
+   candidates are considered. In doing that, candidates with a greater amount
+   of free memory and fewer assigned domains are preferred, with free memory
+   "weighing" three times as much as the number of domains.
+
+Giving preference to candidates with fewer nodes ensures better performance for
+the guest, as it avoids spreading its memory among different nodes. Favouring
+the nodes that have the largest amounts of free memory helps keep memory
+fragmentation small, from a system-wide perspective. However, if several
+candidates fulfil these criteria to roughly the same extent, considering the
+number of domains the candidates are "hosting" helps balance the load on the
+various nodes.
+
+## Guest Placement within libxl ##
+
+xl achieves automatic NUMA placement simply because libxl does it internally.
+No API is provided (yet) for interacting with this feature and modifying
+the library behaviour regarding automatic placement; it just happens
+by default if no affinity is specified (as it is with xm/xend).
+
+To look at, and maybe tweak, the mechanism and the algorithms it uses, note
+that all of it is implemented as a set of libxl internal interfaces and
+facilities. See the comment "Automatic NUMA placement" in libxl\_internal.h.
+
+Note this may change in future versions of Xen/libxl.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
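For readers who want to experiment with the placement heuristics described in the patch, here is a minimal Python sketch. It is not libxl code. The `Candidate` type, the `fits` and `rank` helpers, and in particular the way free memory and domain counts are normalised before applying the 3:1 weighting are all assumptions made for illustration; the patch text only states the ordering criteria and the relative weight.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical placement candidate: a set of host NUMA nodes."""
    nodes: tuple    # ids of the NUMA nodes in this candidate
    free_mem: int   # total free memory on those nodes (MiB)
    pcpus: int      # physical CPUs available on those nodes
    domains: int    # domains already assigned to those nodes


def fits(cand, mem_needed, vcpus):
    """A candidate is feasible if it has at least as much free memory as the
    domain has configured, and at least as many pCPUs as the domain has vCPUs."""
    return cand.free_mem >= mem_needed and cand.pcpus >= vcpus


def rank(candidates, mem_needed, vcpus):
    """Return the feasible candidates, best first.

    Ordering: fewer nodes first; among candidates spanning the same number
    of nodes, prefer more free memory and fewer domains, with free memory
    weighing three times as much as the number of domains.
    The normalisation by the maximum value is an assumption of this sketch.
    """
    feasible = [c for c in candidates if fits(c, mem_needed, vcpus)]
    if not feasible:
        return []
    max_free = max(c.free_mem for c in feasible) or 1
    max_doms = max(c.domains for c in feasible) or 1

    def key(c):
        score = 3 * c.free_mem / max_free - c.domains / max_doms
        # Sort ascending: fewer nodes first, then higher score first.
        return (len(c.nodes), -score)

    return sorted(feasible, key=key)
```

Calling `rank(candidates, mem_needed, vcpus)` on a list of `Candidate` objects returns them in preference order, and an empty list when no candidate can host the domain. Using a per-candidate sort key rather than a pairwise comparison keeps the ordering consistent across all candidates.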
