Re: [Xen-devel] [PATCH 08 of 10 v3] libxl: enable automatic placement of guests on NUMA nodes
On Wed, 2012-07-04 at 17:18 +0100, Dario Faggioli wrote:
> # HG changeset patch
> # User Dario Faggioli <raistlin@xxxxxxxx>
> # Date 1341416324 -7200
> # Node ID 7087d3622ee2051654c9e78fe4829da10c2d46f1
> # Parent 6fd693e7f3bc8b4d9bd20befff2c13de5591a7c5
> libxl: enable automatic placement of guests on NUMA nodes
>
> If a domain does not have a VCPU affinity, try to pin it automatically to some
> PCPUs. This is done taking into account the NUMA characteristics of the host.
> In fact, we look for a combination of host's NUMA nodes with enough free
> memory and number of PCPUs for the new domain, and pin it to the VCPUs of
> those nodes.
>
> Once we know which ones, among all the possible combinations, represent valid
> placement candidates for a domain, use some heuristics for deciding which is
> the best. For instance, smaller candidates are considered to be better, both
> from the domain's point of view (less memory spreading among nodes) and from
> the system as a whole point of view (less memory fragmentation). In case of
> candidates of equal sizes (i.e., with the same number of nodes), the amount of
> free memory and the number of domains already assigned to their nodes are
> considered. Very often, candidates with a greater amount of memory are the
> ones we want, as this is also good for keeping memory fragmentation under
> control.
> However, if the difference in how much free memory two candidates have, the
> number of assigned domains might be what decides which candidate wins.
I can't parse this last sentence. Are there some words missing after
"how much free memory two candidates have"?
If you want to post the corrected text I think we can fold it in while
applying rather than reposting (assuming no other reason to repost crops
up).
> This all happens internally to libxl, and no API for driving the mechanism is
> provided for now. This matches what xend already does.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
> Acked-by: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
>
> ---
> Changes from v2:
> * Fixed lots of typos.
> * Clarified some comments, as requested during (ijc's) review.
> * Added some more information/reference for the combination generation
> algorithm.
> * nodemap_to_nodes_cpus() function renamed to nodemap_to_nr_cpus().
> * libxl_bitmap_init() used to make sure we do not try to free random
> memory on failure paths of functions that allocate a libxl_bitmap.
> * Always invoke libxl__sort_numa_candidates(), even if we get there
> with just 1 candidate, as requested during review.
> * Simplified the if-s that check for input parameter consistency in
> libxl__get_numa_candidates() as requested during (gwd's) review.
> * Comparison function for candidates changed so that it now provides
> total ordering, as requested during review. It is still using FP
> arithmetic, though. Also I think that just putting the difference
> between the amount of free memory and between the number of assigned
> domains of two candidates in a single formula (after normalizing and
> weighting them) is both clear and effective enough.
> * Function definitions moved to a numa specific source file (libxl_numa.c),
> as suggested during review.
>
>
> Changes from v1:
> * This patch incorporates the changes from both "libxl, xl: enable automatic
> placement of guests on NUMA nodes" and "libxl, xl: heuristics for reordering
> NUMA placement candidates" from v1.
> * The logic of the algorithm is basically the same as in v1, but the splitting
> of it in the various functions has been completely redesigned from scratch.
> * No public API for placement or candidate generation is now exposed,
> everything happens within libxl, as agreed during v1 review.
> * The relevant documentation has been moved near the actual functions and
> features. Also, the amount and (hopefully!) the quality of the documentation
> has been improved a lot, as requested.
> * All the comments about using the proper libxl facilities and helpers for
> allocations, etc., have been considered and applied.
> * This patch still bails out from NUMA optimizations if it finds out cpupools
> are being utilized. It is the next patch that makes the two things interact
> properly, as suggested during review.
>
> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> --- a/docs/man/xl.cfg.pod.5
> +++ b/docs/man/xl.cfg.pod.5
> @@ -111,8 +111,8 @@ created online and the remainder will be
>
> =item B<cpus="CPU-LIST">
>
> -List of which cpus the guest is allowed to use. Default behavior is
> -`all cpus`. A C<CPU-LIST> may be specified as follows:
> +List of which cpus the guest is allowed to use. By default xl will (via
> +libxl) pick some cpus (see below). A C<CPU-LIST> may be specified as follows:
>
> =over 4
>
> @@ -132,6 +132,12 @@ run on cpu #3 of the host.
>
> =back
>
> +If this option is not specified, libxl automatically tries to place the new
> +domain on the host's NUMA nodes (provided the host has more than one NUMA
> +node) by pinning it to the cpus of those nodes. A heuristic approach is
> +utilized with the goals of maximizing performance for the domain and, at
> +the same time, achieving efficient utilization of the host's CPUs and RAM.
> +
> =item B<cpu_weight=WEIGHT>
>
> A domain with a weight of 512 will get twice as much CPU as a domain
> diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
> --- a/tools/libxl/Makefile
> +++ b/tools/libxl/Makefile
> @@ -66,7 +66,7 @@ LIBXL_LIBS += -lyajl -lm
> LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \
> libxl_dom.o libxl_exec.o libxl_xshelp.o libxl_device.o \
> libxl_internal.o libxl_utils.o libxl_uuid.o \
> - libxl_json.o libxl_aoutils.o \
> + libxl_json.o libxl_aoutils.o libxl_numa.o \
> libxl_save_callout.o _libxl_save_msgs_callout.o \
> libxl_qmp.o libxl_event.o libxl_fork.o $(LIBXL_OBJS-y)
> LIBXL_OBJS += _libxl_types.o libxl_flask.o _libxl_types_internal.o
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -98,6 +98,106 @@ out:
> return sched;
> }
>
> +/* Subtract two values and translate the result in [0, 1] */
> +static double normalized_diff(double a, double b)
> +{
> +#define max(a, b) (a > b ? a : b)
> + if (!a && a == b)
> + return 0.0;
> + return (a - b) / max(a, b);
> +}
I think this actually returns a result in [-1,1] rather than [0,1] like
the comment says. Considering the a==1 b==2 case ->
( 1 - 2 ) / max(1, 2)
=> -1 / 2
=> -0.5
Is it the comment or the code which is wrong? (From the following
numa_cmpf I think the comment)
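For the record, here is a tiny standalone check (the patch's normalized_diff
copied into a throwaway program, with main() added only for the demonstration):

    #include <stdio.h>

    #define max(a, b) (a > b ? a : b)

    /* Copied from the patch */
    static double normalized_diff(double a, double b)
    {
        if (!a && a == b)
            return 0.0;
        return (a - b) / max(a, b);
    }

    int main(void)
    {
        /* Prints "-0.50 0.50": the result really spans [-1,1], not [0,1] */
        printf("%.2f %.2f\n", normalized_diff(1, 2), normalized_diff(2, 1));
        return 0;
    }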
> +
> +/*
> + * The NUMA placement candidates are reordered according to the following
> + * heuristics:
> + * - candidates involving fewer nodes come first. In case two (or
> + * more) candidates span the same number of nodes,
> + * - the amount of free memory and the number of domains assigned to the
> + * candidates are considered. In doing that, candidates with greater
> + * amount of free memory and fewer domains assigned to them are preferred,
> + * with free memory "weighting" three times as much as number of domains.
> + */
> +static int numa_cmpf(const void *v1, const void *v2)
> +{
> + const libxl__numa_candidate *c1 = v1;
> + const libxl__numa_candidate *c2 = v2;
> +#define sign(a) a > 0 ? 1 : a < 0 ? -1 : 0
Does the caller of numa_cmpf rely on the result being specifically -1, 0
or +1? Usually such functions only care about -ve, 0 or +ve.
Or maybe this is a double->int conversion thing? I guess (int)0.1 == 0
rather than +ve like you'd want? Yes, I suppose that's it, nevermind. I
might have written the floating point zeroes as 0.0 to make it clearer
rather than relying on automatic promotion.
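Just to spell out that conversion point with a standalone snippet (the sign()
macro is copied from the patch, the rest exists only for illustration):

    #include <stdio.h>

    #define sign(a) a > 0 ? 1 : a < 0 ? -1 : 0

    int main(void)
    {
        double d = 0.1;
        int s = sign(d);

        /* Truncating the double loses the ordering: (int)0.1 == 0 ... */
        printf("(int)d  = %d\n", (int)d);
        /* ... while sign() keeps a small positive difference as +1 */
        printf("sign(d) = %d\n", s);
        return 0;
    }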
> + double freememkb_diff = normalized_diff(c2->free_memkb, c1->free_memkb);
> + double nrdomains_diff = normalized_diff(c1->nr_domains, c2->nr_domains);
> +
> + if (c1->nr_nodes != c2->nr_nodes)
> + return c1->nr_nodes - c2->nr_nodes;
> +
> + return sign(3*freememkb_diff + nrdomains_diff);
> +}
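Purely to illustrate the resulting ordering, here is a self-contained sketch
that runs the same comparison logic over a made-up candidate struct (the struct
and the example values are invented, not the real libxl__numa_candidate):

    #include <stdio.h>
    #include <stdlib.h>

    #define max(a, b) (a > b ? a : b)
    #define sign(a) a > 0 ? 1 : a < 0 ? -1 : 0

    /* Hypothetical stand-in for libxl__numa_candidate */
    typedef struct {
        int nr_nodes;
        double free_memkb;
        int nr_domains;
    } cand;

    static double normalized_diff(double a, double b)
    {
        if (!a && a == b)
            return 0.0;
        return (a - b) / max(a, b);
    }

    /* Same logic as numa_cmpf above, applied to the made-up struct */
    static int cmpf(const void *v1, const void *v2)
    {
        const cand *c1 = v1;
        const cand *c2 = v2;
        double freememkb_diff = normalized_diff(c2->free_memkb, c1->free_memkb);
        double nrdomains_diff = normalized_diff(c1->nr_domains, c2->nr_domains);

        if (c1->nr_nodes != c2->nr_nodes)
            return c1->nr_nodes - c2->nr_nodes;

        return sign(3*freememkb_diff + nrdomains_diff);
    }

    int main(void)
    {
        /* 2 nodes/8G free/2 domains, 1 node/4G free/5 domains,
         * 1 node/2G free/0 domains (free memory in kB, values made up) */
        cand c[3] = { {2, 8<<20, 2}, {1, 4<<20, 5}, {1, 2<<20, 0} };
        int i;

        qsort(c, 3, sizeof(cand), cmpf);
        for (i = 0; i < 3; i++)
            printf("%d node(s), %.0f kB free, %d domain(s)\n",
                   c[i].nr_nodes, c[i].free_memkb, c[i].nr_domains);
        return 0;
    }

Both single-node candidates sort before the two-node one, and among them the
4G/5-domain candidate comes first: the tripled free memory term outweighs its
higher domain count.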
> +
> +/* The actual automatic NUMA placement routine */
> +static int numa_place_domain(libxl__gc *gc, libxl_domain_build_info *info)
I didn't review the rest -- George already acked it.
Ian.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel