[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 08 of 10 v3] libxl: enable automatic placement of guests on NUMA nodes



On Wed, 2012-07-04 at 18:18 +0200, Dario Faggioli wrote:
> # HG changeset patch
> # User Dario Faggioli <raistlin@xxxxxxxx>
> # Date 1341416324 -7200
> # Node ID 7087d3622ee2051654c9e78fe4829da10c2d46f1
> # Parent  6fd693e7f3bc8b4d9bd20befff2c13de5591a7c5
> libxl: enable automatic placement of guests on NUMA nodes
> 
George, Ian (Jackson),

> Once we know which ones, among all the possible combinations, represents valid
> placement candidates for a domain, use some heuistics for deciding which is 
> the
> best. For instance, smaller candidates are considered to be better, both from
> the domain's point of view (fewer memory spreading among nodes) and from the
> system as a whole point of view (fewer memoy fragmentation).  In case of
> candidates of equal sizes (i.e., with the same number of nodes), the amount of
> free memory and the number of domain already assigned to their nodes are
> considered. Very often, candidates with greater amount of memory are the one
> we wants, as this is also good for keeping memory fragmentation under control.
> However, if the difference in how much free memory two candidates have, the
> number of assigned domains might be what decides which candidate wins.
>
> [...]
>
> ---
> Changes from v2:
>
> [...]
>
>  * Comparison function for candidates changed so that it now provides
>    total ordering, as requested during review. It is still using FP
>    arithmetic, though. Also I think that just putting the difference
>    between the amount of free memory and between the number of assigned
>    domains of two candidates in a single formula (after normalizing and
>    weighting them) is both clear and effective enough.
>
> [...]
>  
I thought at what a sensible comparison function should look like, and I
also plotted some graphs while randomly generating both the amount of
free memory and number of domains. The outcome of all this is I managed
in convincing myself the solution below is both clear and understandable
as it is effective (confirmed by the test I was able to perform up to
now).

Basically, the idea is to consider both things (freemem and nr_domains),
but with freemem being 3 times more "important" than nr_domains. That is
very similar to one of the log()-based solutions proposed by Ian, but I
really think just normalizing and weighting is easier to understand,
even if quickly looking at the formula/code.

Regarding the percent penalty per each domain proposed by George on IRC,
I liked the idea a lot, but the figured out that here will always be a
number of domains (e.g., 20, if the penalty is set to 5%) starting from
which nr_domains counts more than freemem, which is the opposite of what
I want to achieve.

So, now, comparison between placement candidates happens like this:

  return a.nr_nodes < b.nr_nodes ? a :
             b.nr_nodes < a.nr_nodes ? b :
             3*norm_diff(b.freemem, a.freemem) - norm_diff(a.nr_domains, 
b.nr_domains)

 where:

  norm_diff(x, y) := (x - y)/max(x, y)

Which removes the nasty effects of having that 10% range as in v2 of the
series. If that '3' looks too much of a magic number, I can of course
enum/#define it, or even make it configurable (although, the latter, not
for 4.2, I guess :-) ).

> +/* Subtract two values and translate the result in [0, 1] */
> +static double normalized_diff(double a, double b)
> +{
> +#define max(a, b) (a > b ? a : b)
> +    if (!a && a == b)
> +        return 0.0;
> +    return (a - b) / max(a, b);
> +}
> +
> +/*
> + * The NUMA placement candidates are reordered according to the following
> + * heuristics:
> + *  - candidates involving fewer nodes come first. In case two (or
> + *    more) candidates span the same number of nodes,
> + *  - the amount of free memory and the number of domains assigned to the
> + *    candidates are considered. In doing that, candidates with greater
> + *    amount of free memory and fewer domains assigned to them are preferred,
> + *    with free memory "weighting" three times as much as number of domains.
> + */
> +static int numa_cmpf(const void *v1, const void *v2)
> +{
> +    const libxl__numa_candidate *c1 = v1;
> +    const libxl__numa_candidate *c2 = v2;
> +#define sign(a) a > 0 ? 1 : a < 0 ? -1 : 0
> +    double freememkb_diff = normalized_diff(c2->free_memkb, c1->free_memkb);
> +    double nrdomains_diff = normalized_diff(c1->nr_domains, c2->nr_domains);
> +
> +    if (c1->nr_nodes != c2->nr_nodes)
> +        return c1->nr_nodes - c2->nr_nodes;
> +
> +    return sign(3*freememkb_diff + nrdomains_diff);
> +}
> +

What do you think?

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

Attachment: signature.asc
Description: This is a digitally signed message part

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.