
Re: [Xen-devel] [PATCH 1 of 3 v4/leftover] libxl: enable automatic placement of guests on NUMA nodes



Dario Faggioli writes ("[PATCH 1 of 3 v4/leftover] libxl: enable automatic 
placement of guests on NUMA nodes"):
> If a domain does not have a VCPU affinity, try to pin it
> automatically to some PCPUs. This is done taking into account the
> NUMA characteristics of the host. In fact, we look for a combination
> of host's NUMA nodes with enough free memory and number of PCPUs for
> the new domain, and pin it to the VCPUs of those nodes.

Thanks for this admirably clear patch.

Can you please rewrap your commit messages to around 70 columns?  Many
VCSs indent them in the log in some situations, and as you see here
mail programs indent them when quoting too.

Indeed I would generally prefer to wrap code to ~75 columns rather
than 80, but I know that's a bit controversial.

> +/* Subtract two values and translate the result in [0, 1] */
> +static double normalized_diff(double a, double b)
> +{
> +#define max(a, b) (a > b ? a : b)
> +    if (!a && a == b)
> +        return 0.0;
> +    return (a - b) / max(a, b);
> +}

1. This macro max() should be in libxl_internal.h.
2. It should be MAX, so people are warned it's a macro.
3. It should have all the necessary ()s for macro precedence safety.
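
I.e., something along these lines (just a sketch of what I mean; note
that, like the original, it still evaluates its arguments twice):

    /* In libxl_internal.h: */
    #define MAX(a, b) ((a) > (b) ? (a) : (b))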

> +static int numa_cmpf(const void *v1, const void *v2)
> +{
> +    const libxl__numa_candidate *c1 = v1;
> +    const libxl__numa_candidate *c2 = v2;
> +#define sign(a) a > 0 ? 1 : a < 0 ? -1 : 0

Again.

> +    double freememkb_diff = normalized_diff(c2->free_memkb, c1->free_memkb);
> +    double nrdomains_diff = normalized_diff(c1->nr_domains, c2->nr_domains);
> +
> +    if (c1->nr_nodes != c2->nr_nodes)
> +        return c1->nr_nodes - c2->nr_nodes;
> +
> +    return sign(3*freememkb_diff + nrdomains_diff);

The reason you need what sign() does is that you need to convert from
double to int, I guess.
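
If you do keep it as a macro, the same treatment as max() applies;
something like this (a sketch, with the ()s added):

    /* -1/0/1 depending on the sign of a; evaluates a up to twice. */
    #define SIGN(a) ((a) > 0 ? 1 : (a) < 0 ? -1 : 0)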

> +
> +    /*
> +     * Check if the domain has any CPU affinity. If not, try to build
> +     * up one. In case numa_place_domain() finds at least one suitable
> +     * candidate, it will affect info->cpumap accordingly; if it does
> +     * not, it just leaves it as it is. This means (unless some weird
> +     * error manifests) the subsequent call to
> +     * libxl_set_vcpuaffinity_all() will do the actual placement,
> +     * whatever that turns out to be.
> +     */
> +    if (libxl_bitmap_is_full(&info->cpumap)) {
> +        int rc = numa_place_domain(gc, info);
> +        if (rc)
> +            return rc;
> +    }

I guess it would be preferable to do this only if the bitmap was full
by default, so that setting the bitmap explicitly to all cpus still
works.

I'm not sure that that's essential to have, though.
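
One way would be to remember whether the caller ever set the map
explicitly, along these lines (a sketch only; cpumap_is_default is a
hypothetical flag, not an existing libxl field):

    /* Hypothetical: cleared by any explicit setter of info->cpumap,
     * set when the defaults are filled in. */
    if (info->cpumap_is_default &&
        libxl_bitmap_is_full(&info->cpumap)) {
        int rc = numa_place_domain(gc, info);
        if (rc)
            return rc;
    }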

> +            /*
> +             * Conditions are met, we can add this combination to the
> +             * NUMA placement candidates list. We first make sure there
> +             * is enough space in there, and then we initialize the new
> +             * candidate element with the node map corresponding to the
> +             * combination we are dealing with. Memory allocation for
> +             * expanding the array that hosts the list happens in chunks
> +             * equal to the number of NUMA nodes in the system (to
> +             * avoid allocating memory each and every time we find a
> +             * new candidate).
> +             */
> +            if (*nr_cndts == array_size)
> +                array_size += nr_nodes;
> +            GCREALLOC_ARRAY(new_cndts, array_size);

Because the array grows by a fixed chunk, this part of the algorithm
is quadratic: for n candidates you do n/nr_nodes reallocations, each
copying up to n elements.  With n = C( nr_nodes, min_nodes ) that is
   O( C( nr_nodes, min_nodes )^2 / nr_nodes )
which is quite bad really.

At the very least this needs to be an exponential (geometric)
allocation, e.g.
  +                array_size += nr_nodes + array_size;

But if you didn't insist on creating the whole list and sorting it,
you would avoid this allocation entirely, wouldn't you ?
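
I.e., something like this (a very rough sketch, reusing numa_cmpf()
from the patch; the enclosing loop over combinations is elided):

    /* Track only the best candidate seen so far, no array needed. */
    libxl__numa_candidate best, cur;
    bool have_best = false;

    /* ... for each suitable combination, fill in `cur', then: ... */
    if (!have_best || numa_cmpf(&cur, &best) < 0) {
        best = cur;   /* NB: shallow copy; a real version would need to
                       * deep-copy any node map the candidate owns. */
        have_best = true;
    }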

Should we be worried that this algorithm will be too slow even if it
involves just
  O( C(nr_nodes,min_nodes) )
iterations?

Thanks,
Ian.
