|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH RFC v2 4/7] libxl/vNUMA: vNUMA libxl functionality.
On Fri, Sep 13, 2013 at 9:50 AM, Elena Ufimtseva <ufimtseva@xxxxxxxxx> wrote:
> vNUMA libxl supporting functionality.
> libxl supporting functionality for vNUMA includes:
> * having vNUMA memory areas sizes, transforms it to
> start and end pfns based on domain e820 map;
> * contructs vnode_to_pnode map for vNUMA nodes memory
> allocation and pass it to Xen; the mechanism considers
> automatic NUMA placement in case of presence of hardware
> NUMA; In best case scenario all vnodes will be allocated
> within one pnode. If the domain spans different pnodes,
> the vnodes will be one-by-one placed to pnodes. If such
> allocation is impossible due to the memory constraints,
> the allocation will use default mechanism; this is worst
> case scenario.
Why would someone want to make a VM with two vnodes and then put them
on the same pnode? Apart from testing, of course, but our defaults
should be for the common case of real users.
[snip]
> diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
> index a78c91d..f7b1aeb 100644
> --- a/tools/libxl/libxl_x86.c
> +++ b/tools/libxl/libxl_x86.c
> @@ -308,3 +308,89 @@ int libxl__arch_domain_create(libxl__gc *gc,
> libxl_domain_config *d_config,
>
> return ret;
> }
> +
> +unsigned long e820_memory_hole_size(unsigned long start, unsigned long end,
> struct e820entry e820[], int nr)
> +{
> +#define clamp(val, min, max) ({ \
> + typeof(val) __val = (val); \
> + typeof(min) __min = (min); \
> + typeof(max) __max = (max); \
> + (void) (&__val == &__min); \
> + (void) (&__val == &__max); \
> + __val = __val < __min ? __min: __val; \
> + __val > __max ? __max: __val; })
What on earth is going on here?
Are these comparison of pointers to work around some gdb "unused
variable" warnings or something?
This would be much better as a static inline function that explicitly
returns a value.
> + int i;
> + unsigned long absent, start_pfn, end_pfn;
> + absent = start - end;
> + for(i = 0; i < nr; i++) {
> + if(e820[i].type == E820_RAM) {
> + start_pfn = clamp(e820[i].addr, start, end);
> + end_pfn = clamp(e820[i].addr + e820[i].size, start, end);
> + absent -= end_pfn - start_pfn;
> + }
> + }
> + return absent;
> +}
> +
> +/*
> + * Aligns memory blocks in domain info for linux NUMA build image.
> + * Takes libxl_domain_build_info memory sizes and returns aligned to
> + * domain e820 map linux numa blocks in guest pfns.
> + */
> +int libxl__vnuma_align_mem(libxl__gc *gc,
> + uint32_t domid,
> + libxl_domain_build_info *b_info, /* IN: mem
> sizes */
> + vnuma_memblk_t *memblks) /* OUT: linux
> numa blocks in pfn */
> +{
> +#ifndef roundup
> +#define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))
> +#endif
> + int i, rc;
> + unsigned long shift = 0, size, node_min_size = 1, limit;
> + unsigned long end_max;
> + uint32_t nr;
> + struct e820entry map[E820MAX];
> +
> + libxl_ctx *ctx = libxl__gc_owner(gc);
> + rc = xc_get_machine_memory_map(ctx->xch, map, E820MAX);
> + if (rc < 0) {
> + errno = rc;
> + return -EINVAL;
> + }
> + nr = rc;
> + rc = e820_sanitize(ctx, map, &nr, b_info->target_memkb,
> + (b_info->max_memkb - b_info->target_memkb) +
> + b_info->u.pv.slack_memkb);
> + if (rc)
> + return ERROR_FAIL;
> + end_max = map[nr-1].addr + map[nr-1].size;
> + shift = 0;
> + memset(memblks, 0, sizeof(*memblks)*b_info->nr_vnodes);
> + memblks[0].start = map[0].addr;
> + for(i = 0; i < b_info->nr_vnodes; i++) {
> + memblks[i].start += shift;
OK, at this point memblks[i] should be 0, since we zeroed it up
there... why are you += instead of just =?
And the code below is confusing and I think isn't doing what you think
it does. Walk me through this.
> + memblks[i].end += shift + b_info->vnuma_memszs[i];
.end started at 0, so this is effectively an obfuscated '='.
So suppose vnuma_memszs is 256MiB; so now .end == .start + 256MiB.
> + limit = size = memblks[i].end - memblks[i].start;
So now limit and size are both 256MiB.
> + while (memblks[i].end - memblks[i].start -
> + e820_memory_hole_size(memblks[i].start, memblks[i].end, map,
> nr)
> + < size) {
And suppose there was a hole that overlapped here, so we enter this loop.
> + memblks[i].end += node_min_size;
> + shift += node_min_size;
> + if (memblks[i].end - memblks[i].start >= limit) {
Now, at this point, unless node_min_size is negative, this is *always*
going to be true.
> + memblks[i].end = memblks[i].start + limit;
> + break;
So we set .end back to what it was before (.start + 256MiB) and break
out of the loop?
> + }
> + if (memblks[i].end == end_max) {
> + memblks[i].end = end_max;
> + break;
> + }
> + }
> + shift = memblks[i].end;
And throw away whatever we did to shift before?
What was it you were actually trying to do here?
-George
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |