

Re: [Xen-devel] [PATCH 0/6] xen,xend,tools: NUMA support for Xen

To: Ian Pratt <m+Ian.Pratt@xxxxxxxxxxxx>, Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>
Subject: Re: [Xen-devel] [PATCH 0/6] xen,xend,tools: NUMA support for Xen
From: Ryan Harper <ryanh@xxxxxxxxxx>
Date: Wed, 12 Jul 2006 15:30:31 -0500
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20060712012313.GO1694@xxxxxxxxxx>
References: <A95E2296287EAD4EB592B5DEEFCE0E9D57206B@xxxxxxxxxxxxxxxxxxxxxxxxxxx> <20060712012313.GO1694@xxxxxxxxxx>
* Ryan Harper <ryanh@xxxxxxxxxx> [2006-07-11 20:24]:
> * Ian Pratt <m+Ian.Pratt@xxxxxxxxxxxx> [2006-07-11 16:28]:
> > It does kind of surprise me that the overhead is as high as you've
> > measured. In the case where there's memory available in the favoured
> > node I'd expect allocation performance to be very similar. 4 times
> > slower and worsening for large allocations seems odd -- 0.3 microseconds
> > a page is a bit more than I'd expect during back-to-back allocations.
> > It's certainly worth trying to understand the overhead a bit more.
> I agree.  I'm a little mystified by the overhead as well.  On the larger
> system, ballooning up to 23G had something like 11% overhead, which was
> more reasonable, though the domain creation tests showed more than 11%
> on that system as well.  I'll get the oprofile data and take a look.

Using oprofile, I've found some optimizations and gained a better
understanding of the behavior.  I've removed as much logic from the fast
path as possible:

1. Dropped some superfluous calls to num_online_nodes(); that doesn't
2. Only calculate the next node when the current node's memory has been
exhausted.
3. Don't bother distributing pages evenly across vcpus in all cases.

I have a couple of thoughts here:

1) Only distribute if the domain's processors span more than one
node.  This requires additional code to track which nodes the domain
is running on.  That is fairly trivial for sedf, which can update the
domain nodemask during vcpu affinity ops, but a bit more hairy for the
credit scheduler.

2) We can take the easy path and just use vcpu0's processor to choose
which node to pull memory from.  We already tune our domU configs using
the exported topology info to keep each domU within a single node anyhow.

4. Don't bother looking for pages in an empty node, i.e., check whether
the target zone/node can satisfy the request first.

These changes brought all the allocation times down, matching the
without-NUMA times for allocations up to 512M, though there is still
overhead for larger ones.  Looking at the oprofile data I collected over
several runs, I'm not seeing anything else sticking out.  We were also
seeing worse times for larger allocations because we were exhausting
memory on one node and pulling from a second, which resulted in a large
number of lookups in the empty node.  Optimization (4) addresses that
issue.

Here is the new data.  I can attach some of the oprofile data if anyone
is interested.  I'll update patches 2 and 3 if the current overhead
is acceptable.

Balloon up:
With NUMA:
Try1: 911ms
Try2: 907ms
Try3: 910ms

With NUMA + optimizations:
Try1: 709ms
Try2: 701ms
Try3: 703ms

Without NUMA:
Try1: 606ms
Try2: 604ms
Try3: 608ms

Increase reservation:
With NUMA:
MemSize  128M 512M 1G   2G    3G
Try1:    6ms  26ms 53ms 221ms 390ms
Try2:    6ms  26ms 48ms 212ms 390ms
Try3:    6ms  26ms 48ms 212ms 390ms

With NUMA + optimizations:
MemSize  128M 512M 1G   2G   3G
Try1:    4ms  15ms 30ms 80ms 150ms
Try2:    3ms  14ms 30ms 80ms 159ms
Try3:    4ms  15ms 33ms 80ms 159ms

Without NUMA:
MemSize  128M 512M 1G   2G   3G
Try1:    4ms  16ms 25ms 70ms 100ms
Try2:    3ms  14ms 28ms 56ms 109ms
Try3:    3ms  14ms 23ms 56ms  95ms

Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
