
Re: [Xen-devel] [Hackathon minutes] PV frontends/backends and NUMA machines

On 05/21/2013 10:20 AM, Tim Deegan wrote:
> At 09:47 +0100 on 21 May (1369129629), George Dunlap wrote:
>> On Tue, May 21, 2013 at 9:32 AM, Tim Deegan <tim@xxxxxxx> wrote:
>>> At 14:48 +0100 on 20 May (1369061330), George Dunlap wrote:
>>>> So the work items I remember are as follows:
>>>> 1. Implement NUMA affinity for vcpus
>>>> 2. Implement Guest NUMA support for PV guests
>>>> 3. Teach Xen how to make a sensible NUMA allocation layout for dom0
>>>
>>> Does Xen need to do this?  Or could dom0 sort that out for itself after
>>
>> There are two aspects of this.  First would be, if dom0.nvcpus <
>> host.npcpus, to place the vcpus reasonably on the various NUMA nodes.
>
> Well, that part at least seems like it can be managed quite nicely from
> dom0 userspace, in a Xen init script.  But...
>
>> The second is to make the pfn -> NUMA node layout reasonable.  At the
>> moment, as I understand it, pfns will be striped across nodes.  In
>> theory dom0 could deal with this, but it seems like in practice it's
>> going to be nasty trying to sort that stuff out.  It would be much
>> better, if you have (say) 4 nodes and 4GiB of memory assigned to dom0,
>> to have pfn 0-1G on node 0, 1-2G on node 1, &c.
>
> Yeah, I can see that fixing that post-hoc would be a PITA.  I guess if
> you figure out the vcpu assignments at dom0-build time, the normal NUMA
> memory allocation code will just DTRT (since that's what you'd want for
> a comparable domU)?

I'm not sure why you think so. For one (please correct me if I'm wrong), NUMA affinity is a domain construct, not a vcpu construct. Memory is allocated on behalf of a domain, not a vcpu, and is allocated a batch at a time. So how is the memory allocator supposed to know that the current allocation request falls in the middle of the second gigabyte of a 4G total, and thus that it should allocate from node 1?

What we would want for a comparable domU (that is, a domU that is NUMA-aware) is to have the pfn layout in batches across the nodes to which it will be pinned. E.g., if a domU has its NUMA affinity set to nodes 2-3, then you'd want the first half of the pfns to come from node 2, the second half from node 3.

In both cases, the domain builder will need to call the allocator with specific NUMA nodes for specific regions of the PFN space.
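
To make the layout concrete, here is a minimal sketch in Python of the plan a domain builder might compute before calling the allocator. It is purely illustrative and is not Xen or libxc code: the names plan_pfn_layout and node_for_pfn are hypothetical, and the 4 KiB page size and the even split across nodes are assumptions. The real allocator call is not shown.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def plan_pfn_layout(total_pages, nodes):
    """Split the PFN range [0, total_pages) into one contiguous region per node.

    Returns a list of (start_pfn, end_pfn, node) tuples with end_pfn exclusive.
    Any remainder pages are handed out one each to the first few nodes.
    """
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    if not nodes:
        raise ValueError("at least one NUMA node is required")

    base, extra = divmod(total_pages, len(nodes))
    layout = []
    start = 0
    for i, node in enumerate(nodes):
        count = base + (1 if i < extra else 0)
        layout.append((start, start + count, node))
        start += count
    return layout


def node_for_pfn(layout, pfn):
    """Return the node that a given pfn should be allocated from."""
    for start, end, node in layout:
        if start <= pfn < end:
            return node
    raise ValueError("pfn %d is outside the planned layout" % pfn)


# dom0 with 4GiB spread over nodes 0-3: one gigabyte of pfns per node.
dom0_pages = 4 * 2**30 // PAGE_SIZE
dom0_layout = plan_pfn_layout(dom0_pages, [0, 1, 2, 3])

# domU with NUMA affinity set to nodes 2-3: first half node 2, second half node 3.
domu_layout = plan_pfn_layout(dom0_pages, [2, 3])
```

With a plan like this, the builder would walk the regions and issue each batch of allocations against that region's node, rather than leaving the allocator to guess where in the PFN space a given request falls.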

