Re: [Xen-devel] [Hackathon minutes] PV frontends/backends and NUMA machines

On Tue, 2013-05-21 at 10:20 +0100, Tim Deegan wrote:
> At 09:47 +0100 on 21 May (1369129629), George Dunlap wrote:
> > The second is to make the pfn -> NUMA node layout reasonable.  At the
> > moment, as I understand it, pfns will be striped across nodes.  In
> > theory dom0 could deal with this, but it seems like in practice it's
> > going to be nasty trying to sort that stuff out.  It would be much
> > better, if you have (say) 4 nodes and 4GiB of memory assigned to dom0,
> > to have pfn 0-1G on node 0, 1-2G on node 1, &c.
> Yeah, I can see that fixing that post-hoc would be a PITA.  
Indeed! :-P
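
To make the difference between the two layouts concrete, here is a
minimal sketch of George's example (4 nodes, 4GiB for dom0). It is not
what Xen's allocator does: the function names, the 4KiB page size and
the 2MiB stripe chunk are all assumptions picked for illustration, since
the actual striping granularity is not something stated in this thread.

```python
GIB = 1 << 30
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def striped_node(pfn, nr_nodes, chunk_pages):
    """Node backing `pfn` if chunks are handed out round-robin across nodes."""
    return (pfn // chunk_pages) % nr_nodes


def contiguous_node(pfn, nr_nodes, total_pages):
    """Node backing `pfn` if each node backs one contiguous pfn range."""
    pages_per_node = total_pages // nr_nodes
    # Clamp so that any remainder pages land on the last node.
    return min(pfn // pages_per_node, nr_nodes - 1)


nr_nodes = 4
total_pages = 4 * GIB // PAGE_SIZE       # 4 GiB assigned to dom0
chunk_pages = (2 << 20) // PAGE_SIZE     # hypothetical 2 MiB stripe

# Walk a few consecutive stripe-sized chunks starting at pfn 0.
for pfn in range(0, 5 * chunk_pages, chunk_pages):
    print(
        f"pfn {pfn:>5}: striped -> node {striped_node(pfn, nr_nodes, chunk_pages)}, "
        f"contiguous -> node {contiguous_node(pfn, nr_nodes, total_pages)}"
    )
```

With the striped layout, neighbouring chunks of the pfn space sit on
different nodes, so dom0 would need a fine-grained map to know where
anything lives. With the contiguous layout, a single division is enough.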

> I guess if
> you figure out the vcpu assignments at dom0-build time, the normal NUMA
> memory allocation code will just DTRT (since that's what you'd want for
> a comparable domU)?
Well, we need to look deeper at what actually happens, but I don't think
that is going to be enough, and the same is true for DomUs.

In fact, what we have right now (for DomUs) is: memory is allocated from
a subset of the host nodes, and vCPUs prefer to be scheduled on the pCPUs
of those nodes. However, what true 'guest NUMA awareness' requires
is that you know which memory "belongs to" (i.e., is accessed more quickly
from) each vCPU, and this is something we don't have.

So, yes, I think creating a node-affinity for Dom0 early enough would be
a reasonable first step, and would already help quite a bit. However,
that would just mean that we'll (going back to George's example) get 1G
of memory from node0, 1G of memory from node1, etc. What we want is to
force pfns 0-1G to be actually allocated out of node0, and so on and so
forth... And that is something that I don't think the current code can


