
Re: [Xen-devel] [PATCH RESEND 05/12] xen: numa-sched: make space for per-vcpu node-affinity



On 11/05/2013 04:56 PM, George Dunlap wrote:
On 11/05/2013 03:39 PM, George Dunlap wrote:
On 11/05/2013 03:23 PM, Jan Beulich wrote:
On 05.11.13 at 16:11, George Dunlap <george.dunlap@xxxxxxxxxxxxx> wrote:
Or, we could internally change the names to "cpu_hard_affinity" and
"cpu_soft_affinity", since that's effectively what the scheduler will
do.  It's possible someone might want to set soft affinities for some
other reason besides NUMA performance.

I like that.

A potential problem with that is the "auto domain numa" thing.  In this
patch, if the domain numa affinity is not set but the vcpu numa affinity
is, the domain numa affinity (which will be used to allocate memory for
the domain) will be set based on the vcpu numa affinity.  That seems
like a useful feature (though perhaps it's starting to violate the
"policy should be in the tools" principle).  If we change this to just
"hard affinity" and "soft affinity", we'll lose the natural logical
connection there.  It might also have an impact on how we end up doing
vNUMA.  So I'm a bit torn ATM.
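
To be concrete about the behaviour I mean, here is roughly what the "auto" case boils down to -- a toy sketch with made-up types, not the actual code in patch 06:

#include <stdint.h>

#define MAX_VCPUS 8

typedef uint64_t nodemask_t;          /* bit N set => NUMA node N */

struct toy_vcpu {
    nodemask_t node_affinity;         /* per-vcpu "numa affinity" */
};

struct toy_domain {
    struct toy_vcpu vcpu[MAX_VCPUS];
    unsigned int nr_vcpus;
    int auto_node_affinity;           /* admin never set it explicitly */
    nodemask_t node_affinity;         /* used when allocating memory */
};

/* If the domain node affinity was never set explicitly, recompute it
 * as the union of the per-vcpu node affinities. */
static void toy_update_node_affinity(struct toy_domain *d)
{
    nodemask_t mask = 0;
    unsigned int i;

    if ( !d->auto_node_affinity )
        return;

    for ( i = 0; i < d->nr_vcpus; i++ )
        mask |= d->vcpu[i].node_affinity;

    d->node_affinity = mask;
}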

Dario, any thoughts?

[Coming back after going through the whole series]

This is basically the main architectural question that needs to be
sorted out with the series: Do we bake in that the "soft affinity" is
specifically for NUMA-ness, or not?

The patch as it stands does make this connection, and that has several
implications:
* There is no more concept of a separate "domain numa affinity" (Patch
06); the domain numa affinity is just a pre-calculated union of the vcpu
affinities.
* The interface to this "soft affinity" is a bitmask of numa nodes, not
a bitmask of cpus.
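
The second point matters because the scheduler ultimately works in terms of cpus, so a node-level interface implies a translation step somewhere along the line.  Roughly (toy masks and a made-up topology table, not the real node_to_cpumask()):

#include <stdint.h>

typedef uint64_t nodemask_t;   /* bit N set => NUMA node N    */
typedef uint64_t cpumask_t;    /* bit C set => physical cpu C */

#define NR_NODES 4

/* Made-up topology: 4 nodes with 4 cpus each. */
static const cpumask_t toy_node_to_cpumask[NR_NODES] = {
    0x000f, 0x00f0, 0x0f00, 0xf000,
};

/* Expand the node-level "soft affinity" the interface takes into the
 * cpu-level mask the scheduler actually uses. */
static cpumask_t toy_nodes_to_cpus(nodemask_t nodes)
{
    cpumask_t cpus = 0;
    unsigned int n;

    for ( n = 0; n < NR_NODES; n++ )
        if ( nodes & (1ULL << n) )
            cpus |= toy_node_to_cpumask[n];

    return cpus;
}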

If we're OK with that direction, then I think this patch series looks
pretty good.

Just to outline what the alternative would look like: The hypervisor would focus on the minimum mechanisms required to do something useful for NUMA systems. The domain NUMA affinity would only be used for memory allocation. vcpus would only have "hard" and "soft" affinities. The toolstack (libxl? xl?) would be responsible for stitching these together into a usable interface for NUMA: e.g., it would have the concept of "numa affinity" for vcpus (or indeed, virtual NUMA topologies), and would do things like update the domain NUMA affinity based on the vcpu affinities.
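
Just to be clear what "hard" and "soft" would mean to the scheduler in that world -- a toy sketch, not the actual balancing code in credit:

#include <stdint.h>

typedef uint64_t cpumask_t;

struct toy_vcpu {
    cpumask_t cpu_hard_affinity;   /* the vcpu may only run here   */
    cpumask_t cpu_soft_affinity;   /* the vcpu prefers to run here */
};

/* Candidate cpus for a vcpu: prefer online & hard & soft, but fall
 * back to online & hard if the preference cannot be honoured. */
static cpumask_t toy_candidate_cpus(const struct toy_vcpu *v,
                                    cpumask_t online)
{
    cpumask_t hard = v->cpu_hard_affinity & online;
    cpumask_t soft = hard & v->cpu_soft_affinity;

    return soft ? soft : hard;
}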

This would mean the toolstack either assumes, when someone calls vcpu_set_node_affinity, that soft_affinity == numa_affinity, or keeps its own copy of numa_affinity for each vcpu around somewhere.

Alternately, we could punt on the NUMA interface altogether for this patch series, and wait until we can implement a full-featured vNUMA interface. That is, for this patch series, make the interface do just NUMA affinity for memory, plus "soft" and "hard" affinities for vcpus. Then in another series (perhaps one shortly after that), implement a full vNUMA interface, with vcpus mapped to vNUMA nodes and vNUMA nodes mapped to pNUMA nodes -- with the toolstack implementing all of this just using the soft affinities.
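
To sketch what that toolstack side might look like (all names and mappings made up for the example; the hypervisor would only ever see the resulting soft affinities):

#include <stdint.h>

typedef uint64_t cpumask_t;

#define NR_VCPUS   4
#define NR_VNODES  2
#define NR_PNODES  4

/* Made-up host topology: which physical cpus each physical node owns. */
static const cpumask_t toy_pnode_to_cpumask[NR_PNODES] = {
    0x000f, 0x00f0, 0x0f00, 0xf000,
};

/* Guest-side mapping: which vNUMA node each vcpu belongs to, and which
 * physical node each vNUMA node is placed on. */
static const unsigned int toy_vcpu_to_vnode[NR_VCPUS]   = { 0, 0, 1, 1 };
static const unsigned int toy_vnode_to_pnode[NR_VNODES] = { 1, 3 };

/* The toolstack flattens the two-level mapping into a plain per-vcpu
 * soft affinity before handing it to the hypervisor. */
static cpumask_t toy_vcpu_soft_affinity(unsigned int vcpu)
{
    unsigned int vnode = toy_vcpu_to_vnode[vcpu];
    unsigned int pnode = toy_vnode_to_pnode[vnode];

    return toy_pnode_to_cpumask[pnode];
}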

Thoughts?

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

