
Re: [Xen-devel] [RFC v2][PATCH 1/3] docs: design and intended usage for NUMA-aware ballooning



On Mon, Aug 19, 2013 at 1:58 PM, David Vrabel <david.vrabel@xxxxxxxxxx> wrote:
> On 16/08/13 05:13, Yechen Li wrote:
>>
>> +### nodemask VNODE\_TO\_PNODE(int vnode) ###
>> +
>> +This service is provided by the hypervisor (and wired, if necessary, all the
>> +way up to the proper toolstack layer or guest kernel), since it is only Xen
>> +that knows both the virtual and the physical topologies.
>
> The physical NUMA topology must not be exposed to guests that have a
> virtual NUMA topology -- only the toolstack and Xen should know the
> mapping between the two.
>
> A guest cannot make sensible use of a machine topology as it may be
> migrated to a host with a different topology.
>
>> +## Description of the problem ##
>> +
>> +Let us use an example. Let's assume that guest _G_ has 2 vnodes,
>> +and that the memory for vnode #0 and #1 comes from pnode #0 and pnode #2,
>> +respectively.
>> +
>> +Now, the user wants to create a new guest, but the system is under high
>> +memory pressure, so he decides to try ballooning _G_ down. He sees that
>> +pnode #2 has the best chances to accommodate all the memory for the new
>> +guest, which would be really good for performance, if only he can make
>> +space there. _G_ is the only domain eating some memory from pnode #2 but,
>> +as said above, not all of its memory comes from there.
>
> It is not clear to me that this is the optimal decision.  What
> tools/information will be available that the user can use to make
> sensible decisions here?  e.g., is the current layout available to tools?
>
> Remember that the "user" in this example is most often some automated
> process and not a human.
>
>> +So, right now, the user has no way to specify that he wants to balloon down
>> +_G_ in such a way that he will get as much as possible free pages from pnode
>> +#2, rather than from pnode #0. He can ask _G_ to balloon down, but there is
>> +no guarantee on from what pnode the memory will be freed.
>> +
>> +The same applies to the ballooning up case, when the user, for some specific
>> +reasons, wants to be sure that it is the memory of some (other) specific
>> +pnode that will be used.
>
> I would like to see some real world examples of cases where this is
> sensible.
>
> In general, I'm not keen on adding ABIs or interfaces that don't solve
> real world problems, particularly if they're easy to misuse and end up
> with something that is very suboptimal.

I think at the very least the guest needs to be able to say, "allocate me
a page from vnode X", and have Xen translate that into a pnode
internally, so that ballooning down and back up again doesn't destroy
a guest's NUMA memory affinity (i.e., the vnode->pnode memory
mapping).
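
To make that concrete, here is a minimal sketch (a Python model, not Xen
code; the class and method names are made up for illustration) of a guest
asking for pages by vnode while the vnode->pnode map stays on the
hypervisor side. The map reuses the example above, with vnode #0 on
pnode #0 and vnode #1 on pnode #2:

```python
class HypervisorModel:
    """Hypothetical stand-in for Xen: only this side knows the pnodes."""

    def __init__(self, vnode_to_pnode, free_pages):
        self._vnode_to_pnode = dict(vnode_to_pnode)  # never shown to the guest
        self._free_pages = dict(free_pages)          # pnode -> free page count

    def populate_from_vnode(self, vnode):
        """Guest asks for one page on `vnode`; True if the backing pnode had one."""
        pnode = self._vnode_to_pnode[vnode]
        if self._free_pages.get(pnode, 0) == 0:
            return False
        self._free_pages[pnode] -= 1
        return True

    def release_from_vnode(self, vnode):
        """Guest balloons one page of `vnode` out; it returns to the same pnode."""
        pnode = self._vnode_to_pnode[vnode]
        self._free_pages[pnode] = self._free_pages.get(pnode, 0) + 1


xen = HypervisorModel({0: 0, 1: 2}, {0: 1, 2: 0})
xen.release_from_vnode(1)                # balloon down on vnode #1
got_page = xen.populate_from_vnode(1)    # balloon back up on the same vnode
```

The guest only ever names vnodes, so the same calls would keep working
after a migration to a host with a different physical layout.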

[snip]

>
>> +* _pnid_ -- the id of the pnode on which the user wants to try to get some
>> +  free memory
>> +* _nodeexact_ -- a bool specifying whether or not, in case it is not
>> +  possible to reach the new ballooning target only with memory from pnode
>> +  _pnid_, the user is fine with using memory from other pnodes.
>> +  If _nodeexact_ is true, it is possible that the new target is not
>> +  reached; if it is false, the new target will (probably) be reached, but
>> +  it is possible that some memory is freed on pnodes other than _pnid_.
>> +
>> +To let the ballooning driver know about these new parameters, a new xenstore
>> +key exists in ~/memory/target\_nid. So, for a proper NUMA aware ballooning
>> +operation to occur, the user should write the proper values in both the
>> +keys: ~/memory/target\_nid and ~/memory/target.
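
The _nodeexact_ semantics described above could be sketched as follows.
This is only a Python model under assumptions: the function name and the
per-pnode page table are hypothetical, and the order in which the other
pnodes are used is an arbitrary choice.

```python
def plan_balloon_down(pages_to_free, pnid, nodeexact, guest_pages):
    """Return {pnode: pages to free}.

    guest_pages maps pnode -> pages the guest currently holds there
    (hypothetical bookkeeping; the proposal does not say who keeps it).
    """
    plan = {}
    # Always take as much as possible from the requested pnode first.
    take = min(pages_to_free, guest_pages.get(pnid, 0))
    if take:
        plan[pnid] = take
    remaining = pages_to_free - take

    # Only spill over to other pnodes when the user allowed it.
    if not nodeexact:
        for pnode in sorted(guest_pages):
            if remaining == 0:
                break
            if pnode == pnid:
                continue
            take = min(remaining, guest_pages[pnode])
            if take:
                plan[pnode] = take
                remaining -= take
    return plan


guest_pages = {0: 100, 2: 50}
exact = plan_balloon_down(80, 2, True, guest_pages)    # target may be missed
relaxed = plan_balloon_down(80, 2, False, guest_pages) # target reached
```

With _nodeexact_ set, only the 50 pages on pnode #2 are freed and the
request falls 30 pages short; without it, the remaining 30 pages come
from pnode #0.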
>
> If we decide we do need such control, I think the xenstore interface
> should look more like:
>
> memory/target
>
>   as before
>
> memory/target-by-nid/0
>
>   target for virtual node 0
>
> ...
>
> memory/target-by-nid/N
>
>   target for virtual node N
>
> I think this better reflects the goal which is an adjusted NUMA layout
> for the guest rather than the steps required to reach it (release P
> pages from node N).

This seems more sensible than a mask (as Jan suggested); but is it
open to race conditions?

>
> The balloon driver attempts to reach target, whilst simultaneously trying
> to reach the individual node targets.  It should prefer to balloon
> up/down on the node that is furthest from its node target.
>
> In cases where target and the sum of target-by-nid/N don't agree (or are
> not present) the balloon driver should use target, and balloon up/down
> evenly across all NUMA nodes.
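
As a rough sketch of the policy David describes (Python, with
hypothetical names; it only computes how many pages each vnode should
gain or lose, and it does not clamp a node at zero pages):

```python
def balloon_plan(current, target, node_targets=None):
    """Return [(vnode, delta), ...], furthest-from-target node first.

    current:      {vnode: pages the guest holds on that vnode now}
    target:       value of memory/target, in pages
    node_targets: {vnode: value of memory/target-by-nid/<vnode>} or None
    A positive delta means the node gains pages, a negative one loses them.
    """
    nodes = sorted(current)
    agree = (
        node_targets is not None
        and sorted(node_targets) == nodes
        and sum(node_targets.values()) == target
    )
    if agree:
        # Per-node targets are usable: move each node to its own target.
        deltas = {v: node_targets[v] - current[v] for v in nodes}
    else:
        # Missing or inconsistent: follow target, split evenly across nodes.
        total_delta = target - sum(current.values())
        share, extra = divmod(abs(total_delta), len(nodes))
        sign = 1 if total_delta >= 0 else -1
        deltas = {
            v: sign * (share + (1 if i < extra else 0))
            for i, v in enumerate(nodes)
        }
    steps = [(v, d) for v, d in deltas.items() if d != 0]
    # Stable sort: ties keep ascending vnode order.
    return sorted(steps, key=lambda item: -abs(item[1]))


current = {0: 100, 1: 100}
by_node = balloon_plan(current, 150, {0: 90, 1: 60})
fallback = balloon_plan(current, 150, {0: 100, 1: 100})
```

In the first call the per-node targets add up to target, so vnode #1
(40 pages away) is handled before vnode #0 (10 pages away). In the
second they add up to 200, not 150, so the 50 pages are released evenly.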
>
> The libxl interface does not necessarily have to match the xenstore
> interface if that's what the initial tools would prefer.
>
> Finally, a style comment: please avoid the use of single gender-specific
> pronouns in documentation/comments (i.e., don't always use
> he/his etc.).  I prefer to use a singular "they" but you could consider
> "he or she" or using "he" for some examples and "she" in others.

Doing half and half seems a bit strange to me; if we're trying for
gender equity, I'd just go for "she" all the way.  There are enough
"he"s in the wider literature to more than balance it out for many
years to come. :-)

 -George
