
Re: [Xen-devel] [RFC v2][PATCH 1/3] docs: design and intended usage for NUMA-aware ballooning



On 16/08/13 05:13, Yechen Li wrote:
> 
> +### nodemask VNODE\_TO\_PNODE(int vnode) ###
> +
> +This service is provided by the hypervisor (and wired, if necessary, all the
> +way up to the proper toolstack layer or guest kernel), since it is only Xen
> +that knows both the virtual and the physical topologies.

The physical NUMA topology must not be exposed to guests that have a
virtual NUMA topology -- only the toolstack and Xen should know the
mapping between the two.

A guest cannot make sensible use of a machine topology as it may be
migrated to a host with a different topology.

> +## Description of the problem ##
> +
> +Let us use an example. Let's assume that guest _G_ has a virtual 2 vnodes,
> +and that the memory for vnode #0 and #1 comes from pnode #0 and pnode #2,
> +respectively.
> +
> +Now, the user wants to create a new guest, but the system is under high memory
> +pressure, so he decides to try ballooning _G_ down. He sees that pnode #2 has
> +the best chances to accommodate all the memory for the new guest, which would
> +be really good for performance, if only he can make space there. _G_ is the
> +only domain eating some memory from pnode, #2 but, as said above, not all of
> +its memory comes from there.

It is not clear to me that this is the optimal decision.  What
tools/information will be available that the user can use to make
sensible decisions here?  e.g., is the current layout available to tools?

Remember that the "user" in this example is most often some automated
process and not a human.

> +So, right now, the user has no way to specify that he wants to balloon down
> +_G_ in such a way that he will get as much as possible free pages from pnode
> +#2, rather than from pnode #0. He can ask _G_ to balloon down, but there is
> +no guarantee on from what pnode the memory will be freed.
> +
> +The same applies to the ballooning up case, when the user, for some specific
> +reasons, wants to be sure that it is the memory of some (other) specific pnode
> +that will be used.

I would like to see some real world examples of cases where this is
sensible.

In general, I'm not keen on adding ABIs or interfaces that don't solve
real world problems, particularly if they're easy to misuse and end up
with something that is very suboptimal.

> +## NUMA-aware ballooning ##
> +
> +The new NUMA-aware ballooning logic works as follows.
> +
> +There is room, in libxl\_set\_memory\_target() for two more parameters, in
> +addition to the new memory target:

The Xenstore interface should be the primary interface being documented.
The libxl interface is secondary and (probably) a consequence of the
xenstore interface.

> +* _pnid_ -- which is the pnode id of which node the user wants to try get some
> +  free memory on
> +* _nodeexact_ -- which is a bool specifying whether or not, in case it is not
> +  possible to reach the new ballooning target only with memory from pnode
> +  _pnid_, the user is fine with using memory from other pnodes.  
> +  If _nodeexact_ is true, it is possible that the new target is not reached; if
> +  it is false, the new target will (probably) be reached, but it is possible
> +  that some memory is freed on pnodes other than _pnid_.
> +
> +To let the ballooning driver know about these new parameters, a new xenstore
> +key exists in ~/memory/target\_nid. So, for a proper NUMA aware ballooning
> +operation to occur, the user should write the proper values in both the keys:
> +~/memory/target\_nid and ~/memory/target.

If we decide we do need such control, I think the xenstore interface
should look more like:

memory/target

  as before

memory/target-by-nid/0

  target for virtual node 0

...

memory/target-by-nid/N

  target for virtual node N

I think this better reflects the goal which is an adjusted NUMA layout
for the guest rather than the steps required to reach it (release P
pages from node N).
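
As an illustration only, the key layout above could be generated like
this. It is a minimal sketch, not toolstack code: the helper name
target_keys is hypothetical, and it assumes the per-node values use the
same unit as memory/target (taken here to be KiB). It only builds the
path/value pairs and does not talk to xenstore.

```python
def target_keys(total_kib, per_vnode_kib):
    """Build {relative xenstore path: value} for the proposed layout.

    total_kib      -- overall balloon target (hypothetical unit: KiB)
    per_vnode_kib  -- one target per virtual node, indexed by vnode id
    """
    # memory/target keeps its existing meaning.
    keys = {"memory/target": str(total_kib)}
    # One key per virtual node: memory/target-by-nid/<vnode>.
    for vnode, kib in enumerate(per_vnode_kib):
        keys["memory/target-by-nid/%d" % vnode] = str(kib)
    return keys


if __name__ == "__main__":
    for path, value in target_keys(3072, [1024, 2048]).items():
        print(path, "=", value)
```

The toolstack would write a full set of these keys to describe the
desired layout; the guest never needs to see physical node ids.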

The balloon driver attempts to reach target, whilst simultaneously trying
to reach the individual node targets.  It should prefer to balloon
up/down on the node that is furthest from its node target.

In cases where target and the sum of target-by-nid/N don't agree (or are
not present) the balloon driver should use target, and balloon up/down
evenly across all NUMA nodes.
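
The policy described in the two paragraphs above could be sketched as
follows. This is a rough model under stated assumptions, not balloon
driver code: the function names are hypothetical, sizes are abstract
page counts, "evenly" is read as spreading the required change equally
over the nodes, and rebalancing between nodes when the overall target
is already met is not modelled.

```python
def effective_node_targets(current, target, node_targets=None):
    """Return the per-vnode goals the driver should aim for.

    current       -- pages currently held on each vnode
    target        -- overall target (memory/target)
    node_targets  -- per-vnode targets (target-by-nid/N), or None
    """
    n = len(current)
    if (node_targets is not None and len(node_targets) == n
            and sum(node_targets) == target):
        return list(node_targets)
    # Missing or inconsistent node targets: fall back to the overall
    # target and spread the change evenly across all nodes.
    base, rem = divmod(target - sum(current), n)
    return [cur + base + (1 if i < rem else 0)
            for i, cur in enumerate(current)]


def next_node(current, target, node_targets=None):
    """Pick the vnode to balloon on next, or None if nothing to do."""
    delta = target - sum(current)
    if delta == 0:
        return None
    goals = effective_node_targets(current, target, node_targets)
    if delta < 0:
        # Ballooning down: prefer the node furthest above its goal.
        gaps = [cur - goal for cur, goal in zip(current, goals)]
    else:
        # Ballooning up: prefer the node furthest below its goal.
        gaps = [goal - cur for cur, goal in zip(current, goals)]
    vnode = max(range(len(gaps)), key=gaps.__getitem__)
    return vnode if gaps[vnode] > 0 else None
```

For example, with 100 pages on each of two vnodes, an overall target of
150 and node targets of 100 and 50, the sketch selects vnode 1 for
ballooning down.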

The libxl interface does not necessarily have to match the xenstore
interface if that's what the initial tools would prefer.

Finally, a style comment: please avoid the use of a single gender-specific
pronoun in documentation/comments (i.e., don't always use
he/his etc.).  I prefer to use a singular "they" but you could consider
"he or she" or using "he" for some examples and "she" in others.

David
