Re: [xen-devel][vNUMA v2][PATCH 2/8] public interface
On Tue, Aug 3, 2010 at 8:52 AM, Keir Fraser <keir.fraser@xxxxxxxxxxxxx> wrote:
> On 03/08/2010 16:43, "Dulloor" <dulloor@xxxxxxxxx> wrote:
>>> I would expect guest would see nodes 0 to nr_vnodes-1, and the mnode_id
>>> could go away.
>> mnode_id maps the vnode to a particular physical node. This will be
>> used by the balloon driver in the VMs when the structure is passed as
>> a NUMA enlightenment to PV and PV-on-HVM guests.
>> I have a patch ready for that (once we are done with this series).
> So what happens when the guest is migrated to another system with different
> physical node ids? Is that never to be supported? I'm not sure why you
> wouldn't hide the vnode-to-mnode translation in the hypervisor.
Right now, migration is not supported when a NUMA strategy is set.
This is on my TODO list (along with PoD support).
There are a few open questions wrt migration:
- What if the destination host is not NUMA, but the guest is? Do we fake
  those nodes, or should we avoid selecting such a destination host to
  begin with?
- What if the destination host is not NUMA, but the guest has asked for
  a specific number of nodes (possibly for higher aggregate memory
  bandwidth)?
- What if the guest has asked for a particular memory strategy, but the
  destination host can't guarantee it (because of the distribution of
  free memory across its nodes)?
Once we answer these questions, we will know whether the vnode-to-mnode
translation is better exposed or not; and, if it is exposed, whether we
could simply renegotiate it on the destination host. I have started
working on this, but I have some other patches ready to go which we
might want to check in first: the PV/Dom0 NUMA patches and ballooning
support (see below).
As such, the purpose of the vnode-to-mnode translation is to let
enlightened guests know where their underlying memory comes from, so
that operations like ballooning get a chance to maintain this
distribution. That way, all the hypervisor has to care about is
sanity-checking increase/exchange-reservation requests from the guests,
and the guest can decide whether to make an exact_node_request or not.
Other options, which would allow us to discard this translation, are:
- Ballooning at your own risk: let ballooning work as it does now, even
  when guests use a NUMA strategy (particularly split/confined).
- Hypervisor-level policies: let Xen do its best to maintain the guest
  nodes (using gpfn ranges in guest nodes), which I think is not a
  clean/flexible solution.
But what I could do is leave out the vnode-to-mnode translation for now
and add it along with ballooning support (if/when we decide to add it);
I would just bump the interface version at that point. That might give
us time to mull this over?
> -- Keir