Re: [xen-devel][vNUMA v2][PATCH 2/8] public interface

To:	Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Subject:	Re: [xen-devel][vNUMA v2][PATCH 2/8] public interface
From:	Dulloor <dulloor@xxxxxxxxx>
Date:	Tue, 3 Aug 2010 10:24:58 -0700
Cc:	Andre Przywara <andre.przywara@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date:	Tue, 03 Aug 2010 10:25:46 -0700
In-reply-to:	<C87DF9E6.1C973%keir.fraser@xxxxxxxxxxxxx>
References:	<AANLkTimKJogS0m2HN53KK-6_c-CnzBqqF0Udp8BFsRCh@xxxxxxxxxxxxxx> <C87DF9E6.1C973%keir.fraser@xxxxxxxxxxxxx>

On Tue, Aug 3, 2010 at 8:52 AM, Keir Fraser <keir.fraser@xxxxxxxxxxxxx> wrote:
> On 03/08/2010 16:43, "Dulloor" <dulloor@xxxxxxxxx> wrote:
>
>>> I would expect guest would see nodes 0 to nr_vnodes-1, and the mnode_id
>>> could go away.
>> mnode_id maps the vnode to a particular physical node. This will be
>> used by the balloon driver in the VMs when the structure is passed as
>> NUMA enlightenment to PVs and PV on HVMs.
>> I have a patch ready for that (once we are done with this series).
>
> So what happens when the guest is migrated to another system with different
> physical node ids? Is that never to be supported? I'm not sure why you
> wouldn't hide the vnode-to-mnode translation in the hypervisor.

Right now, migration is not supported when a NUMA strategy is set.
This is on my TODO list (along with PoD support).

There are a few open questions with respect to migration:
- What if the destination host is not NUMA, but the guest is? Do we fake
  those nodes, or should we avoid selecting such a destination host in the
  first place?
- What if the destination host is not NUMA, but the guest has asked to be
  striped across a specific number of nodes (possibly for higher aggregate
  memory bandwidth)?
- What if the guest has asked for a particular memory strategy
  (split/confined/striped), but the destination host can't guarantee it
  (because of how free memory is distributed across the nodes)?

Once we answer these questions, we will know whether the vnode-to-mnode
translation is better exposed or not and, if it is exposed, whether we could
simply renegotiate the vnode-to-mnode translation at the destination host.
I have started working on this, but I have some other patches ready to go
which we might want to check in first: the PV/Dom0 NUMA patches and
ballooning support (see below).

As for its purpose, the vnode-to-mnode translation lets enlightened guests
know where their underlying memory comes from, so that over-provisioning
features like ballooning have a chance to maintain this distribution. This
way, all the hypervisor has to do is sanity-check increase/exchange
reservation requests from the guests, and the guest can decide whether or
not to make an exact_node_request.
Other options, which would allow us to discard this translation, are:
- Ballooning at your own risk: leave ballooning as it is, even when guests
  use a NUMA strategy (particularly split/confined).
- Hypervisor-level policies: let Xen do its best to maintain the guest nodes
  (using gpfn ranges in guest nodes), which I don't think is a clean or
  flexible solution.

But what I could do is leave out the vnode_to_mnode translation for now
and add it along with ballooning support (if/when we decide to add it).
I will just bump up the interface version at that time. Would that give us
time to mull this over?

