Re: [Xen-devel] [PATCH] libxc: succeed silently on restore

To:	Ian Campbell <Ian.Campbell@xxxxxxxxxxxxx>
Subject:	Re: [Xen-devel] [PATCH] libxc: succeed silently on restore
From:	Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
Date:	Thu, 2 Sep 2010 18:07:45 +0100
Cc:	Brendan Cully <brendan@xxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
In-reply-to:	<1283446919.12544.9877.camel@xxxxxxxxxxxxxxxxxxxxxx>
References:	<5ad37819cddd19a27065.1283444083@xxxxxxxxxxxxxxxxxxxxx> <1283446919.12544.9877.camel@xxxxxxxxxxxxxxxxxxxxxx>

Ian Campbell writes ("Re: [Xen-devel] [PATCH] libxc: succeed silently on restore"):
> I'm not so sure what can be done about this case, the way
> xc_domain_restore is (currently) designed it relies on the saving end to
> close its FD when it is done in order to generate an EOF at the receiver
> end to signal the end of the migration.

This was introduced in the Remus patches and is IMO not correct.

> The xl migration protocol has a postamble which prevents us from closing
> the FD and so instead what happens is that the sender finishes the save
> and then sits waiting for the ACK from the receiver so the receiver hits
> the remus heartbeat timeout which causes us to continue. This isn't
> ideal from the downtime point of view nor from just a general design
> POV.

The xl migration protocol postamble is needed to try to mitigate the
consequences of network failure, where otherwise it is easy to get
into situations where neither the sender nor the receiver can safely
resume the domain.

> Perhaps we should insert an explicit done marker into the xc save
> protocol which would be appended in the non-checkpoint case? Only the
> save end is aware if the migration is a checkpoint or not (and only
> implicitly via callbacks->checkpoint <> NULL) but that is OK, I think.

There _is_ an explicit done marker: the sender stops sending pages and
sends a register dump.  It's just that remus then wants to continue
anyway.

The solution is that the interface to xc_domain_restore should be
extended so that:
 * Callers specify whether they are expecting a series of checkpoints,
   or just one.
 * When it returns you find out whether the response was "we got
   exactly the one checkpoint you were expecting" or "the network
   connection failed too soon" or "we got some checkpoints and then
   the network connection failed".
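
As a rough sketch of the shape this could take (every name below is
hypothetical and none of it exists in libxc), the caller's expectation and
the three outcomes listed above might be two enums, with the outcome derived
from how many whole checkpoints had arrived when the restore returned:

```c
/* Hypothetical names for illustration only; none of these are libxc API. */
enum restore_expect {
    RESTORE_EXPECT_ONE,    /* plain migration: exactly one checkpoint */
    RESTORE_EXPECT_SERIES  /* Remus-style: a series of checkpoints */
};

enum restore_result {
    RESTORE_GOT_ONE,           /* got exactly the one checkpoint expected */
    RESTORE_FAILED_TOO_SOON,   /* connection failed before a whole checkpoint */
    RESTORE_FAILED_AFTER_SOME  /* got some checkpoints, then connection failed */
};

/*
 * Map the receiver's state at return time onto one of the three outcomes.
 * Assumption: when the caller expects a single checkpoint, the receiver
 * stops reading as soon as that checkpoint is complete, so it never has to
 * wait for EOF or a heartbeat timeout.
 */
enum restore_result classify_restore(enum restore_expect expect,
                                     unsigned complete_checkpoints)
{
    if (complete_checkpoints == 0)
        return RESTORE_FAILED_TOO_SOON;
    if (expect == RESTORE_EXPECT_ONE)
        return RESTORE_GOT_ONE;
    return RESTORE_FAILED_AFTER_SOME;
}
```

With something like this, a caller such as xl could resume the domain
immediately on RESTORE_GOT_ONE and treat the other two results as the
failure cases they are.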

A related problem is that it is very difficult for the caller to
determine when the replication has been properly set up: that is, to
know when the receiver has got at least one whole checkpoint.
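
One possible way to expose that, sketched here purely as an assumption (the
callback type, struct and function names are invented for illustration and
are not part of libxc), would be a callback that the receive loop fires
exactly once, when the first whole checkpoint has been received:

```c
#include <stddef.h>

/* Hypothetical callback type; not part of libxc. */
typedef void (*first_checkpoint_cb)(void *opaque);

struct restore_progress {
    unsigned complete_checkpoints;  /* whole checkpoints received so far */
    first_checkpoint_cb on_first;   /* may be NULL */
    void *opaque;                   /* passed back to the callback */
};

/*
 * To be called by the (hypothetical) receive loop each time a whole
 * checkpoint has been received. Fires the callback on the first one only.
 */
void checkpoint_completed(struct restore_progress *p)
{
    p->complete_checkpoints++;
    if (p->complete_checkpoints == 1 && p->on_first != NULL)
        p->on_first(p->opaque);
}

/* Example callback: counts how often it was invoked via an int in opaque. */
void note_replication_ready(void *opaque)
{
    *(int *)opaque += 1;
}
```

The caller would then know replication is established as soon as the
callback runs, rather than having to infer it from the stream.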

Ian.
