[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 07 of 10] Add new shutdown mode for checkpoint

Hi Keir,

Thanks for looking over these patches. I've updated my patch set
according to your comments, and I'll send them a long in a separate
thread. But I've replied to your comments below.

On Thursday, 28 December 2006 at 16:51, Keir Fraser wrote:
> On 15/12/06 6:38 am, "Brendan Cully" <brendan@xxxxxxxxx> wrote:
> > Add new shutdown mode for checkpoint.
> > 
> > When control/shutdown = checkpoint, invoke an alternate suspend path
> > that doesn't disconnect from back ends, and only reconnects when the
> > image has been restored into a new domain.
> I don't think a new type of 'checkpoint' handler is required in the guest
> OS. We are already most of the way there in terms of doing as little as
> possible on the suspend side of save/restore, so we should fix up what
> little else there is to be done. Looking at the differences versus your new
> checkpointing suspend:
>  1. Xenbus_suspend() needs to stay. Actually most drivers do not have a
> suspend handler anyway (only tpmfront does). We should provide a
> suspend_cancelled() hook callback so that drivers which *do* have a suspend
> handler can distinguish between proper resume and checkpoint return.

Hmm. I didn't see the tpmfront driver in my
linux-2.6-xen-sparse/drivers/xen directory, and I'm not sure what the
in-guest suspend code is supposed to do (or how it would roll
back). I've left the suspend code active in the new patch set, but not
yet added the cancellation hook, since there's nothing to use it or
test it now anyway. I've also disabled the device resume hook in the
source resume path: that code tears down the existing backend
connection and waits for a new one to be set up, which never happens
in the source domain. How would you prefer that this work?

>  2. I don't think we really need to xs_unwatch() all our watches on
> xs_suspend(). Probably that code can just go.

ok, done.

>  3. Keep gnttab_suspend(). It isn't really that slow to execute and avoids
> needing other hacks.

ok, done.

>  4. Keep pre_suspend() and don't have special pre_checkpoint(). Again, it is
> fairly cheap to clear/renew the shared_info mapping.

ok, done.

>  5. It would be nice to have a backward-compatible way for the guest to tell
> the tools that its suspension is cancellable. For this we could write an
> informative string into xen_start_info->magic[]. Notifying the guest of

as I understood it, xen_start_info (the suspend record) isn't
available to the save code until the guest passes it along as the
argument to the suspend hypercall. Isn't this too late to be useful?
I suppose the guest could write some feature flag to xenstore though.

> suspend-cancel versus restore can be done via %eax return code. For example,
> 0==suspend-cancel, +ve==restore, -ve==error. Old tools will leave
> %eax==__HYPERVISOR_sched_op, which will correctly map to 'restore'.

I'm not sure I see the difference from the guest's point of view
between a cancelled suspend and an error. So I still only put 0 or 1
into eax.

> This allows us to use this cheap checkpoint framework to also provide easy
> cancellation of save/restore if something goes wrong (e.g., network
> connectivity fails during live migration).

Sure. As far as I can see, a cancellable save is just a checkpoint
where the parent is destroyed on commit.

Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.