
Re: [Xen-devel] [PATCH 1 of 2 V3] libxl: Remus - suspend/postflush/commit callbacks



On Fri, 2012-02-03 at 07:00 +0000, rshriram@xxxxxxxxx wrote:
> # HG changeset patch
> # User Shriram Rajagopalan <rshriram@xxxxxxxxx>
> # Date 1328251593 28800
> # Node ID 90e59c643c00c079996e13b75f89d1f0cd931a02
> # Parent  c7abecc14cceb18140335ebe20faad826282cd1f
> libxl: Remus - suspend/postflush/commit callbacks
> 
>  * Add libxl callback functions for Remus checkpoint suspend, postflush
>    (aka resume) and checkpoint commit callbacks.
>  * suspend callback is a stub that just bounces off
>    libxl__domain_suspend_common_callback - which suspends the domain and
>    saves the device model state to a file.
>  * resume callback currently just resumes the domain (and the device model).
>  * commit callback just writes out the saved device model state to the
>    network and sleeps for the checkpoint interval.
>  * Introduce a new public API, libxl_domain_remus_start (currently a stub)
>    that sets up the network and disk buffer and initiates continuous
>    checkpointing.
> 
>  * Future patches will augument these callbacks/functions with more functionalities

                        "augment"

>    like issuing network buffer plug/unplug commands, disk checkpoint commands, etc.
> 
> Signed-off-by: Shriram Rajagopalan <rshriram@xxxxxxxxx>
> 
> diff -r c7abecc14cce -r 90e59c643c00 tools/libxl/libxl.c
> --- a/tools/libxl/libxl.c     Thu Feb 02 22:46:33 2012 -0800
> +++ b/tools/libxl/libxl.c     Thu Feb 02 22:46:33 2012 -0800
> @@ -471,6 +471,41 @@ libxl_vminfo * libxl_list_vm(libxl_ctx *
>      return ptr;
>  }
>  
> +/* TODO: Explicit Checkpoint acknowledgements via recv_fd. */
> +int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
> +                             uint32_t domid, int send_fd, int recv_fd)
> +{
> +    GC_INIT(ctx);
> +    libxl_domain_type type = libxl__domain_type(gc, domid);
> +    int rc = 0;
> +
> +    if (info == NULL) {
> +        LIBXL__LOG(ctx, LIBXL__LOG_ERROR,
> +                   "No remus_info structure supplied for domain %d", domid);
> +        rc = ERROR_INVAL;
> +        goto remus_fail;
> +    }
> +
> +    /* TBD: Remus setup - i.e. attach qdisc, enable disk buffering, etc */

Is it worth checking here that the domain has no disks or network
devices? IOW, until the setup above is implemented, is it dangerous to
start Remus on a domain that does have them?

[...]
> @@ -791,7 +837,27 @@ int libxl__domain_suspend_common(libxl__
>      }
>  
>      memset(&callbacks, 0, sizeof(callbacks));
> -    callbacks.suspend = libxl__domain_suspend_common_callback;
> +    if (r_info != NULL) {
> +        /* save_callbacks:
> +         * suspend - called after expiration of checkpoint interval,
> +         *           to *suspend* the domain.
> +         *
> +         * postcopy - called after the domain's dirty pages have been
> +         *            copied into an output buffer. We *resume* the domain
> +         *            & the device model, return to the caller. Caller then
> +         *            flushes the output buffer, while the domain continues to run.
> +         *
> +         * checkpoint - called after the memory checkpoint has been flushed out
> +         *              into the network. Send the saved device state, *wait*
> +         *              for checkpoint ack and *release* the network buffer (TBD).
> +         *              Then *sleep* for the checkpoint interval.
> +         */

I think this comment would be more useful in xenguest.h next to the
callback struct.
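
For illustration only, the ordering that comment describes could be
sketched roughly as below. Everything named demo_* is hypothetical: the
struct is a simplified stand-in for the callback struct (not the real
definition in xenguest.h), the loop is not the actual save loop, and the
"nonzero means success/continue" return convention is an assumption of
this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the save callback struct, with the
 * per-callback documentation kept next to each member. */
struct demo_save_callbacks {
    /* Called after the checkpoint interval expires, to suspend the domain. */
    int (*suspend)(void *data);
    /* Called after the dirty pages have been copied into an output buffer.
     * Resumes the domain and the device model; the caller then flushes
     * the buffer while the domain runs. */
    int (*postcopy)(void *data);
    /* Called after the memory checkpoint has been flushed to the network.
     * Sends the saved device state, then sleeps for the interval. */
    int (*checkpoint)(void *data);
    void *data; /* opaque pointer handed back to every callback */
};

/* Hypothetical driver: runs at most max_rounds checkpoint rounds and
 * returns how many rounds got as far as the checkpoint callback.
 * Assumes each callback returns nonzero on success/continue. */
static int demo_checkpoint_loop(const struct demo_save_callbacks *cb,
                                int max_rounds)
{
    int rounds = 0;

    assert(cb && cb->suspend && cb->postcopy && cb->checkpoint);
    while (rounds < max_rounds) {
        if (!cb->suspend(cb->data))
            break;
        /* ... dirty pages would be copied into the output buffer here ... */
        if (!cb->postcopy(cb->data))
            break;
        /* ... output buffer would be flushed to the network here ... */
        rounds++;
        if (!cb->checkpoint(cb->data))
            break;
    }
    return rounds;
}

/* Trace callbacks that only record the order in which they are called. */
struct demo_trace {
    char log[32];
    size_t len;
};

static int demo_record(void *data, char tag)
{
    struct demo_trace *t = data;

    if (t->len + 1 >= sizeof(t->log))
        return 0; /* trace full: report failure so the loop stops */
    t->log[t->len++] = tag;
    t->log[t->len] = '\0';
    return 1;
}

static int demo_suspend(void *data)    { return demo_record(data, 'S'); }
static int demo_postcopy(void *data)   { return demo_record(data, 'P'); }
static int demo_checkpoint(void *data) { return demo_record(data, 'C'); }
```

Wiring the three trace callbacks into a demo_save_callbacks and running
the loop for two rounds records suspend, postcopy, checkpoint twice in
that order, which is the sequence the comment is documenting.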

Otherwise the patch looks good.

Ian.



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

