[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 3 of 5 V2] tools/libxl: setup/teardown Remus network buffering



Shriram Rajagopalan writes ("[Xen-devel] [PATCH 3 of 5 V2] tools/libxl: 
setup/teardown Remus network buffering"):
> tools/libxl: setup/teardown Remus network buffering

Thanks.

I'm afraid the event handling, particularly with respect to
subprocesses, is not yet right.

> diff -r 79b21d778550 -r 3e8b0aa7cd4a tools/libxl/libxl_dom.c
> --- a/tools/libxl/libxl_dom.c Thu Aug 29 14:36:25 2013 -0700
> +++ b/tools/libxl/libxl_dom.c Thu Aug 29 14:36:36 2013 -0700
> @@ -1259,7 +1259,7 @@ static void remus_checkpoint_dm_saved(li
>      /* REMUS TODO: Wait for disk and memory ack, release network buffer */
>      /* REMUS TODO: make this asynchronous */
>      assert(!rc); /* REMUS TODO handle this error properly */
> -    usleep(dss->interval * 1000);
> +    usleep(dss->remus_ctx->interval * 1000);
>      libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1);

Do you plan to fix this later (perhaps as part of a future series) ?

> diff -r 79b21d778550 -r 3e8b0aa7cd4a tools/libxl/libxl_netbuffer.c
> --- /dev/null Thu Jan 01 00:00:00 1970 +0000
> +++ b/tools/libxl/libxl_netbuffer.c   Thu Aug 29 14:36:36 2013 -0700
> @@ -0,0 +1,434 @@
...
> +static void netbuf_script_child_death_cb(libxl__egc *egc,
> +                                         libxl__ev_child *child,
> +                                         pid_t pid, int status)
> +{
> +    /* No-op. hotplug-error will be read in setup/teardown_ifb */
> +    return;
> +}

This can't possibly be right.  You must always wait for your children
and you may not complete the ao until the child has finished.  This
should all be documented in libxl_internal.h.

Also I don't see where this other error handling code is.

> +    if (!pid) {
> +        /* child: Launch netbuf script */
> +        libxl__exec(gc, -1, -1, -1, args[0], args, env);
> +        /* notreached */
> +        abort();
> +    }
> +
> +    assert(libxl__ev_child_inuse(&childw_out));

What purpose does this serve ?  This can be false if the death_cb is
called from within libxl__ev_child_fork.

> +    rc = libxl__exec_netbuf_script(gc, domid, devid, vif,
> +                                   script, "setup");
> +    if (rc) return rc;

This definitely isn't right - it needs to be asynchronous, with a
callback.

> +    /* Now wait for the result (XENBUS_PATH/hotplug-status).
> +     * It doesnt matter what the result is. If the hotplug-status path
> +     * appears, then we are good to go.
> +     */
> +    rc = libxl__wait_for_offspring(gc, domid, LIBXL_HOTPLUG_TIMEOUT,
> +                                   script,
> +                                   /* path */
> +                                   GCSPRINTF("%s/hotplug-status",
> +                                             out_path_base),
> +                                   NULL /* state */,
> +                                   NULL, script_done_cb, NULL);

This wait does not actually block the current execution.  It arranges
for the callback to be called.  You need to break this function apart.

> +static int libxl__netbuf_script_teardown(libxl__gc *gc, uint32_t domid,
> +                                         int devid, char *vif,
> +                                         char *script)
> +{
> +    /* Nothing to wait for during tear down */
> +    return libxl__exec_netbuf_script(gc, domid, devid, vif,
> +                                     script, "teardown");

And this one.

> +    for (i = 0; i < num_netbufs; ++i) {
> +
> +        /* The setup script does the following
> +         * find a free IFB to act as buffer for the vif.
> +         * set up the plug qdisc on the IFB.
> +         */
> +        ret = libxl__netbuf_script_setup(gc, domid, i, vif_list[i],
> +                                         (char *) remus_ctx->netbufscript,
> +                                         &ifb_list[i]);

Does this run a script per netbuf ?  Do they want to run in
parallel ?

> +int libxl__remus_netbuf_teardown(libxl__gc *gc, uint32_t domid,
> +                              libxl__remus_ctx *remus_ctx)
> +{
...
> +    for (i = 0; i < netbuf_ctx->num_netbufs; ++i)
> +        libxl__netbuf_script_teardown(gc, domid, i, vif_list[i],
> +                                      (char *) remus_ctx->netbufscript);

I think you should definitely do this asynchronously and in parallel.

What happens if I use "xl destroy" to destroy a domain which is being
handled by remus.

Thanks,
Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.