[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 3 of 5 V2] tools/libxl: setup/teardown Remus network buffering [and 1 more messages]

Shriram Rajagopalan writes ("Re: [Xen-devel] [PATCH 3 of 5 V2] tools/libxl: 
setup/teardown Remus network buffering"):
> On Wed, Sep 4, 2013 at 12:16 PM, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx> 
> wrote:
>     > diff -r 79b21d778550 -r 3e8b0aa7cd4a tools/libxl/libxl_netbuffer.c
>     > --- /dev/null Thu Jan 01 00:00:00 1970 +0000
>     > +++ b/tools/libxl/libxl_netbuffer.c   Thu Aug 29 14:36:36 2013 -0700
>     > @@ -0,0 +1,434 @@
>     ...
>     > +static void netbuf_script_child_death_cb(libxl__egc *egc,
>     > +                                         libxl__ev_child *child,
>     > +                                         pid_t pid, int status)
>     > +{
>     > +    /* No-op. hotplug-error will be read in setup/teardown_ifb */
>     > +    return;
>     > +}
>     This can't possibly be right.  You must always wait for your children
>     and you may not complete the ao until the child has finished.  This
>     should all be documented in libxl_internal.h.
> Thats true. But this was not meant to be an ao (see explanation at
> end of this email).

It is not possible to fork/exec/wait in libxl without using the ao
machinery.  As the comment in libxl_internal.h says:
  * For making subprocesses.  This is the only permitted mechanism for
  * code in libxl to do so.
 _hidden pid_t libxl__ev_child_fork(libxl__gc *gc, ...

(This is due to awkwardness in the POSIX API, combined with libxl's
status as a library which needs to work inside a variety of other
programs and libraries with a variety of ways of handling child

>     > +    /* Now wait for the result (XENBUS_PATH/hotplug-status).
>     > +     * It doesnt matter what the result is. If the hotplug-status path
>     > +     * appears, then we are good to go.
>     > +     */
>     > +    rc = libxl__wait_for_offspring(gc, domid, LIBXL_HOTPLUG_TIMEOUT,
>     > +                                   script,
>     > +                                   /* path */
>     > +                                   GCSPRINTF("%s/hotplug-status",
>     > +                                             out_path_base),
>     > +                                   NULL /* state */,
>     > +                                   NULL, script_done_cb, NULL);
>     This wait does not actually block the current execution.
> It actually does. Atleast based on the code in libxl_exec.c which uses
> a select() call to wait on xenstore fds.

Oh, I'm wrong.  But, libxl__wait_for_offspring is a special tool for
dealing with the daemonic device model process.  The comment should be

>     Does this run a script per netbuf ?  Do they want to run in
>     parallel ?
> A script per netbuf. Running them in parallel is an overkill, since
> the remus setup is a one-time task and the average domain is
> expected to have few interfaces.  Besides, the script has no
> waits. It just issues a bunch of shell commands that don't block

Fair enough.  But you may find it easier to run them in parallel
because perhaps then you can reuse more of the existing machinery for
running multiple ao subtasks.

> See answer above for running in parallel. In terms of running this
> asynchronously, I had a comment in this file explaining why I
> decided to go the synchronous route.  Basically, the setup and
> teardown are part of the remus API call, which is already
> asynchronous.
> The remus external API (libxl_domain_remus_start) returns after
> starting remus.  It goes like this:
>   setup scripts
>   suspend_domain (which basically launches save_domain helper process)
>   return AO_IN_PROGRESS


> The setup scripts are have no long running operations. They complete in jiffy
> (adding a bunch of tc rules to interfaces)

But in libxl you can't wait for subprocesses (ie, you can't call
waitpid) other than via the ao machinery.

> The AO callback in libxl.c (named "remus_replication_failure_cb" [beginning 
> of this patch]), is called when the backup fails, i.e. when the replication 
> operations fail (such as write() calls).
> The AO callback calls the above teardown function.  I remember reading in the
> comments that an AO op should not be issued from an AO callback 
> (please correct me if I am wrong).

You can't run an ao_how-taking libxl function from within libxl.  But
you may (and should) provide a plain-callback-style version of the
same operation from use within libxl.

To see some examples of how to do subfunctions in the asynchronous
style, see libxl_create.c - for example, its call to


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.