[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 07/10 V7] libxl: use the API to setup/teardown network buffering [and 1 more messages]



On Wed, Apr 23, 2014 at 11:02 AM, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx> wrote:
Yang Hongyang writes ("[PATCH V9 05/12] remus: remus device core and APIs to setup/teardown"):
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index 33b62a2..421ae24 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -2457,8 +2457,35 @@ typedef struct libxl__logdirty_switch {
> Â Â Âlibxl__ev_time timeout;
> Â} libxl__logdirty_switch;
>
> +typedef struct libxl__remus_state {
> + Â Âlibxl__domain_suspend_state *dss;
> + Â Âlibxl__egc *egc;
> +
> + Â Â/* private */
> + Â Âint saved_rc;
> + Â Â/* Opaque context containing device related stuff */
> + Â Âvoid *device_state;
> +} libxl__remus_state;

I'm afraid that the interface between the remus code and the rest of
the code is still not very clear.

Earlier, I wrote:

 > [...] ÂI wonder if it might not be better to provide a firmer
 > interface between the remus code and the rest of the save/restore
 > machinery. ÂThat is, have an explicit callback function recorded
 > by the save/restore code which is called back by the remus
 > machinery when it has done its work. ÂWhat do you think ?
 >
 > I think having the flow of control spring off into libxl_remus.c and
 > magically come back by libxl_remus.c knowing to call
 > domain_suspend_done is rather opaque.

I think you have basically two options:

1. Make the remus part of this be a fully self-contained standard
 Âasynchronous callback-based suboperation, like libxl__xswait,
 Âlibxl__bootloader, et al.

 ÂIn this case you should rigorously follow the existing patterns,
 Âdefining a clear interface between the two parts, providing a
 Âcallback function set by the caller, etc.

2. Integrate the remus part into the suspend/resume code in an
 Âad hoc fashion, with extremely clear comments everywhere about the
 Âexpected interface, and no extraneous moving parts.

At the moment you seem to have mixed these two approaches.

Sorry, I missed the previous comment of yours. The two options you
note are bit more clearer than the previous comment. And I also
agree that the current approach is mixing options 1 & 2.

The entire Remus code (executed from start to end) is one giant async op.Â
Internally, per checkpoint, the code executes for no more than tens of Â
milliseconds at max, with the exception of sleeping until the next checkpointÂ
needs to be taken.

Doing checkpoint related work (i.e., syscalls to control disk/network buffers)
in an async op is an overkill. So, they are integrated into the suspend/resume
infrastructure (option 2)

The async op is useful (please correct me if I am wrong) if the op runs forÂ
a long time, such that you don't want users of libxl to block. Which is exactly
why the setup/teardown and the sleep_until_next_checkpoint operations
are ao suboperations. (option 1).


All said, perhaps, it may be more clear to add a level of indirection:
Âmake domain_suspend_done a callback field in libxl__domain_state.

Change all direct callers of domain_suspend_done to invoke this callback.

In this way,
* domain_suspend -> domain_suspend_done (for normal suspend/save operations)
* domain_remus_start->domain_remus_terminated

What do you think?

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.