[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v6 06/13] libxl: convert libxl_device_disk_add to an async op



Ian Jackson wrote:
Roger Pau Monne writes ("[PATCH v6 06/13] libxl: convert libxl_device_disk_add to an 
async op"):
This patch converts libxl_device_disk_add to an ao operation that
waits for device backend to reach state XenbusStateInitWait and then
marks the operation as completed. This is not really useful now, but
will be used by latter patches that will launch hotplug scripts after
                   ^^^^^^
later

Thanks

we reached the desired xenbus state.

Thanks, here are my review comments:


diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 85c21b4..937ab08 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
...
+/* AO operation to connect a disk device, called by
+ * libxl_device_disk_add and libxl__add_disks. This function calls
+ * libxl__initiate_device_connection to wait for the device to
+ * finish the connection (might involve executing hotplug scripts).
+ */
+_hidden void libxl__device_disk_add(libxl__egc *egc, uint32_t domid,
+                                    xs_transaction_t t,
+                                    libxl_device_disk *disk,
+                                    libxl__ao_device *aodev);
+

This is a confusing doc comment.  The reader doesn't want to know how
the function is implemented; we need to know what it does.
Particularly, we need to know what happens when it completes.  I infer
that it calls aodev->callback but this should be stated.

Yes, fixed comment.

+/* Waits for the passed device to reach state XenbusStateInitWait.
+ * This is not really useful by itself, but is important when executing
+ * hotplug scripts, since we need to be sure the device is in the correct
+ * state before executing them.
+ *
+ * Once finished, aodev->callback will be executed.
+ */
+_hidden void libxl__initiate_device_connection(libxl__egc*,
+                                               libxl__ao_device *aodev);

This function's name and its functionality seem not to correspond.  It
doesn't actually initiate anything, as far as I can tell.

I've renamed it to libxl__wait_device_connection

  /* First layer; wraps libxl__spawn_spawn. */
@@ -2033,6 +2068,8 @@ typedef struct {
      libxl__domain_build_state dm_state;
      libxl__dm_spawn_state pvqemu;
      libxl__destroy_domid_state dis;
+    /* used to store the state of devices being connected */
+    libxl__ao_devices aodevs;
  } libxl__stub_dm_spawn_state;

I'm not sure how valuable these comments are.  libxl__ao_devices are
always used to store the state of devices being "<something>'d".
That's what a libxl__ao_devices is for.  And in the context of a spawn
or create that would obviously be "connected".

In general I'm a fan of comments which say things which aren't clear
from the code, but I'm not keen on ones which restate what the code
says.

I've removed both occurrences of this comment.


diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
index 9243806..5d34ed5 100644
--- a/tools/libxl/libxl_device.c
+++ b/tools/libxl/libxl_device.c
@@ -442,6 +442,45 @@ void libxl__ao_devices_callback(libxl__egc *egc, 
libxl__ao_device *aodev)
      return;
  }

+/******************************************************************************/
+
+/* Macro for defining the functions that will add a bunch of disks when
+ * inside an async op.
+ *
+ * This macro is added to prevent repetition of code.
+ */
+
+/* Capatibility macro, since disk_add now takes a xs transaction parameter */
+#define libxl__device_disk_add(egc, domid, disk, aodev)                       \
+        libxl__device_disk_add(egc, domid, XBT_NULL, disk, aodev)

Is it really safe to call libxl__device_disk_add without a real
transaction ?  I see that the current code does it but it seems wrong
to me.  We end up writing all of the individual settings in the
backend one at a time.

Well, this is not really my code, this was part of Stefanos series about local disk attach. However, I don't see how we end up writing them one at a time, libxl__device_generic_add creates a transaction if none is given, so all backend and frontend entries are added at the same transaction. Calling libxl__device_disk_add without a transaction behaves just like it did previously.

I think this applies to all the other device types too.  So I think
the right answer would be to make them take a transaction parameter
too.  Ie, insert a patch before this one which adds a transaction
parameter (ignored for now and always passed 0 if you don't want to
actually make it work properly) to libxl__add_nic etc.  Then you don't
need this unpleasant macro.

Ok, I will add this patch that makes all device_*_add take a transaction parameter, although it will be ignored right now.

+void libxl__initiate_device_connection(libxl__egc *egc, libxl__ao_device 
*aodev)
+{
+    STATE_AO_GC(aodev->ao);
+    char *be_path = libxl__device_backend_path(gc, aodev->dev);
+    char *state_path = libxl__sprintf(gc, "%s/state", be_path);
+    int rc = 0;
...
+out_fail:
+    assert(rc);
+    aodev->rc = rc;
+    device_xsentries_remove(egc, aodev);
+    return;
+}

Firstly, I'm not convinced it's really proper for
libxl__initiate_device_connection to call device_xsentries_remove.
After all device_xsentries_remove is part of the implementation of
libxl__initiate_device_remove.

This is part of both flows, both device connection and disconnection.

And, secondly, does device_xsentries_remove really do what you want ?
It has a test in it which makes it only do its work if action is
DISCONNECT.

Yes, that's because it's called by both flows.


diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 43affd1..5f09740 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c


@@ -2027,15 +2046,17 @@ static char * libxl__alloc_vdev(libxl__gc *gc, const 
char *blkdev_start,

+                dls->t = xs_transaction_start(ctx->xsh);
+                if (dls->t == XBT_NULL) {
+                    LOG(ERROR, "failed to start a xenstore transaction");
+                    rc = ERROR_FAIL;
+                    goto out;
+                }
+                disk->vdev = libxl__alloc_vdev(gc, blkdev_start, dls->t);
...
+                libxl__device_disk_add(egc, LIBXL_TOOLSTACK_DOMID,
+                                       dls->t, disk,&dls->aodev);
...
+    xs_ret = xs_transaction_end(ctx->xsh, dls->t, 0);
+    if (xs_ret == 0&&  errno == EAGAIN) {
+        libxl__device_disk_local_attach(egc, dls);
+        return;

Isn't the effect of this that if the xs transaction gets a conflict,
we'll rerun the hotplug scripts, etc. ?  I think I may be confused
here, but I don't understand how this transaction loop is supposed to
work and how it is supposed to relate to the interaction with other
domains, scripts, etc.

Yes I see your point. We should disconnect the device (execute hotplug scripts) but since the xenstore entries are already gone (because the transaction is not committed successfully) I don't see anyway to do it, we cannot execute those scripts if the backend entries have been lost.

Indeed it seems to me like your arrangements in libxl__device_disk_add
couldn't work if a non-null t was supplied.  libxl__device_disk_add
would do all the writes in the transaction, but not commit it, so that
none of them are visible to anyone, and then wait for the deivde state
to change, which will never happen.

I'm not so sure, is it possible to add a watch to a xenstore path that is inside of a not yet committed transaction? If so this will work fine, the transaction will be finished correctly, and the path will be updated to the correct state (2).

If the transaction is not committed successfully, we will get a timeout from the devstate event and exit.


  static void libxl__device_disk_from_xs_be(libxl__gc *gc,
@@ -1982,11 +1999,13 @@ int libxl_cdrom_insert(libxl_ctx *ctx, uint32_t domid, 
libxl_device_disk *disk)
      ret = 0;

      libxl_device_disk_remove(ctx, domid, disks + i, 0);
-    libxl_device_disk_add(ctx, domid, disk);
+    /* fixme-ao */
+    libxl_device_disk_add(ctx, domid, disk, 0);
      stubdomid = libxl_get_stubdom_id(ctx, domid);
      if (stubdomid) {
          libxl_device_disk_remove(ctx, stubdomid, disks + i, 0);
-        libxl_device_disk_add(ctx, stubdomid, disk);
+        /* fixme-ao */
+        libxl_device_disk_add(ctx, stubdomid, disk, 0);

I think this code is a symptom of a problem elsewhere.  When adding a
disk to a domain, it should not be necessary to explicitly ask to add
it to the stubdom too.  But this is not your fault, or your problem
right now.

So I assume we can leave this for later.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.