On Thu, May 11, 2017 at 12:06:08PM +0100, Ian Jackson wrote:
Roger Pau Monne writes ("[PATCH v2 1/2] libxl/devd: fix a race with concurrent 
device addition/removal"):
Current code can free the libxl__device inside of the libxl__ddomain_device
before the addition has finished if a removal happens while an addition is
still in process:
...
Fix this by creating a temporary copy of the libxl__device, that's
tracked by the GC of the nested async operation. This ensures that
the libxl__device used by the async operations cannot be freed while
being used.
...
         GCNEW(aodev);
         libxl__prepare_ao_device(ao, aodev);
-        aodev->dev = dev;
+        /*
+         * Clone the libxl__device to avoid races if remove_device is called
+         * before the device addition has finished.
+         */
+        GCNEW(aodev->dev);
+        *aodev->dev = *dev;
This does conveniently disentangle the memory management, so I think
it's a good approach.
But it reads kind of oddly to me.  I think it is not buggy, but can
you add a comment to the definition of libxl__device, saying that it
is a transparent structure containing no external memory references ?
Sure, before implementing this I already took a look at the contents of the
libxl__device struct, but I agree that a comment is in place in case someone
expands the fields of the struct later on.
Otherwise this copy is not really justifiable, because in C, in
general, structs might contain private fields, or memory references or
linked list entries or something.
Thanks, Roger.
NB: FWIW, I'm planning to keep Wei's RB since this is a cosmetic change.