Re: [Xen-devel] [PATCH 1/2] libxl/devd: fix a race with concurrent device addition/removal
On Wed, May 10, 2017 at 11:47:31AM +0100, Wei Liu wrote:
> On Wed, May 10, 2017 at 11:43:57AM +0100, Roger Pau Monne wrote:
> > On Wed, May 10, 2017 at 11:32:45AM +0100, Ian Jackson wrote:
> > > Roger Pau Monne writes ("[PATCH 1/2] libxl/devd: fix a race with
> > > concurrent device addition/removal"):
> > > > Current code can free the libxl__device inside of the
> > > > libxl__ddomain_device before the addition has finished if a removal
> > > > happens while an addition is still in process:
> > > >
> > > >   backend_watch_callback
> > > >             |
> > > >             v
> > > >        add_device
> > > >             |                 backend_watch_callback
> > > >     (async operation)                   |
> > > >             |                           v
> > > >             |                     remove_device
> > > >             |                           |
> > > >             |                           V
> > > >             |                    device_complete
> > > >             |                 (free libxl__device)
> > > >             v
> > > >      device_complete
> > > >   (deref libxl__device)
> > > >
> > > > Fix this by creating a temporary copy of the libxl__device, that's
> > > > tracked by the GC of the nested async operation. This ensures that
> > > > the libxl__device used by the async operations cannot be freed while
> > > > being used.
> > >
> > > Doesn't this arrange that the remove hotplug script will be invoked
> > > while the add hotplug script is still running ?
> >
> > That's indeed possible (either with the current code or with this patch),
> > although unlikely. The async code called by remove_device will wait for the
> > backend to switch to state 6, while the add_device code will wait for state
> > 2 IIRC (one can change those states to make them clash probably).
> >
> > > Is that really desirable (or allowed!) ?
> >
> > Hm, no, I don't think it's desirable at all. I still think this is better
> > than the previous code (at least it doesn't dereference libxl__device
> > anymore), but clearly needs further improvements.
> >
> > Also, it seems to me the same can happen even without driver domains, if a
> > user executes concurrent block-{attach/detach} operations, but maybe I'm
> > missing something?
> >
>
> There is a lot of locking for all the device add / remove code. See
> libxl_internal.h L2588.

It's been a long time since I last played with libxl, and none of this
existed back then, sorry. Sadly, devd completely bypasses all of this. A
simple way to fix it might be to call libxl__lock_domain_userdata from
{add/remove}_device (and drop the lock at device_complete)?
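
To make the idea concrete, here is a minimal sketch of "take the lock in
{add/remove}_device, drop it at device_complete". It is a self-contained toy
model, not libxl code: every toy_* name, the boolean flag standing in for the
per-domain lock, and the "defer the removal" policy are assumptions made for
illustration only. The real libxl__lock_domain_userdata has its own semantics,
which this does not try to reproduce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins; none of these are libxl types or functions. */
typedef struct { int devid; } toy_device;

typedef struct {
    bool locked;          /* models the per-domain lock (assumption: a flag) */
    toy_device *dev;      /* models the libxl__device held by the ddomain entry */
    bool remove_pending;  /* a removal that arrived while the lock was held */
    char log[128];        /* order of events, for inspection */
} toy_ddev;

static void note(toy_ddev *d, const char *s)
{
    strncat(d->log, s, sizeof(d->log) - strlen(d->log) - 1);
}

static bool toy_lock(toy_ddev *d)
{
    if (d->locked) return false;
    d->locked = true;
    return true;
}

static void toy_unlock(toy_ddev *d) { d->locked = false; }

/* Start an addition; the lock stays held while the async part is in flight. */
static bool add_device(toy_ddev *d, int devid)
{
    if (!toy_lock(d)) return false;
    d->dev = malloc(sizeof(*d->dev));
    if (!d->dev) { toy_unlock(d); return false; }
    d->dev->devid = devid;
    note(d, "add_start;");
    return true;
}

/* Removal: deferred if an addition still holds the lock. */
static void remove_device(toy_ddev *d)
{
    if (!toy_lock(d)) {
        d->remove_pending = true;
        note(d, "remove_deferred;");
        return;
    }
    free(d->dev);           /* safe: no addition is using the device now */
    d->dev = NULL;
    note(d, "remove_done;");
    toy_unlock(d);
}

/* Completion of the addition: use the device, then drop the lock. */
static void add_complete(toy_ddev *d)
{
    assert(d->dev != NULL); /* the device cannot have been freed meanwhile */
    note(d, "add_done;");
    toy_unlock(d);
    if (d->remove_pending) {
        d->remove_pending = false;
        remove_device(d);
    }
}
```

In this model, calling add_device, then remove_device, then add_complete on a
zero-initialised toy_ddev runs the removal only after the addition has
completed, so the two operations (and hence their hotplug scripts) would not
overlap.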
Roger.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel