[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] libxl: create backend/ xenstore dir for driver domains



On Mon, Jan 06, 2020 at 05:03:40PM +0100, Marek Marczykowski-Górecki wrote:
> On Mon, Jan 06, 2020 at 03:40:22PM +0000, Ian Jackson wrote:
> > Adding Roger to the CC.
> > 
> > Marek Marczykowski-Górecki writes ("Re: [PATCH] libxl: create backend/ 
> > xenstore dir for driver domains"):
> > > On Mon, Jan 06, 2020 at 02:20:46PM +0000, Ian Jackson wrote:
> > > > Marek Marczykowski-Górecki writes ("[PATCH] libxl: create backend/ 
> > > > xenstore dir for driver domains"):
> > > > > Cleaning up backend xenstore entries is a responsibility of the 
> > > > > backend.
> > > > > When backend lives outside of dom0, the domain needs proper 
> > > > > permissions
> > > > > to do it. Normally it is given permission to remove the device dir
> > > > > itself, but not the dir containing it (named after frontend ID). 
> > > > > After a
> > > > > whole those empty leftover directories accumulate to the point 
> > > > > xenstore
> > > > > returning E2BIG on listing them.
> > > > > 
> > > > > Fix this by giving backend domain write access also to backend/
> > > > > directory itself when c_info->driver_domain option is set. The code
> > > > > removing relevant dir is already there (just lacked permissions to do 
> > > > > so).
> > > > > 
> > > > > Note this also allows the backend domain to create new entries,
> > > > > pretending to host backend devices it don't have. But since libxl uses
> > > > > /libxl/ xenstore dir for this information (still outside of backend
> > > > > domain control), this shouldn't be an issue.
> > > > 
> > > > This seems quite hazardous to me.  The reasoning you use to show that
> > > > this iws OK seems fragile, and in general it doesn't feel right to
> > > > give the particular backend such wide scope.
> > > > 
> > > > Can we find another way to address this problem ?  I think the
> > > > containing directory should be removed by the toolstack.  Why is this
> > > > difficult ?  (I presume there is a reason or you would have done it
> > > > that way...)
> > > 
> > > It was done this way previously and caused issues, see this commit:
> > > 
> > > commit 546678c6a60f64fb186640460dfa69a837c8fba5
> > > Author: Roger Pau Monne <roger.pau@xxxxxxxxxx>
> > > Date:   Wed Sep 23 12:06:56 2015 +0200
> > > 
> > >     libxl: fix the cleanup of the backend path when using driver domains
> > 
> > Thanks.
> > 
> > >     With the current libxl implementation the control domain will
> > >     remove both the frontend and the backend xenstore paths of a
> > >     device that's handled by a driver domain. This is incorrect,
> > >     since the driver domain possibly needs to access the backend
> > >     path in order to perform the disconnection and cleanup of the
> > >     device.
> > >     
> > >     Fix this by making sure the control domain only cleans the
> > >     frontend path, leaving the backend path to be cleaned by the
> > >     driver domain. Note that if the device is not handled by a
> > >     driver domain the control domain will perform the removal of
> > >     both the frontend and the backend paths.
> > 
> > Hmm.  I see my Ack on that.  Nevertheless maybe it is wrong.
> > 
> > Looking at it afresh, I think maybe the right answer is:
> > 
> >  * If the driver domain is expected to be working properly, the
> >    toolstack should wait for the driver domain to complete the device
> >    shutdown, before removing the backend node.  Indeed, the toolstack
> >    ought to wait for this before actually destroying the guest in Xen,
> >    by the usual logic for clean domain shutdown.
> 
> I think that's not enough. .../state = 6 is set by the kernel, but
> xl devd in the driver domain may want to cleanup things (hotplug scripts
> etc). And indeed libxl__device_destroy() is called from
> device_hotplug_done(), not device_backend_callback().
> 
> Alternatively, toolstack could wait for the actual backend node to be
> removed (by the driver domain), and then cleanup the parent directory (if
> empty).

I'm not sure you need to cleanup the parent directory, albeit it
wouldn't hurt. It needs to be done in a transaction though, so that
you don't race with new additions to it.

> I don't find it particularly appealing, as every contact with
> libxl async code reduce overall happiness...
> 
> >  * There needs to be a way to deal with a broken/unresponsive driver
> >    domain.  That will involve not waiting for the backend so must
> >    involve simply deleting the backend from xenstore.
> 
> It's already there: if driver domain fails to set .../state = 6 within
> a timeout, toolstack will forcibly remove the entry.

Would it work to change this and instead of monitor .../state = 6
monitor that the parent directory still exist?

> > Is the distinction here between "xl shutdown" and "xl destroy", on the
> > actual guest domain, good enough ?  Hopefully if the driver domain
> > sees the backend directory simply vanish it can destructively tear
> > everything down ?
> 
> In the past this lead to multiple issues, where hotplug script didn't
> know which device actually was removed. In some cases I needed to
> workaround this by saving xenstore dump into a file in an "online"
> hotplug script, but it is very ugly solution.

Removing the whole directory without giving time to the driver domain
to execute it's hotplug scripts can indeed lead to issues, as there's
no guarantee that the hotplug script won't use data in xenstore in
order to perform the cleanup IIRC.

My preferred option would be to wait for the backend directory to be
removed by the driver domain, as I think it's the cleanest and likely
safest approach.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.