[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] libxl: create backend/ xenstore dir for driver domains



On Mon, Jan 06, 2020 at 05:03:40PM +0100, Marek Marczykowski-Górecki wrote:
> On Mon, Jan 06, 2020 at 03:40:22PM +0000, Ian Jackson wrote:
> > Adding Roger to the CC.
> > 
> > Marek Marczykowski-Górecki writes ("Re: [PATCH] libxl: create backend/ 
> > xenstore dir for driver domains"):
> > > On Mon, Jan 06, 2020 at 02:20:46PM +0000, Ian Jackson wrote:
> > > > Marek Marczykowski-Górecki writes ("[PATCH] libxl: create backend/ 
> > > > xenstore dir for driver domains"):
> > > > > Cleaning up backend xenstore entries is a responsibility of the 
> > > > > backend.
> > > > > When backend lives outside of dom0, the domain needs proper 
> > > > > permissions
> > > > > to do it. Normally it is given permission to remove the device dir
> > > > > itself, but not the dir containing it (named after frontend ID). 
> > > > > After a
> > > > > whole those empty leftover directories accumulate to the point 
> > > > > xenstore
> > > > > returning E2BIG on listing them.
> > > > > 
> > > > > Fix this by giving backend domain write access also to backend/
> > > > > directory itself when c_info->driver_domain option is set. The code
> > > > > removing relevant dir is already there (just lacked permissions to do 
> > > > > so).
> > > > > 
> > > > > Note this also allows the backend domain to create new entries,
> > > > > pretending to host backend devices it don't have. But since libxl uses
> > > > > /libxl/ xenstore dir for this information (still outside of backend
> > > > > domain control), this shouldn't be an issue.
> > > > 
> > > > This seems quite hazardous to me.  The reasoning you use to show that
> > > > this iws OK seems fragile, and in general it doesn't feel right to
> > > > give the particular backend such wide scope.
> > > > 
> > > > Can we find another way to address this problem ?  I think the
> > > > containing directory should be removed by the toolstack.  Why is this
> > > > difficult ?  (I presume there is a reason or you would have done it
> > > > that way...)
> > > 
> > > It was done this way previously and caused issues, see this commit:
> > > 
> > > commit 546678c6a60f64fb186640460dfa69a837c8fba5
> > > Author: Roger Pau Monne <roger.pau@xxxxxxxxxx>
> > > Date:   Wed Sep 23 12:06:56 2015 +0200
> > > 
> > >     libxl: fix the cleanup of the backend path when using driver domains
> > 
> > Thanks.
> > 
> > >     With the current libxl implementation the control domain will
> > >     remove both the frontend and the backend xenstore paths of a
> > >     device that's handled by a driver domain. This is incorrect,
> > >     since the driver domain possibly needs to access the backend
> > >     path in order to perform the disconnection and cleanup of the
> > >     device.
> > >     
> > >     Fix this by making sure the control domain only cleans the
> > >     frontend path, leaving the backend path to be cleaned by the
> > >     driver domain. Note that if the device is not handled by a
> > >     driver domain the control domain will perform the removal of
> > >     both the frontend and the backend paths.
> > 
> > Hmm.  I see my Ack on that.  Nevertheless maybe it is wrong.
> > 
> > Looking at it afresh, I think maybe the right answer is:
> > 
> >  * If the driver domain is expected to be working properly, the
> >    toolstack should wait for the driver domain to complete the device
> >    shutdown, before removing the backend node.  Indeed, the toolstack
> >    ought to wait for this before actually destroying the guest in Xen,
> >    by the usual logic for clean domain shutdown.
> 
> I think that's not enough. .../state = 6 is set by the kernel, but
> xl devd in the driver domain may want to cleanup things (hotplug scripts
> etc). And indeed libxl__device_destroy() is called from
> device_hotplug_done(), not device_backend_callback().
> 
> Alternatively, toolstack could wait for the actual backend node to be
> removed (by the driver domain), and then cleanup the parent directory (if
> empty). I don't find it particularly appealing, as every contact with
> libxl async code reduce overall happiness...
> 
> >  * There needs to be a way to deal with a broken/unresponsive driver
> >    domain.  That will involve not waiting for the backend so must
> >    involve simply deleting the backend from xenstore.
> 
> It's already there: if driver domain fails to set .../state = 6 within
> a timeout, toolstack will forcibly remove the entry.
> 
> > Is the distinction here between "xl shutdown" and "xl destroy", on the
> > actual guest domain, good enough ?  Hopefully if the driver domain
> > sees the backend directory simply vanish it can destructively tear
> > everything down ?
> 
> In the past this lead to multiple issues, where hotplug script didn't
> know which device actually was removed. In some cases I needed to
> workaround this by saving xenstore dump into a file in an "online"
> hotplug script, but it is very ugly solution.

Any opinion on the above?
In the above context (plus the fact that the toolstack use /libxl to
enumerate devices), I still think giving driver domain write access to
the backend/ node is the right solution for this problem.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

Attachment: signature.asc
Description: PGP signature

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.