
Re: [Xen-devel] [PATCH 2/2] xenbus: bypass xenbus frontend resume if xenstored is not running



On Thu, 2013-05-02 at 11:10 +0100, Aurélien Chartier wrote:
> On 02/05/13 10:24, Ian Campbell wrote:
> > On Thu, 2013-05-02 at 10:21 +0100, Jan Beulich wrote:
> >>>>> On 02.05.13 at 10:24, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
> >>> On Wed, 2013-05-01 at 13:57 +0100, Aurelien Chartier wrote:
> > >>>> If the xenbus frontend is running in a domain running xenstored or in
> > >>>> dom0, device resume hangs because it happens before the process
> > >>>> resume. This patch adds extra logic to the resume code to check whether
> > >>>> we are the domain running xenstored or dom0.
> >>>>
> > >>>> The frontend will be reconnected later, when the backend resumes from S3.
> > >>>> This logic works when xenstored is running in dom0, but has not been
> > >>>> tested with a xenstore stub domain.
> >>>> ---
> >>>>  drivers/xen/xenbus/xenbus_probe_frontend.c |   15 ++++++++++++++-
> >>>>  1 file changed, 14 insertions(+), 1 deletion(-)
> >>>>
> > >>>> diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c
> >>>> index 3159a37..8583afe 100644
> >>>> --- a/drivers/xen/xenbus/xenbus_probe_frontend.c
> >>>> +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c
> >>>> @@ -89,9 +89,22 @@ static void backend_changed(struct xenbus_watch 
> >>>> *watch,
> >>>>          xenbus_otherend_changed(watch, vec, len, 1);
> >>>>  }
> >>>>  
> >>>> +static int xenbus_frontend_dev_resume(struct device *dev)
> >>>> +{
> >>>> +        /* 
> >>>> +         * If xenstored is running in that domain, we cannot access the 
> >>>> backend
> >>>> +         * state at the moment. If we are running in dom0, the domain 
> >>>> running
> >>>> +         * xenstored is still suspended at that point
> >>>> +         */
> >>>> +        if (xen_initial_domain() || (xen_store_domain == XS_LOCAL))
> >>>> +                return 0;
> >>>> +
> >>>> +        return xenbus_dev_resume(dev);
> >>> When or where does this eventually get called for the init domain or
> >>> XS_LOCAL cases?
> > >> I was about to ask the same question. Plus I don't think the
> > >> description here or in the overview mail makes clear how
> > >> specifically a deadlock would occur here. That is important to
> > >> understand, given that so far we had no indication of any special
> > >> treatment being necessary here, and resume from S3 had been
> > >> working fine without it (at least as long as xenstored is
> > >> running in Dom0, and at least with the traditional/
> > >> forward-port/non-pvops kernels).
> > I think the unusual feature here is that dom0 has a netfront attached.
> > Netfront resume is therefore hanging because it is trying to talk to the
> > still frozen xenstored process in dom0.
> >
> > Ian.
> >
> Yes, the unusual feature of having a netfront driver in dom0 is
> triggering the S3 issue I described. Ian made me realize this issue
> could also happen in Xenstore stub domains.
> 
> The root cause of the issue is the assumption that a xenstored process
> is running in another domain when the xenbus frontend is being resumed
> from S3. This assumption is incorrect if xenstored and the xenbus
> frontend are running in the same domain. As the Linux kernel waits for
> all devices to resume before resuming userland tasks, the xenbus
> frontend resume blocks the userland process resume while waiting for
> xenstored (which cannot run, as it is a userland process).
> 
> With that patch, xenbus_dev_resume will not be called at all for
> frontend devices such as netfront when running in dom0 or alongside
> xenstored. I am relying on the fact that the network backend domain
> will be resumed after dom0 resume is complete. When that resume
> happens, it will trigger a call to netback_changed in dom0 netfront,
> which will end up resuming the xenbus state in netfront.
> 
> That logic works for a dom0 netfront, as we can safely rely on the
> network backend domain being resumed after dom0 resume is complete.
> I don't have a Xen configuration with a Xenstore stub domain, but it
> would probably need some extra logic to reconnect the frontend after
> xenstored has been resumed. The main goal of this patch is to fix the
> S3 resume of domains running both a xenbus frontend and xenstored.

Is the assumption that other domains are all suspended over S3 a valid
one in the general case?

In principle there is nothing stopping the toolstack from leaving
domains running over S3, is there?

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

