
Re: [Xen-devel] libxl: error handling before xenstored runs



On Thursday 10 February 2011 12:43:47 Ian Campbell wrote:
> On Thu, 2011-02-10 at 11:32 +0000, Christoph Egger wrote:
> > On Thursday 10 February 2011 12:24:41 Ian Campbell wrote:
> > > On Thu, 2011-02-10 at 09:26 +0000, Vincent Hanquez wrote:
> > > > On 10/02/11 08:55, Ian Campbell wrote:
> > > > > That's the underlying bug which the heuristic is trying to avoid...
> > > > >
> > > > > Fundamentally the xs ring protocol is missing any way to tell if
> > > > > someone is listening on the other end so you have no choice but to
> > > > > try communicating and see if anyone responds.
> > > > >
> > > > > It's a pretty straightforward bug that the kernel does the waiting
> > > > > to see if anyone responds bit with an uninterruptible sleep. I took
> > > > > a quick look a little while ago but unfortunately it didn't look
> > > > > straightforward to fix on the kernel side :-( I can't remember why
> > > > > though.
> > > >
> > > > For starters, the protocol requires the messages to sit on the ring
> > > > for an undetermined amount of time (boot watches).
> > > >
> > > > > It might be simpler to support allowing the userspace client to
> > > > > explicitly specify a timeout. I'm not sure what the impact on the
> > > > > ring is of leaving unconsumed requests on the ring when the other
> > > > > end does show up. Presumably the kernel driver just needs to be
> > > > > prepared to swallow responses whose target has given up and gone
> > > > > home.
> > > >
> > > > No, the simplest thing to do is to use the socket connection
> > > > exclusively. Just how we're doing it in XCP and XCI.
> > >
> > > Right but this approach doesn't work with xenstored in a stubdomain.
> > > Part of the point of using the ring protocol even when this isn't the
> > > case is to help ensure that it is possible and help avoid regressions
> > > etc.
> > >
> > > > The protocol is not designed to do async either, so leaving
> > > > unconsumed requests could be pretty disastrous if the other end
> > > > shows up. Provided the kernel doesn't detect it (I don't think it
> > > > does [1]), it would imply spurious replies; for example, the reply
> > > > to the previous waiting read on "/abc/def" could be delivered to
> > > > the next read on "/xyz/123".
> > >
> > > The wire protocol includes a req_id which is echoed in the response
> > > which sh/could facilitate multiplexing this sort of thing. The pvops
> > > kernel currently always sets it to zero but that's just an
> > > implementation detail ;-) Currently the kernel does (roughly):
> > >   take_lock
> > >   write_request
> > >   wait_for_reply
> > >   release_lock
> > > instead it should/could be:
> > >   take_lock(timeout)
> > >   write_request (++req_id)
> > >   while read_reply.req_id != req_id && not (timeout)
> > >           wait some more
> > >   release_lock
> >
> > I prefer a userland solution. Fixing Linux Dom0 doesn't help NetBSD Dom0.
>
> Fixing the NetBSD dom0 does though.
>
> Seriously, if kernels are lacking functionality needed to make the
> system work smoothly and correctly, we should fix them, not just default
> to adding hacks in userspace because it seems easier in the short term.
> (Obviously, if the userspace solution is the right thing to do and/or
> more correct in its own right, then fine, let's do that.)
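
For reference, the req_id matching loop quoted further up (take the lock
with a timeout, bump req_id, skip replies that don't match) can be sketched
roughly as below. This is a toy userspace model in Python, not kernel code:
RingClient, send and replies are hypothetical names, and a queue.Queue
stands in for the response side of the ring.

```python
import itertools
import queue
import threading
import time


class RingClient:
    """Toy model of a ring client that matches replies by req_id."""

    def __init__(self, send, replies):
        self._send = send          # hypothetical: callable(req_id, payload)
        self._replies = replies    # hypothetical: queue of (req_id, payload)
        self._lock = threading.Lock()
        self._ids = itertools.count(1)

    def request(self, payload, timeout):
        deadline = time.monotonic() + timeout
        # take_lock(timeout)
        if not self._lock.acquire(timeout=timeout):
            raise TimeoutError("ring busy")
        try:
            # write_request (++req_id)
            req_id = next(self._ids)
            self._send(req_id, payload)
            while True:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    raise TimeoutError("no reply for req_id %d" % req_id)
                try:
                    rid, reply = self._replies.get(timeout=remaining)
                except queue.Empty:
                    raise TimeoutError("no reply for req_id %d" % req_id) from None
                if rid == req_id:
                    return reply
                # Otherwise this is a late reply to a request whose caller
                # already gave up: swallow it and wait some more.
        finally:
            # release_lock
            self._lock.release()
```

With this shape, a reply to an abandoned read on "/abc/def" is dropped
instead of being handed to the next read on "/xyz/123", because its req_id
no longer matches.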

Does xl communicate with xenstored through a named socket?
If so, the caller should check whether connect() fails with ECONNREFUSED.
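
Something along these lines is what I have in mind; a minimal sketch in
Python rather than the C that libxl would use. The function name
xenstored_listening is made up for illustration, and the socket path in the
usage note is an assumption that may differ per installation. ENOENT is
treated the same as ECONNREFUSED because the socket file may not exist at
all if the daemon was never started.

```python
import errno
import socket


def xenstored_listening(path):
    """Return True if something accepts connections on the Unix socket at path.

    Hypothetical helper: ECONNREFUSED (socket file exists, nobody listening)
    and ENOENT (no socket file) both count as "daemon not running".
    Any other error is passed on to the caller.
    """
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.connect(path)
    except OSError as e:
        if e.errno in (errno.ECONNREFUSED, errno.ENOENT):
            return False
        raise
    finally:
        s.close()
    return True
```

Usage would be something like xenstored_listening("/var/run/xenstored/socket"),
with that path being an assumption. Unlike the ring, this fails immediately
instead of sleeping when nobody is on the other end.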

Christoph


-- 
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

