
Re: [Xen-devel] bug in xenstored? No notification to subscription on @introduceDomain



On Fri, 2011-12-09 at 19:49 +0000, George Shuklin wrote:
> Good day.
> 
> I think I have run into a strange bug in xenstored.

If you are using XCP then this will be using oxenstored. I've CC'd
xen-api@ since that is the correct place for XCP discussions.

It's also plausibly a bug in the C client library or the python bindings
to that library (or indeed your application).

> I have been using XCP for a long time, and all that time we have had an
> odd bug that we could not debug properly, because it showed up in a
> production environment and only very rarely. Now we have caught it in a
> testing environment and done some research.
> 
> We have a Python application running in dom0 that waits for domains to
> appear. This is implemented via a subscription to the @introduceDomain
> xenstore key. Under some conditions we stop receiving notifications on
> the subscription. If we run a second instance of the application, it
> receives the notification; if we restart the application, it receives
> it too.

Do you lose both the @introduceDomain and @releaseDomain notifications,
or just @introduceDomain?

Does the app do any other XS stuff, e.g. other watches or read/write? Do
these stop working also?
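
One way to answer that is to add an ordinary watch next to the two
special ones and log every event with a timestamp, so it is clear whether
all watches go quiet or only @introduceDomain. A rough sketch follows;
the helper name log_watch_events and the probe path /local/domain are my
own assumptions for illustration, not anything from your application. It
relies only on the watch() and read_watch() calls already used in your
snippet.

```python
import sys
import time


def log_watch_events(handle, watches, max_events=None, log=None):
    """Register each (path, token) watch, then log every event received.

    `handle` is anything with watch(path, token) and read_watch(), such as
    an xs handle from the Xen Python bindings. `max_events` bounds the
    loop; None means run until interrupted.
    """
    if log is None:
        log = lambda line: sys.stderr.write(line + "\n")
    for path, token in watches:
        handle.watch(path, token)
    seen = []
    while max_events is None or len(seen) < max_events:
        path, token = handle.read_watch()  # blocks until some watch fires
        seen.append((path, token))
        log("%s watch fired: path=%s token=%s"
            % (time.strftime("%Y-%m-%d %H:%M:%S"), path, token))
    return seen


# In dom0 (assumes the Xen Python bindings are installed):
#   from xen.lowlevel.xs import xs
#   log_watch_events(xs(), [("@introduceDomain", "+"),
#                           ("@releaseDomain", "-"),
#                           ("/local/domain", "probe")])  # hypothetical probe
```

If the probe watch keeps firing while @introduceDomain stays silent, that
points at the special watches rather than at the connection as a whole.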

oxenstored (at least in XCP) logs to /var/log/xenstore-access.log -- do
you see any activity in there? There is also /var/log/xenstored.log.

Does strace show the daemon writing (or trying to write) to the socket
associated with this client? What about on the client side? (nb:
libxenstore uses a thread to handle watches so be sure to use the
appropriate options to strace.) Identifying the fd associated with the
connection on either end might be tricky, /proc/<pid>/fd and/or netstat
might help narrow it down.
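
For the /proc/<pid>/fd part, something along these lines might do as a
starting point. It is only a sketch, assuming a Linux dom0 and enough
privilege to read the target process's fd directory; the function name
socket_fds is made up for illustration.

```python
import os


def socket_fds(pid):
    """Return {fd_number: link_target} for the socket descriptors of `pid`.

    On Linux each entry in /proc/<pid>/fd is a symlink, and sockets show
    up with a target of the form "socket:[<inode>]".
    """
    fd_dir = "/proc/%d/fd" % pid
    result = {}
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed between listdir() and readlink()
        if target.startswith("socket:["):
            result[int(name)] = target
    return result


# Example: print the socket descriptors of the current process.
for fd, target in sorted(socket_fds(os.getpid()).items()):
    print("fd %d -> %s" % (fd, target))
```

The inode number inside the brackets can then be looked up in the Inode
column of /proc/net/unix to see which descriptors belong to the xenstored
socket, which gives you the fd to look for in the strace output.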

The app being python presumably makes it hard to attach gdb and get
anything sensible, likewise the daemon being ocaml. If anyone has any
hints on attaching a debugger to an existing process of these types
then that might be useful.

Other than that I'm afraid I really don't have any idea what might be
going wrong, or indeed what other next steps can be taken to diagnose
the issue :-(

Ian.

> I am unable to pinpoint the exact conditions for this, but it:
> a) Happens occasionally but consistently (about once a month, on at
> least one host in a farm of 50 hosts)
> b) Is not related to xenstored uptime
> c) Is not related to load on Xen or dom0
> d) Is not related to the number of domains
> e) Occurs at least on XCP 0.5, 1.0 and 1.1 (I don't know how to get the
> version from xenstored)
> 
> Last time I got it on two hosts in the lab at the same time (with a
> single guest domain and without any high load) and did some
> experiments, so I can say with confidence what I wrote above.
> 
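> The relevant pieces of the Python code we run:
> 
> from xen.lowlevel.xs import xs
> conn = xs()  # xs is already the handle class with this import
> conn.watch("@introduceDomain", "+")  # token "+" tags domain creation
> conn.watch("@releaseDomain", "-")    # token "-" tags domain release
> path, token = conn.read_watch()      # blocks until a watch fires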
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel





 

