
Re: [Xen-devel] help with xenstored 'hang'


  • To: Jim Fehlig <jfehlig@xxxxxxxxxx>
  • From: Patrick Colp <pjcolp@xxxxxxxxx>
  • Date: Wed, 30 Jun 2010 16:17:32 -0700
  • Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Fri, 02 Jul 2010 03:42:12 -0700
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

I was recently struggling with what sounds like a not-too-dissimilar
problem while working with a disaggregated version of xenstore. The
ultimate solution for me was to disable pthreads in xenstore/libxs. I
just commented out the following line in tools/xenstore/Makefile:

xs.opic: CFLAGS += -DUSE_PTHREAD

After I removed that line and rebuilt and installed xenstore, it
worked just fine. I would be curious to know if this also solves your
problem.
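
For anyone unfamiliar with what a define like that controls, here is a
minimal sketch of how a compile-time flag can switch a library between
pthread locking and no-op locking. This is an illustration only, not the
libxs source: every name starting with demo_ is hypothetical, and the
real code guarded by USE_PTHREAD may do more than swap lock macros.

```c
#ifdef USE_PTHREAD
#include <pthread.h>
/* Threaded build: locks are real pthread mutexes. */
typedef pthread_mutex_t demo_lock_t;
#define DEMO_LOCK_INIT  PTHREAD_MUTEX_INITIALIZER
#define demo_lock(m)    pthread_mutex_lock(m)
#define demo_unlock(m)  pthread_mutex_unlock(m)
#define DEMO_MODE       "pthread"
#else
/* Non-threaded build: lock calls compile away to nothing. */
typedef int demo_lock_t;
#define DEMO_LOCK_INIT  0
#define demo_lock(m)    ((void)(m))
#define demo_unlock(m)  ((void)(m))
#define DEMO_MODE       "single-threaded"
#endif

static demo_lock_t demo_request_lock = DEMO_LOCK_INIT;
static int demo_request_count;

/* Hand out increasing request ids; only serialized when threaded. */
int demo_next_request_id(void)
{
    int id;

    demo_lock(&demo_request_lock);
    id = ++demo_request_count;
    demo_unlock(&demo_request_lock);
    return id;
}

/* Report which variant was compiled in. */
const char *demo_mode(void)
{
    return DEMO_MODE;
}
```

Building the same file with and without -DUSE_PTHREAD gives the two
variants, which is roughly what removing the CFLAGS line does for
xs.opic.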


Patrick


On 30 June 2010 15:15, Jim Fehlig <jfehlig@xxxxxxxxxx> wrote:
> I'm trying to debug an 'xm list' hang on a large (~700 hosts) Xen 3.2
> production installation. The hang occurs randomly, on a random host.
> User has provided cores of xend and xenstored processes when hang
> occurs. After poking at these cores I have discovered
>
> In xend process, a thread is blocked on a cond variable, waiting for a
> response to XS_TRANSACTION_START from xenstored. A reader thread
> responsible for reading from xenstored is blocked on read(2).
>
> In the xenstored process, the lone thread is blocked on select(2),
> waiting for IO. I examined the connections list and see that it contains
> a connection for the XS_TRANSACTION_START request. Dumping the
> connection object:
>
> (gdb) p *(struct connection *)0x526c70
> $48 = {list = {next = 0x517c30, prev = 0x5151f0}, fd = 13, id = 0,
>   can_write = true, in = 0x523600,
>   out_list = {next = 0x526c98, prev = 0x526c98}, transaction = 0x0,
>   transaction_list = {next = 0x523560, prev = 0x523560},
>   next_transaction_id = 60231445, transaction_started = 1,
>   domain = 0x0, watches = {next = 0x51daa0, prev = 0x5267b0},
>   write = 0x402460 <writefd>, read = 0x405180 <readfd>}
>
> Notice transaction_started is set to 1, but out_list is empty. AFAICT,
> that means the reply has been sent to xend. The reader thread in xend
> should have received the response and signaled the cond variable -
> allowing execution to progress. Ultimately, xend would send a
> XS_TRANSACTION_END message, freeing the connection object in xenstored
> and removing it from connections list.
>
> Does my understanding of this code sound correct? Anyone have
> suggestions or further debugging tips? Examining cores is about my only
> debug option as user does not want to deploy debug patches, enable
> tracing, etc. across 700 hosts.
>
> Interestingly, when user strace's or attaches to xenstored process with
> gdb, xenstored "awakes", the hung 'xm list' returns, and xenstored
> continues normally. A new connection to xenstored (e.g. running xmtop)
> seems to poke it along as well. Would a timeout on select(2) in main
> loop of xenstored help at all?
>
> Thanks for any insights!
> Jim
>
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
>
>
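
On the select(2) timeout question above, this is a minimal sketch of
what a bounded wait looks like. It is not the xenstored main loop:
wait_readable() is a hypothetical helper and the whole-second timeout
is an assumption. A timeout of this kind only makes the loop wake up
periodically and re-check its connections, so it might work around a
missed wakeup without explaining it.

```c
#include <sys/select.h>
#include <unistd.h>

/*
 * Wait until fd is readable or timeout_sec seconds elapse.
 * Returns 1 if fd is readable, 0 on timeout, -1 on error
 * (including EINTR, which the caller may want to retry).
 */
int wait_readable(int fd, long timeout_sec)
{
    fd_set readfds;
    struct timeval tv;
    int ret;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    /* Passing NULL instead of &tv would block indefinitely. */
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    ret = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (ret < 0)
        return -1;
    return ret > 0 ? 1 : 0;
}
```

A caller would loop on this and treat a 0 return as "nothing arrived,
re-examine pending state and wait again".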
