xen-devel

Re: [Xen-devel] help with xenstored 'hang'

To: Jim Fehlig <jfehlig@xxxxxxxxxx>
Subject: Re: [Xen-devel] help with xenstored 'hang'
From: Patrick Colp <pjcolp@xxxxxxxxx>
Date: Wed, 30 Jun 2010 16:17:32 -0700
Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Fri, 02 Jul 2010 03:42:12 -0700
In-reply-to: <4C2BC1FD.5050404@xxxxxxxxxx>
I was recently struggling with what sounds like a not-too-dissimilar
problem while working with a disaggregated version of xenstore. The
ultimate solution for me was to disable pthreads in xenstore/libxs. I
just commented out the following line in tools/xenstore/Makefile:

xs.opic: CFLAGS += -DUSE_PTHREAD

After I removed that line and rebuilt and installed xenstore, it
worked just fine. I would be curious to know if this also solves your
problem.
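
For anyone unfamiliar with what a flag like -DUSE_PTHREAD does, here is a minimal sketch of the general pattern: a compile-time macro that switches a library between real pthread locking and no-op stubs. This is not the actual libxs source; the demo_* names and the demo_handle struct are made up for illustration, and I am only assuming the real code is structured along similar lines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration only: none of these names come from libxs. */
#ifdef USE_PTHREAD
#include <pthread.h>
typedef pthread_mutex_t demo_mutex_t;
#define DEMO_MUTEX_INIT PTHREAD_MUTEX_INITIALIZER
#define demo_lock(m)    pthread_mutex_lock(m)
#define demo_unlock(m)  pthread_mutex_unlock(m)
#define DEMO_THREADED   1
#else
/* Without the flag, locking compiles away to nothing. */
typedef int demo_mutex_t;
#define DEMO_MUTEX_INIT 0
#define demo_lock(m)    ((void)(m))
#define demo_unlock(m)  ((void)(m))
#define DEMO_THREADED   0
#endif

struct demo_handle {
    demo_mutex_t lock;       /* real mutex or dummy, depending on the flag */
    unsigned long requests;  /* number of requests issued on this handle */
};

/* Count one request, taking the lock only in the threaded build. */
static unsigned long demo_request(struct demo_handle *h)
{
    unsigned long n;

    assert(h != NULL);
    demo_lock(&h->lock);
    n = ++h->requests;
    demo_unlock(&h->lock);
    return n;
}

/* Report which variant was compiled in. */
static int demo_is_threaded(void)
{
    return DEMO_THREADED;
}
```

Building the same file with and without -DUSE_PTHREAD gives the two variants, which is roughly what commenting out the Makefile line toggles for xs.opic.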


Patrick


On 30 June 2010 15:15, Jim Fehlig <jfehlig@xxxxxxxxxx> wrote:
> I'm trying to debug an 'xm list' hang on a large (~700 hosts) Xen 3.2
> production installation.  The hang occurs randomly, on a random host.
> User has provided cores of xend and xenstored processes when hang
> occurs.  After poking at these cores I have discovered the following:
>
> In xend process, a thread is blocked on a cond variable, waiting for a
> response to XS_TRANSACTION_START from xenstored. A reader thread
> responsible for reading from xenstored is blocked on read(2).
>
> In the xenstored process, the lone thread is blocked on select(2),
> waiting for IO. I examined the connections list and see that it contains
> a connection for the XS_TRANSACTION_START request.  Dumping the
> connection object:
>
> (gdb) p *(struct connection *)0x526c70
> $48 = {list = {next = 0x517c30, prev = 0x5151f0}, fd = 13, id = 0,
>   can_write = true, in = 0x523600,
>   out_list = {next = 0x526c98, prev = 0x526c98}, transaction = 0x0,
>   transaction_list = {next = 0x523560, prev = 0x523560},
>   next_transaction_id = 60231445, transaction_started = 1, domain = 0x0,
>   watches = {next = 0x51daa0, prev = 0x5267b0},
>   write = 0x402460 <writefd>, read = 0x405180 <readfd>}
>
> Notice transaction_started is set to 1, but out_list is empty. AFAICT,
> that means the reply has been sent to xend. The reader thread in xend
> should have received the response and signaled the cond variable -
> allowing execution to progress. Ultimately, xend would send an
> XS_TRANSACTION_END message, freeing the connection object in xenstored
> and removing it from connections list.
>
> Does my understanding of this code sound correct?  Anyone have
> suggestions or further debugging tips?  Examining cores is about my only
> debug option as user does not want to deploy debug patches, enable
> tracing, etc. across 700 hosts.
>
> Interestingly, when user strace's or attaches to xenstored process with
> gdb, xenstored "awakes", the hung 'xm list' returns, and xenstored
> continues normally.  A new connection to xenstored (e.g. running xmtop)
> seems to poke it along as well.  Would a timeout on select(2) in main
> loop of xenstored help at all?
>
> Thanks for any insights!
> Jim
>
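
On the question of a timeout on select(2): the sketch below shows what a bounded wait looks like in general. It is not xenstored's main loop; wait_readable() and its millisecond parameter are hypothetical names chosen for illustration. Whether such a timeout would actually cure the hang is not established in this thread, and at best it would let the loop re-check its state periodically rather than explain the missed wakeup.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait until fd is readable or timeout_ms elapses.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 */
static int wait_readable(int fd, long timeout_ms)
{
    /* select(2) can only handle descriptors below FD_SETSIZE. */
    assert(fd >= 0 && fd < FD_SETSIZE);

    for (;;) {
        fd_set rfds;
        struct timeval tv;
        int rc;

        /* Both the fd set and the timeout must be rebuilt on every call,
         * because select(2) may modify them. */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;

        rc = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (rc < 0 && errno == EINTR)
            continue;   /* interrupted by a signal: retry with a fresh timeout */
        if (rc < 0)
            return -1;  /* genuine error */
        return rc > 0 ? 1 : 0;
    }
}
```

A caller would loop on this, treating a return of 0 as a cue to re-examine pending connections before waiting again. Note that retrying after EINTR restarts the full timeout, which is a simplification.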

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
