[Xen-devel] Re: Known console(d) bug?

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] Re: Known console(d) bug?
From: Ferenc Wagner <wferi@xxxxxxx>
Date: Sat, 30 May 2009 01:06:38 +0200
In-reply-to: <20090529215301.GU24960@xxxxxxxxxxxxxxx> (Pasi Kärkkäinen's message of "Sat, 30 May 2009 00:53:01 +0300")
References: <87eiu74vfq.fsf@xxxxxxxxxxxxx> <20090529215301.GU24960@xxxxxxxxxxxxxxx>
User-agent: Gnus/5.110006 (No Gnus v0.6) Emacs/21.4 (gnu/linux)

Pasi Kärkkäinen <pasik@xxxxxx> writes:

> On Fri, May 29, 2009 at 08:26:33PM +0200, Ferenc Wagner wrote:
> 
>> There's a problem I've been struggling with for quite some time in our
>> Xen hosting environment.  Basically, after a couple of months' smooth
>> running time, suddenly most virtual machines get stuck in r state
>> and stop responding to anything, including xm console and xm sysrq.
>> It happens rather regularly, but I can't reproduce it by taxing the
>> domUs or the dom0 with disk I/O, CPU or console I/O.
>> 
>> However, a couple of days ago it turned out that this situation can be
>> cured by restarting xenconsoled!  After that, xm console spit out the
>> previous random typing, sysrq help strings and whatnot for the domUs
>> which weren't stuck in r state, and the stuck ones also started to
>> respond and run normally (spending most of their time in b state) again.
>> 
>> The whole phenomenon looked like xenconsoled stopped emptying the domU
>> console buffers, and those domUs which were constantly writing to
>> their consoles quickly filled it up and started busy-looping trying to
>> put more characters onto their consoles, not caring to respond to
>> ping, even.  But those domUs which didn't write to their consoles
>> stayed functional until the desperate operator forced them to create
>> enough console output to fill up their buffers as well, and then they
>> got stuck in r state just like the others.  After restarting
>> xenconsoled all were able to recover successfully.
>> 
>> Of course the above is just guessing, I don't know the details of Xen
>> console handling.  But I wonder if it rings any bells here, or maybe
>> this issue is known and fixed already.  Oh, I experience this under
>> Xen 3.2 and pv-ops guests (2.6.26+patches).
>
> I've seen the exact same bug/problem with Xen in RHEL5/CentOS (5.0, 5.1,
> 5.2).  I believe it's also in 5.3.
>
> I reported the problem to xen-devel, but I couldn't provide the needed
> strace/backtrace to figure out the reason _why_ that happens.. (I had
> already restarted xenconsoled..)
>
> I think developers would need more information to figure out what the
> actual bug is. 

Indeed, I have now found your report.  This means you've been running
for almost a year without experiencing this!  I get it much more often,
but still pretty rarely.  I also noticed that the more or less regular
WARN: Gmain_timeout_dispatch: Dispatch function for send local status took too long to execute: 200 ms (> 50 ms) (GSource: 0x811bf80)

messages from heartbeat came 50 times more often while xenstored was
stuck (at least it didn't take any significant CPU itself).  However,
four domUs constantly in r state surely sucked up all the CPU power of
the 4-way host machine.
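
The busy-looping guess from the first mail (the daemon stops draining,
the guest fills its console buffer and then spins) can be illustrated
with a toy model.  This is only a sketch under stated assumptions: the
names ConsoleRing and guest_write are made up for illustration, the
2048-byte size is an assumed value, and nothing here is taken from the
actual Xen console or xenconsoled code.

```python
class ConsoleRing:
    """Toy single-producer/single-consumer ring with free-running indices.

    Hypothetical model only; the default size is an assumption.
    """

    def __init__(self, size=2048):
        self.size = size
        self.buf = bytearray(size)
        self.prod = 0  # advanced by the writer (the guest)
        self.cons = 0  # advanced by the reader (the console daemon)

    def write(self, data):
        """Copy as much of data as fits; return the number of bytes accepted."""
        free = self.size - (self.prod - self.cons)
        n = min(free, len(data))
        for i in range(n):
            self.buf[(self.prod + i) % self.size] = data[i]
        self.prod += n
        return n

    def drain(self):
        """Return everything written so far and mark it as consumed."""
        out = bytes(self.buf[i % self.size] for i in range(self.cons, self.prod))
        self.cons = self.prod
        return out


def guest_write(ring, data, max_spins=1000):
    """Model a guest that retries until all of its output is accepted.

    Returns (bytes_written, spins).  The spin limit exists only so the
    model terminates; a guest spinning forever would have none.
    """
    sent = 0
    spins = 0
    while sent < len(data) and spins < max_spins:
        n = ring.write(data[sent:])
        sent += n
        if n == 0:
            spins += 1  # ring is full and nobody is draining it
    return sent, spins
```

In this model a writer that never touches the ring is unaffected, while
one that keeps writing spins as soon as the ring is full and makes
progress again right after drain() is called, which matches what was
observed after restarting xenconsoled.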

And this phenomenon is always triggered by some extra load, typically
by tiger starting an md5sum check of the installed packages at the
same time on a couple of domUs.  (Btw., isn't there some randomizing
crond that would help with this in general?)
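
Independent of any particular cron implementation, one way to keep such
jobs from starting at the same moment on every domU is to sleep for a
random interval before running them.  A minimal sketch, assuming a
hypothetical wrapper (the names jittered_delay and run_with_jitter are
made up here, and this is not a feature of tiger or of any specific
crond):

```python
import random
import subprocess
import time


def jittered_delay(max_delay_s, rng=random):
    """Pick a uniformly random delay between 0 and max_delay_s seconds."""
    return rng.uniform(0.0, max_delay_s)


def run_with_jitter(cmd, max_delay_s, sleep=time.sleep,
                    run=subprocess.call, rng=random):
    """Sleep for a random interval, then run cmd and return its exit status.

    sleep, run and rng are injectable so the behaviour can be checked
    without actually waiting or spawning a process.
    """
    sleep(jittered_delay(max_delay_s, rng))
    return run(cmd)
```

A cron job on each domU could then call something like
run_with_jitter(["some-nightly-check"], 1800), where both the command
name and the 30-minute window are purely illustrative.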
-- 
Cheers,
Feri.
