[Xen-users] DomU hang in run state (Debian Lenny)

To: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-users] DomU hang in run state (Debian Lenny)
From: Matt Baker <m@xxxxxxxxxxxx>
Date: Mon, 27 Sep 2010 23:26:51 +0100
Hi all,

 We have a number of Xen nodes used in a bunch of Ganeti clusters
running on Debian Lenny. Most are 64bit kernels with a mix of 32/64bit
user land VMs. On the paravirtualised Lenny DomUs we are experiencing
hangs at seemingly random times. When this happens, the hypervisor
reports the DomU as being in a run state (xm list) and xm top shows
the CPUs all maxed out. I cannot get into the DomU either over the
network or via a console. Sometimes I get output on the console, but
there is nothing newer than the standard boot messages, which are
usually from a week or so earlier and so not relevant.

There is nothing in the hypervisor's Xen logs or kernel logs, nor in
the DomU kernel logs. I have run a script in the DomU that captures the
output of ps every 10 seconds and alerts on processes using more than
30% memory or CPU. Neither check produces any output at the time of the
hang. I am also monitoring all DomUs via munin, which does not record
any gradual creep in resource usage either.
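
For illustration, a monitoring script of the kind described above could look roughly like the following Python sketch. It is not the script actually in use; the function names, the ps column selection and the alert format are assumptions made for the example.

```python
import subprocess
import time

THRESHOLD = 30.0  # percent, matching the 30% limit mentioned above
INTERVAL = 10     # seconds between samples


def find_hogs(ps_output, threshold=THRESHOLD):
    """Return (pid, %cpu, %mem, command) for rows above the threshold."""
    hogs = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue
        try:
            pid = int(fields[0])
            cpu = float(fields[1])
            mem = float(fields[2])
        except ValueError:
            continue  # ignore rows that do not parse cleanly
        if cpu > threshold or mem > threshold:
            hogs.append((pid, cpu, mem, fields[3]))
    return hogs


def sample():
    """Run ps once and return its output as text."""
    out = subprocess.check_output(["ps", "-eo", "pid,pcpu,pmem,comm"])
    return out.decode("utf-8", "replace")


def monitor(interval=INTERVAL, iterations=None):
    """Sample repeatedly; iterations=None means run until interrupted."""
    count = 0
    while iterations is None or count < iterations:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for pid, cpu, mem, comm in find_hogs(sample()):
            print("%s ALERT pid=%d cpu=%.1f mem=%.1f %s"
                  % (stamp, pid, cpu, mem, comm))
        time.sleep(interval)
        count += 1
```

Calling monitor() runs until interrupted, while monitor(iterations=3) takes three samples and returns. Note that the %CPU figure reported by ps is averaged over a process's lifetime, so a short burst just before a hang may never cross the threshold.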

I have also had the "time went backwards" problem and attempted to fix
it as described in the Xen FAQ by setting the clock source to
"jiffies". This was the most successful attempt, as it stopped the time
messages, but the hang problem above still occurred. Before that, I was
experiencing kernel panics with the default clocksource of "xen" and
independent_wallclock=0. I have also tried setting "disable kernel" in
ntp.conf (with clocksource=xen and independent_wallclock=0), which has
recently appeared as an option, but unfortunately I am back to the
original problem of the physical host hanging and needing a hard reset.
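
When comparing these combinations, it can help to record what a DomU is actually using at a given moment. The Python sketch below reads the relevant kernel settings. The sysfs and /proc paths are the usual locations on Linux kernels with Xen support, but treat them as assumptions and check that they exist on your kernel; the function names are illustrative only.

```python
import os

# Assumed locations; verify on the kernel in question.
CLOCKSOURCE_DIR = "/sys/devices/system/clocksource/clocksource0"
SETTINGS = {
    "current_clocksource": os.path.join(CLOCKSOURCE_DIR, "current_clocksource"),
    "available_clocksource": os.path.join(CLOCKSOURCE_DIR, "available_clocksource"),
    "independent_wallclock": "/proc/sys/xen/independent_wallclock",
}


def read_setting(path):
    """Return the stripped file content, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except (IOError, OSError):
        return None


def report(settings=SETTINGS):
    """Map each setting name to its current value (None if unavailable)."""
    return dict((name, read_setting(path)) for name, path in settings.items())
```

Printing the result of report() inside a DomU before and after a change gives a simple record of which clock source and wallclock setting were in effect.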

I am considering moving these hosts to a newer version of Xen if there is a possibility it will be more stable. The current versions are the Lenny defaults: Xen 3.2 and kernel 2.6.26.

Any assistance or advice on this would be greatly appreciated.

Many thanks,


 Matthew Baker, UNIX Systems Administrator
 Institute for Learning and Research Technology (ILRT)
 A: University of Bristol,
    8-10 Berkeley Square,
    BS8 1HH
 W: http://www.ilrt.bris.ac.uk/
 E: matt.baker@xxxxxxxxxx
 T: Berkeley Square
    +44 (0)117 33 14325
 T: Computer Centre
    +44 (0)117 33 17467
 F: 35BB AD51 9892 D694 7664  8BFD 2EF9 BBA4 1FDA 89C3
