[Xen-users] DomU hang in run state (Debian Lenny)

To: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-users] DomU hang in run state (Debian Lenny)
From: Matthew Baker <matt.baker@xxxxxxxxxxxxx>
Date: Tue, 28 Sep 2010 12:08:24 +0100

Hi all,

I have been having what seems like an ongoing issue with clock syncing
in Xen for quite some time now. It could be that the clock issue is
resolved and I am seeing something else, but the clock issue is throwing
me off the scent.

A number of months ago I was getting "time went backwards" messages on
Xen DomUs. I tested separating the clocks (independent_wallclock=1) and
running ntp in both DomU and Dom0. I had bad synchronisation and the
occasional Dom0 kernel panic or just a straight lock-up (no log or
terminal output).

I then moved to clocksource=jiffies and independent_wallclock=0. I have
reasonably well sync'd clocks and no Dom0 hangs, but I am now seeing the
DomUs hang in a run state (as shown by xm list) with CPU usage maxed out
(xm top). The DomU is not accessible via the network and the console is
unresponsive (no output after the standard boot messages, which may be a
week or so old). I see no log messages in Dom0 or the DomU. I have
run a script continuously to capture the output of ps, logging
anything using more than 30% memory or CPU time. I do not get anything
around the time of the hang. I am also monitoring via munin, and that
just shows the host is dead, with no creep of resource usage beforehand.
However, the machines that this happens to are reasonably busy. They
mostly run Apache-based web services (mixed applications), but it is not
confined to that setup.
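
For anyone wanting to reproduce this kind of monitoring, the following is
a minimal Python sketch of a ps-logging loop. It is not the script that
was actually run. The function names, the 10 second interval, and the
choice of ps columns are assumptions made for illustration; only the 30%
threshold comes from the description above.

```python
import subprocess
import time

THRESHOLD = 30.0  # percent of CPU or memory, as described above


def heavy_processes(ps_output, threshold=THRESHOLD):
    """Return (pid, cpu, mem, command) for rows above the threshold.

    Expects the text produced by `ps -eo pid,pcpu,pmem,args`.
    """
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue
        try:
            pid = int(parts[0])
            cpu = float(parts[1])
            mem = float(parts[2])
        except ValueError:
            continue  # ignore rows that do not parse cleanly
        if cpu > threshold or mem > threshold:
            rows.append((pid, cpu, mem, parts[3]))
    return rows


def sample():
    """Run ps once and return the processes above the threshold."""
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,pmem,args"],
        capture_output=True, text=True, check=True,
    )
    return heavy_processes(result.stdout)


def log_forever(interval=10):
    """Print one timestamped line per heavy process, every `interval` seconds."""
    while True:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for pid, cpu, mem, command in sample():
            # flush so the last lines survive if the guest hangs
            print(f"{stamp} pid={pid} cpu={cpu} mem={mem} {command}", flush=True)
        time.sleep(interval)
```

Calling log_forever() with output redirected to a file on storage outside
the DomU (for example an NFS mount or a remote syslog) would keep the last
samples readable after a hang.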

Yesterday, I discovered the option of running clocksource=xen and
independent_wallclock=0 with the ntp.conf option "disable kernel"[1]. I
tried this last night and within a couple of hours one of my Dom0
machines hung with no output, requiring a hard reset. I could not afford
any more downtime on the machines which were experiencing the outages, so
I have reverted to "jiffies" as that seems to be the most stable.
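
When switching between these combinations it helps to confirm which
settings are actually in effect on each machine. The Python sketch below
reads them back. The file paths are assumptions for a Xen-patched 2.6
kernel on Debian and may differ elsewhere, and the function names are
hypothetical.

```python
# Paths are assumptions for a Xen-patched 2.6 kernel; adjust if they differ.
CLOCKSOURCE_PATH = "/sys/devices/system/clocksource/clocksource0/current_clocksource"
WALLCLOCK_PATH = "/proc/sys/xen/independent_wallclock"
NTP_CONF_PATH = "/etc/ntp.conf"


def read_setting(path):
    """Return the stripped file contents, or None if the file is unreadable."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None


def ntp_kernel_discipline_disabled(conf_text):
    """True if ntp.conf text contains an uncommented `disable ... kernel` line."""
    for line in conf_text.splitlines():
        words = line.split("#", 1)[0].split()  # drop trailing comments
        if words and words[0] == "disable" and "kernel" in words[1:]:
            return True
    return False


def clock_report():
    """Collect the three clock-related settings discussed above."""
    conf_text = read_setting(NTP_CONF_PATH)
    return {
        "clocksource": read_setting(CLOCKSOURCE_PATH),
        "independent_wallclock": read_setting(WALLCLOCK_PATH),
        "ntp_disable_kernel": (
            None if conf_text is None
            else ntp_kernel_discipline_disabled(conf_text)
        ),
    }
```

Running print(clock_report()) in Dom0 and in each DomU gives a quick
record of the configuration to keep alongside the time of any later hang.
A value of None means the corresponding file could not be read.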

The whole situation is slightly left of ideal and I am at a loss as to
where to go next with this. I have left the ntp.conf option on for the
time being and I am just waiting for the next hang. Can anyone suggest a
course of action which will allow me to consider these machines stable?

Many thanks in advance for any help.



OS: Debian Lenny
Xen: 3.2
Linux kernel: 2.6.26-2-xen-amd64
64-bit hypervisor and kernel, with a mix of 64-bit and 32-bit userland DomUs


 Matthew Baker, UNIX Systems Administrator
 Institute for Learning and Research Technology (ILRT)
 A: University of Bristol,
    8-10 Berkeley Square,
    BS8 1HH
 W: http://www.ilrt.bris.ac.uk/
 E: matt.baker@xxxxxxxxxx
 T: Berkeley Square
    +44 (0)117 32 14325
 T: Computer Centre
    +44 (0)117 32 17467
 F: 35BB AD51 9892 D694 7664  8BFD 2EF9 BBA4 1FDA 89C3
