To: <xen-users@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-users] CentOS domU hangs on "Restarting system" - didn't you have that one, too?
From: Florian Heigl <fh@xxxxxxxxxxxxxxxxxx>
Date: Thu, 29 Sep 2011 18:37:41 +0200
Delivery-date: Fri, 30 Sep 2011 04:35:33 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-users-request@lists.xensource.com?subject=help>
List-id: Xen user discussion <xen-users.lists.xensource.com>
List-post: <mailto:xen-users@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=unsubscribe>
Organization: Mathias Kettner GmbH
Reply-to: fh@xxxxxxxxxxxxxxxxxx
Sender: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: RoundCube Webmail/0.3.1
Hi,

I'm still trying to pin down one of the last issues on some systems here.

I'm interested in input from people who *recognize* the following:

Sending all processes the TERM signal...     [  OK  ]
Sending all processes the KILL signal...     [  OK  ]
Saving random seed:                          [  OK  ]
Syncing hardware clock to system time        [FAILED]
Turning off swap:                            [  OK  ] 
Unmounting file systems:                     [  OK  ] 
Please stand by while rebooting the system. 
Restarting the system.                      
  \
   \_______ this is a lie, no restart ever happens.


This error will occur sometimes, not always.
It reliably goes away upon a XenD restart.

Setup:
======
OS: CentOS 5.4 / 32bit / Xen 3 (outdatedness grade indicator:
.1.2-164.15.1.el5)

All guests (around 80) & hosts (10ish) run the same release, but I have
also done a test with one host running the latest and greatest Xen
version from CentOS 5.7.


Things that I tried to blame so far:
------------------------------------
= Old Xen version (switching to a less old one didn't help)

= qemu VFB due to
https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=718620

= the event channel issue where the dom0 and domU are using different
vcpus while talking to each other
http://lists.xensource.com/archives/html/xen-devel/2009-01/msg00004.html
This could possibly be sorted with a nightmarish hack that maps all vcpus
onto one cpu at shutdown time by sshing into dom0. One would have to ensure
the mapping is OK again after a reboot.
Err, you can imagine how much I "like" this idea.
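
For illustration only, the pinning hack could be sketched roughly like
this. It is an untested sketch, not something run on the affected hosts.
The helper names run and pin_vcpus are made up, the vcpu count and cpu
list have to be supplied by the caller, and it assumes xm vcpu-pin takes
a domain, a vcpu number and a cpu list such as 0 or 0-3. With DRY_RUN=1
it only prints the commands.

```shell
# Echo the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# pin_vcpus DOMAIN NVCPUS CPUS   (hypothetical helper)
# Pins vcpus 0..NVCPUS-1 of DOMAIN to the physical cpu list CPUS.
pin_vcpus() {
    domain=$1
    nvcpus=$2
    cpus=$3
    i=0
    while [ "$i" -lt "$nvcpus" ]; do
        run xm vcpu-pin "$domain" "$i" "$cpus"
        i=$((i + 1))
    done
}
```

Before shutdown one would call something like "pin_vcpus vm1 2 0", and
after the reboot call it again with the original cpu list (for example
0-3) to undo the mapping.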

= domU kernel: as yet untested. I hardly have any chance of updating it;
rather, I would need to backport the fix (if there is one) to 5.4

I found some posts by people that didn't get the error any more after
moving to something newer than CentOS 5.2 but this doesn't seem to have
completely done away with it.

So far I have failed to make this issue 100% reproducible. It will show up
minutes after freshly installing a Xen host, or it will not show up for a
week on another one. It may affect all VMs on a host, or it may affect only
one.

You can work around it by using
xm destroy, plus
killing any stuck qemu vfb processes (which is one of the reasons for
pointing at the VFB), then
service xend restart
xm create vm

but the xend restart introduces other issues, e.g. that any unaffected VM
that is rebooted during the restart will be gone with the wind, or the
fact that you'd need some magic way to detect a stuck VM and trigger the
restart. Also, I'm not at all sure that a few hundred xend restarts would
do no harm over time...
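
The workaround sequence above, written out as a rough shell sketch. The
function names run and recover_domu and the process pattern argument are
made up for illustration. The pattern that matches a stuck qemu vfb
process depends on the setup and has to be supplied by the caller. With
DRY_RUN=1 the commands are only printed, which seems wise given the side
effects of a xend restart.

```shell
# Echo the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# recover_domu DOMAIN CONFIG [VFB_PATTERN]   (hypothetical helper)
# DOMAIN:      name of the stuck domU
# CONFIG:      config name or path as you would pass it to xm create
# VFB_PATTERN: optional pkill -f pattern for the stuck qemu vfb process
recover_domu() {
    domain=$1
    config=$2
    pattern=${3:-}
    run xm destroy "$domain"
    if [ -n "$pattern" ]; then
        # pkill exits non-zero when nothing matches; that is not fatal here.
        run pkill -f "$pattern"
    fi
    run service xend restart
    run xm create "$config"
}
```

For example, "DRY_RUN=1; recover_domu vm1 vm1" prints the commands that
would be run without touching xend. It does nothing about the problem of
unaffected VMs being rebooted while xend restarts.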

The low chance of reproducing the issue is one of the big problems with
it[*], so if you remember that issue and did any successful troubleshooting
for it (or fixed it...) let me know.


Thanks :)
Florian




[*] Let alone systems that won't even manage a reboot, and what that makes
me think about the QA
-- 
Mathias Kettner GmbH  |  \/  | |/ /   M A T H I A S   K E T T N E R
Florian Heigl         | |\/| | ' /
Steinstr. 44          | |  | | . \        Linux Beratung & Schulung
81667 München         |_|  |_|_|\_\       http://mathias-kettner.de
Tel.: 089 / 1890 4210 
Fax.: 089 / 1890 4211 Mail:  fh@xxxxxxxxxxxxxxxxxx
