xen-users

[Xen-users] PV Linux domUs freeze after a few hours

To: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-users] PV Linux domUs freeze after a few hours
From: Leszek Urbanski <tygrys@xxxxxx>
Date: Mon, 31 Aug 2009 14:43:58 +0200

Hi,

I'm experiencing random domU freezes.

This is similar to Debian bug #534880. One of the numerous references to it
on the web:
http://moblog.wiredwings.com/archives/20090227/Lennys-Xen-Kernel-2.6.26-Causes-DomU-Freezes.html
It is stated there that both 2.6.26 and 2.6.30 with pv_ops freeze on domU.

I've been using Xen 3.2 for a few months without any problems - until now.

This seems like a critical bug that bites more and more installations and
will surely become a show stopper for migration to Xen at many Linux shops.

My environment:

- Xen 3.2.1

- multiple x86_64 and i386 dom0s and domUs

- machines from different vendors, the hardware has been checked and
  double-checked

- dom0 kernel: Debian Lenny's xenified 2.6.26 (with OpenSUSE patches)

- domU kernels: paravirt ops 2.6.30 and Lenny's xenified 2.6.26 (both have
  the same problems)

- all domUs are SMP (vcpus > 1). The problem doesn't occur with UP domUs
  (unfortunately, the performance hit from running the domUs with vcpus=1
  is unacceptable for my installations)

- no vcpu pinning (by choice) for either dom0s or domUs

- the bug seems unrelated to load profiles; some domUs that freeze are almost
  always idle, some are I/O intensive, pushing 30 MB/s to disks and a few
  hundred megabits to the network.


The symptoms:

After a few (3-24) hours of runtime, some of the domUs become completely
unresponsive:

- the network stack is completely dead

- xm console is unresponsive

- xm vcpu-list always shows one vcpu in no state ("---") and all other vcpus
  in r state

- xm destroy works and immediately destroys the domU

- nothing useful in xm dmesg, xm log

- mpstat shows less than 10% steal

- I'm waiting for another freeze to check if there's anything useful on
  domU consoles
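
The freeze signature above (one vcpu with state "---" while every other vcpu
of the same domU is in r state) can be checked for mechanically. The following
is a minimal sketch, not a tested tool. It assumes the usual `xm vcpu-list`
column layout (Name, ID, VCPU, CPU, State, ...), so the State value is taken
to be the fifth whitespace-separated field. The sample output and the domain
names `web1` and `db1` are made up for illustration.

```python
def find_frozen_domains(vcpu_list_output):
    """Return names of domains matching the freeze signature.

    Signature: more than one vcpu, exactly one vcpu with state "---",
    and every other vcpu flagged as running ("r..").
    Assumes the State column is the 5th whitespace-separated field.
    """
    states_by_domain = {}
    for line in vcpu_list_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 5 or fields[0] == "Name":
            continue
        name, state = fields[0], fields[4]
        states_by_domain.setdefault(name, []).append(state)

    frozen = []
    for name, states in states_by_domain.items():
        stateless = sum(1 for s in states if s == "---")
        running = sum(1 for s in states if s.startswith("r"))
        if len(states) > 1 and stateless == 1 and running == len(states) - 1:
            frozen.append(name)
    return sorted(frozen)


# Hypothetical sample, written by hand to mimic `xm vcpu-list` output.
SAMPLE = """\
Name                              ID  VCPU   CPU State   Time(s) CPU Affinity
Domain-0                           0     0     0   r--    1200.3 any cpu
Domain-0                           0     1     1   -b-     900.1 any cpu
web1                               3     0     2   ---     310.7 any cpu
web1                               3     1     3   r--   86000.2 any cpu
db1                                4     0     1   -b-     150.0 any cpu
db1                                4     1     2   -b-     148.9 any cpu
"""

if __name__ == "__main__":
    print(find_frozen_domains(SAMPLE))
```

In practice the text would come from running `xm vcpu-list` on the dom0. A
healthy domU can briefly show a similar pattern, so a match is only a hint
unless it persists across several samples.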


I'll try the following options (and post my results to this list):

- vcpu-pinning for dom0 only

- vcpu-pinning for dom0 and domUs

- vcpu-pinning and dedicating a core for the dom0

(However, vcpu pinning is not a real solution for me, as it wastes cores:
some domUs sit idle on their cores while others wait for their turn.)

- downgrading dom0 kernel to xenified 2.6.18

- upgrading the hypervisor to 3.4

- downgrading domU kernel to xenified 2.6.18
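
For the pinning experiments listed above, the `xm vcpu-pin` invocations can be
generated rather than typed by hand. This is only a sketch of one possible
layout: dom0 gets one dedicated core and the domU vcpus are spread round-robin
over the remaining cores. The domain names, vcpu counts and core count are
hypothetical, and the commands are only built as strings, not executed. Note
that this pins all dom0 vcpus to the same core; it does not reduce the number
of dom0 vcpus.

```python
def build_pin_commands(domu_vcpus, total_cores, dom0_core=0):
    """Build `xm vcpu-pin` command strings for a simple pinning plan.

    domu_vcpus:  list of (domain_name, vcpu_count) pairs (hypothetical names)
    total_cores: number of physical cores on the host
    dom0_core:   core reserved for dom0; domUs never get this core
    """
    domu_cores = [c for c in range(total_cores) if c != dom0_core]
    if not domu_cores:
        raise ValueError("need at least one core left over for domUs")

    # Pin every dom0 vcpu to the reserved core.
    commands = ["xm vcpu-pin Domain-0 all %d" % dom0_core]

    # Hand out the remaining cores round-robin, one per domU vcpu.
    next_slot = 0
    for name, vcpus in domu_vcpus:
        for vcpu in range(vcpus):
            core = domu_cores[next_slot % len(domu_cores)]
            commands.append("xm vcpu-pin %s %d %d" % (name, vcpu, core))
            next_slot += 1
    return commands


if __name__ == "__main__":
    for cmd in build_pin_commands([("web1", 2), ("db1", 2)], total_cores=4):
        print(cmd)
```

With more domU vcpus than free cores, the round-robin wraps and several vcpus
share a core, which is exactly the trade-off described in the note on wasted
cores.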


-- 
Leszek "Tygrys" Urbanski, SCSA, SCNA
 "Unix-to-Unix Copy Program;" said PDP-1. "You will never find a more
  wretched hive of bugs and flamers. We must be cautious." -- DECWARS
     http://cygnus.moo.pl/ -- Cygnus High Altitude Balloon
