

RE: [Xen-devel] Re: blocking Xen 3.X production use: soft lockup bugs

To: "Steve Traugott" <stevegt@xxxxxxxxxxxxx>
Subject: RE: [Xen-devel] Re: blocking Xen 3.X production use: soft lockup bugs
From: "Ian Pratt" <m+Ian.Pratt@xxxxxxxxxxxx>
Date: Thu, 3 Aug 2006 01:59:20 +0100
Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
> > The soft lockup messages appear to be benign in that the domain seems to
> > be continuing quite happily after printing them -- it's quite possible
> > that the system was sufficiently busy that the domain VCPU just didn't
> > get scheduled for a while, triggering the warning message. Are you sure
> > they're actually related to the more serious problem you're
> > experiencing?
> I can't prove that the network-related soft lockups I'm seeing on the
> x330's are the same soft lockups related to filesystem damage we saw
> on the Netengines -- we stopped using Netengines for Xen 3 when we hit
> that (they run Xen 2 fine).  Now that I know what to look for, I'll go
> back and re-create the Xen 3 environment on the Netengines so I can
> reproduce the problem there.

Do you get anything of interest in dom0's dmesg?
The fact that dom0 is unresponsive for some seconds is interesting. What
does your dom0 use as a root filesystem?

Is your dom0 uniprocessor or SMP?

When it's in the stalled state, if you have a serial console, switching
to Xen's debug console (ctrl-a three times) and hitting 'd' and 'q' a
few times might be useful. You'll need to resolve all the EIPs to
symbols by hand. (This is easier if you're running the same kernel in
all domains.)
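
The by-hand EIP lookup can be scripted. What follows is only a rough sketch, assuming the guest kernel's System.map is available in the usual "<hex address> <type> <name>" layout; the function names and the sample addresses are made up for illustration and are not part of any Xen tool. Addresses that fall inside the hypervisor itself would need the hypervisor's own symbol table instead.

```python
import bisect


def parse_system_map(text):
    """Parse System.map-style lines ("<hex addr> <type> <name>").

    Returns a list of (address, name) tuples sorted by address.
    Lines that do not have at least three fields are skipped.
    """
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve_eip(symbols, eip):
    """Return "name+0xoffset" for the nearest symbol at or below eip.

    Returns None if eip lies below the first known symbol.
    """
    addrs = [addr for addr, _ in symbols]
    i = bisect.bisect_right(addrs, eip) - 1
    if i < 0:
        return None
    addr, name = symbols[i]
    return f"{name}+{eip - addr:#x}"


# Hypothetical sample data; a real run would read the kernel's System.map.
SAMPLE_MAP = """\
c0100000 T _text
c0101000 T do_work
c0102000 t helper
"""

if __name__ == "__main__":
    table = parse_system_map(SAMPLE_MAP)
    for eip in (0xC0101010, 0xC0102004):
        print(f"{eip:#010x} -> {resolve_eip(table, eip)}")
```

If all domains run the same kernel, one symbol table covers every EIP in the dump; otherwise each domain's EIPs need to be resolved against that domain's own System.map.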

> > Have you tried using -unstable and hence xen's new scheduler? This is
> > less likely to provoke soft lockup false alarms.
> Haven't tried unstable yet, since this is for the production
> infrastructure for my family's business; am in the process of
> rebuilding with testing changeset 9762 though.  (is that really tip?
> hg log says Jun 29th for that changeset, even after a pull...)

There have been no requests to back port patches since then. 

If you can, it's really worth trying -unstable. Any changeset from over
the last weekend should be just fine.

