
RE: [Xen-devel] Re: blocking Xen 3.X production use: soft lockup bugs


  • To: "Steve Traugott" <stevegt@xxxxxxxxxxxxx>
  • From: "Ian Pratt" <m+Ian.Pratt@xxxxxxxxxxxx>
  • Date: Thu, 3 Aug 2006 01:59:20 +0100
  • Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Wed, 02 Aug 2006 18:00:02 -0700
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>
  • Thread-index: Aca2lbTbedAZ8lLsTYGL0oS0m9jrUQAATaNw
  • Thread-topic: [Xen-devel] Re: blocking Xen 3.X production use: soft lockup bugs

> > The soft lockup messages appear to be benign in that the domain seems to
> > be continuing quite happily after printing them -- it's quite possible
> > that the system was sufficiently busy that the domain VCPU just didn't
> > get scheduled for a while, triggering the warning message. Are you sure
> > they're actually related to the more serious problem you're
> > experiencing?
> 
> I can't prove that the network-related soft lockups I'm seeing on the
> x330's are the same soft lockups related to filesystem damage we saw
> on the Netengines -- we stopped using Netengines for Xen 3 when we hit
> that (they run Xen 2 fine).  Now that I know what to look for, I'll go
> back and re-create the Xen 3 environment on the Netengines so I can
> reproduce the problem there.

Do you get anything of interest in dom0's dmesg?
The fact that dom0 is unresponsive for some seconds is interesting. What
does your dom0 use as a root filesystem?

Is your dom0 uniprocessor or SMP?

When it's in the stalled state, if you have a serial console, switching
to Xen's debug console (ctrl-a three times) and hitting 'd' and 'q' a
few times might be useful. You'll need to resolve all the EIPs to
symbols by hand. (This is easier if you're running the same kernel in
all domains.)
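
The by-hand EIP lookup can be scripted. What follows is only a rough sketch, assuming the kernel's System.map is available and uses the usual "address type name" layout; the function names load_system_map and resolve_eip are made up for illustration and are not part of any Xen tool. It maps an address to the nearest symbol at or below it.

```python
import bisect


def load_system_map(lines):
    """Parse System.map-style lines ('c0100000 T _text') into sorted (addr, name) pairs."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        try:
            addr = int(parts[0], 16)
        except ValueError:
            continue  # first column was not a hex address
        symbols.append((addr, parts[2]))
    symbols.sort()
    return symbols


def resolve_eip(symbols, eip):
    """Return 'symbol+0xoffset' for the nearest symbol at or below eip, or None."""
    addrs = [addr for addr, _ in symbols]
    i = bisect.bisect_right(addrs, eip) - 1
    if i < 0:
        return None  # eip lies below the first known symbol
    addr, name = symbols[i]
    return f"{name}+{eip - addr:#x}"
```

To use it, something like resolve_eip(load_system_map(open("System.map")), 0xc0101010) against the map file of the kernel the domain is running. EIPs that fall inside the hypervisor itself would need Xen's own symbol table instead.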

> > Have you tried using -unstable and hence xen's new scheduler? This is
> > less likely to provoke soft lockup false alarms.
> 
> Haven't tried unstable yet, since this is for the production
> infrastructure for my family's business; am in the process of
> rebuilding with testing changeset 9762 though.  (is that really tip?
> hg log says Jun 29th for that changeset, even after a pull...)

There have been no requests to backport patches since then.

If you can, it's really worth trying -unstable. Any changeset from over
the last weekend should be just fine.

Ian





_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

