xen-devel

Re: [Xen-devel] pvops domu soft lockup under load (more logs)

On Apr 15, 2010, at 19:21 , Jeremy Fitzhardinge wrote:

> On 04/15/2010 03:56 AM, Pim van Riezen wrote:
>> On Apr 14, 2010, at 19:48 , Jeremy Fitzhardinge wrote:
>> 
>> 
>>> Does it appear on the Xen console (visible with "xm dmesg")?  You may
>>> need to do a sysrq '9' first to get it to output all messages.
>>> 
>> Does that sysrq have to go to dom0 or to the domU?
>> 
> 
> To the locked-up domU.

OK, I'm going to try to work out a reproduction scenario on our test cluster.
These issues occurred on a customer VPS, and his patience ran out with the
experimental reboots and with playing guinea pig.

> I've seen similar lockups at a very low rate.  The clocksource
> workaround just confuses me; the whole thing stumps me.  The main piece
> of evidence I haven't managed to get yet is a complete process dump
> (sysrq-t) to see who's waiting on what.

Another data point. This customer has similarly loaded VPS machines on a number
of different hardware nodes. Not all of them had the lockup problem. I applied
the jiffies clocksource to all of his machines, regardless of their current
problem status. After a day without lockups, the customer complained about time
drift (NTP was not enabled). The guests that had experienced the soft lockups
earlier had major clock drift and were way ahead:

        16 Apr 09:29:26 ntpdate[11236]: step time server 194.109.22.18 offset -7337.731686 sec

That's over 2 hours accumulated in less than 24 hours of uptime. The guests
that hadn't been experiencing the lockup issues before switching to the
jiffies clocksource hadn't drifted that much after the switch and were, at
most, 120s behind after the same amount of runtime.
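
To put the two figures side by side, here is a rough sketch that pulls the
offset out of the ntpdate line above and turns it into an average drift rate.
The uptime is an assumption: only "less than 24 hours" is known, so 24 hours
is used as an upper bound, which makes the computed rates lower bounds. The
helper names parse_offset and drift_ppm are made up for this illustration.

```python
import re

# The ntpdate log line from this mail, joined onto one line.
LOG_LINE = ("16 Apr 09:29:26 ntpdate[11236]: step time server 194.109.22.18 "
            "offset -7337.731686 sec")

# Assumption: exact uptime is unknown ("less than 24 hours"), so use 24 h.
UPTIME_SEC = 24 * 3600


def parse_offset(line):
    """Return the offset in seconds reported by an ntpdate log line."""
    match = re.search(r"offset\s+(-?\d+(?:\.\d+)?)\s+sec", line)
    if match is None:
        raise ValueError("no ntpdate offset found in line")
    return float(match.group(1))


def drift_ppm(offset_sec, uptime_sec):
    """Average drift magnitude in parts per million over the uptime."""
    return abs(offset_sec) / uptime_sec * 1e6


# Guest that had the soft lockups before the clocksource switch.
bad = drift_ppm(parse_offset(LOG_LINE), UPTIME_SEC)
# Worst case among the guests that never locked up (120 s behind).
good = drift_ppm(120.0, UPTIME_SEC)

print(f"previously locking guest: roughly {bad:.0f} ppm")
print(f"other guests, worst case: roughly {good:.0f} ppm")
print(f"ratio: roughly {bad / good:.0f}x")
```

Under that assumption the previously locking guest gained more than 8% on
wall-clock time, while the other guests lost well under 1%, so the two groups
differ both in magnitude and in direction.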

> Also, could you try 2.6.32.11, which has some timer-related fixes in it,
> that may or may not help.

If I can get a reproduction scenario going I'll look into that as well.

Cheers,
Pim
