xen-devel

[Xen-devel] Runaway real/sys time in newer paravirt domUs?

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] Runaway real/sys time in newer paravirt domUs?
From: Jed Smith <jed@xxxxxxxxxx>
Date: Tue, 6 Jul 2010 12:32:31 -0400
Good morning,

We've had a few reports from domU customers[1], which I have confirmed myself,
that CPU time accounting is very inaccurate in certain circumstances.  The issue
seems to be limited to x86_64 domUs and to start around the 2.6.32 kernel
family, though I can't be sure of that.

The symptoms include top reporting hours or days of CPU time consumed by a task
that has been running for mere seconds of wall time, and the time(1) utility
reporting hundreds of years in some cases.  In contrast, the /proc/stat
counters on all four VCPUs increment at roughly the expected rate.  Needless to
say, this is puzzling.

Ævar Arnfjörð Bjarmason brought a test case that highlights the failure to our
attention: a simple Perl script[2] that forks and executes numerous dig(1)
processes.  At the end of his script, time(1) reports 268659840m0.951s of user
and 38524003m13.072s of system time consumed.  I am able to reproduce this
using:

 - Xen 3.4.1 on dom0 2.6.18.8-931-2
 - Debian Lenny on domU 2.6.32.12-x86_64-linode12 [3]
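
For readers without the Perl script at hand, the shape of the test (many
short-lived children, then a look at the CPU time charged to them) can be
sketched roughly as follows.  This is not Ævar's script: it is a Python 3
analogue, the placeholder child is an empty Python process rather than dig(1),
and run_children() and is_plausible() are hypothetical names introduced here.
The check relies only on the fact that children run one after another cannot
accumulate more CPU time than wall time multiplied by the number of CPUs.

```python
import os
import resource
import subprocess
import sys
import time


def run_children(count=20):
    """Run `count` short-lived children in sequence.

    Returns (wall, user, system) in seconds, where user and system are the
    CPU times charged to the waited-for children.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    for _ in range(count):
        # Placeholder workload (assumption); the original test runs dig(1).
        subprocess.call([sys.executable, "-c", "pass"])
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = after.ru_utime - before.ru_utime
    system = after.ru_stime - before.ru_stime
    return wall, user, system


def is_plausible(wall, user, system, ncpus, slack=0.5):
    """True if user + system fits within wall * ncpus (plus rounding slack)."""
    cpu = user + system
    return 0.0 <= cpu <= wall * ncpus + slack


if __name__ == "__main__":
    wall, user, system = run_children()
    ncpus = os.cpu_count() or 1
    print("wall %.3fs user %.3fs sys %.3fs" % (wall, user, system))
    print("plausible:", is_plausible(wall, user, system, ncpus))
```

On a healthy system the last line should print "plausible: True"; figures like
the ones reported in this message would fail the check.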

Running Ævar's test case looks like this, in that domU:

> real 0m30.741s
> user 307399002m50.773s
> sys 46724m44.192s
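
To put those figures in perspective, the time(1) strings can be converted to
seconds and years.  The snippet below is a minimal Python 3 sketch;
parse_time() is a hypothetical helper, and the comparison against 2^64
nanoseconds is only an observation about the arithmetic, not a diagnosis
confirmed anywhere in this message.

```python
import re

SECONDS_PER_YEAR = 365.25 * 24 * 3600
# 2**64 nanoseconds expressed in seconds (comparison value only, an assumption
# about what might be relevant, not something the report establishes).
U64_NS_IN_SECONDS = 2**64 / 1e9


def parse_time(value):
    """Convert a time(1) value such as '0m30.741s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError("unrecognised time value: %r" % value)
    return int(match.group(1)) * 60 + float(match.group(2))


real = parse_time("0m30.741s")
user = parse_time("307399002m50.773s")
system = parse_time("46724m44.192s")

print("real: %.3f s" % real)
print("user: %.1f years" % (user / SECONDS_PER_YEAR))
print("sys:  %.1f days" % (system / 86400))
print("2**64 ns - (user + sys): %.1f s" % (U64_NS_IN_SECONDS - (user + system)))
```

The user figure works out to several hundred years, and user plus sys lands
within a few minutes of 2^64 nanoseconds.  That may be a coincidence, but it
could also hint at an unsigned 64-bit nanosecond value wrapping somewhere in
the accounting path.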

However, a quick busyloop in Python seems to report the correct time:

> li21-66:~# cat doit.py 
> for i in xrange(10000000):
>  a = i ** 5
>
> li21-66:~# time python doit.py
>
> real  0m16.600s
> user  0m16.593s
> sys   0m0.006s

I then rebooted the domU, and the problem went away.  It seems to be transient
in nature and difficult to isolate.  /proc/stat seems to increment normally:

> li21-66:/proc# cat stat | grep "cpu " && sleep 1 && cat stat | grep "cpu "
> cpu  3742 0 1560 700180 1326 0 27 1282 0
> cpu  3742 0 1562 700983 1326 0 27 1282 0
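
The two samples above can be compared field by field.  The following is a
minimal Python 3 sketch: the field names follow the aggregate "cpu" line
layout documented in proc(5), the input lines are copied from the output
above, and parse_cpu_line() and deltas() are hypothetical helpers.

```python
# Column order of the aggregate "cpu" line, per proc(5), for the nine
# columns present in the samples above.
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of ticks."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("not an aggregate cpu line: %r" % line)
    return dict(zip(FIELDS, (int(p) for p in parts[1:])))


def deltas(before, after):
    """Per-field tick difference between two parsed samples."""
    return {name: after[name] - before[name] for name in FIELDS}


first = parse_cpu_line("cpu  3742 0 1560 700180 1326 0 27 1282 0")
second = parse_cpu_line("cpu  3742 0 1562 700983 1326 0 27 1282 0")
diff = deltas(first, second)

for name in FIELDS:
    print("%-8s %+d" % (name, diff[name]))
print("total    %+d" % sum(diff.values()))
```

Over the one-second sleep only the system and idle columns move, and nothing
in these counters resembles the values that time(1) reports.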

I'm not sure where to begin with this one - any thoughts?

 [1]: http://www.linode.com/forums/viewtopic.php?p=30715
 [2]: git://gist.github.com/449825.git
 [3]: Source: http://www.linode.com/src/2.6.32.12-x86_64-linode12.tar.bz2
      Config: http://jedsmith.org/tmp/2.6.32.12-x86_64-linode12.txt

Thanks for the assistance,

Jed Smith
Systems Administrator
Linode, LLC
+1 (609) 593-7103 x1209
jed@xxxxxxxxxx