
Re: [Xen-devel] Runaway real/sys time in newer paravirt domUs?



On 07/06/2010 09:32 AM, Jed Smith wrote:
> Good morning,
>
> We've had a few reports from domU customers[1] - confirmed by myself - that CPU
> time accounting is very inaccurate in certain circumstances.  This issue seems
> to be limited to x86_64 domUs, starting around the 2.6.32 family (but I can't be
> sure of that).
>
> The symptoms of the flaw include top reporting hours and days of CPU consumed by
> a task which has been running for mere seconds of wall time, as well as the
> time(1) utility reporting hundreds of years in some cases.  Contra-indicatively,
> the /proc/stat timers on all four VCPUs increment at roughly the expected rate.
> Needless to say, this is puzzling.
>
> A test case which highlights the failure has been brought to our attention by
> Ævar Arnfjörð Bjarmason, which is a simple Perl script[2] that forks and
> executes numerous dig(1) processes.  At the end of his script, time(1) reports
> 268659840m0.951s of user and 38524003m13.072s of system time consumed.  I am
> able to confirm this demonstration using:
>
>  - Xen 3.4.1 on dom0 2.6.18.8-931-2
>  - Debian Lenny on domU 2.6.32.12-x86_64-linode12 [3]
>
> Running Ævar's test case looks like this, in that domU:
>
>   
>> real 0m30.741s
>> user 307399002m50.773s
>> sys 46724m44.192s
>>     
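
To put figures like these in perspective, here is a minimal Python sketch
that converts time(1) values in the "NmS.SSSs" form shown above into seconds
and years, and checks them against the upper bound of wall time multiplied by
the number of VCPUs.  The helper names (parse_time_field, summarize) are
made up for illustration and are not part of any existing tool.

```python
import re

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def parse_time_field(value):
    """Convert a value such as '0m30.741s' to seconds (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError("unrecognised time value: %r" % value)
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def summarize(real, user, sys_time, ncpus):
    """Compare reported CPU time with what the wall clock allows."""
    real_s = parse_time_field(real)
    cpu_s = parse_time_field(user) + parse_time_field(sys_time)
    return {
        "real_s": real_s,
        "cpu_s": cpu_s,
        "cpu_years": cpu_s / SECONDS_PER_YEAR,
        # CPU time cannot legitimately exceed wall time * number of CPUs.
        "plausible": cpu_s <= real_s * ncpus,
    }


if __name__ == "__main__":
    # Values copied from the run quoted above, on a four-VCPU domU.
    print(summarize("0m30.741s", "307399002m50.773s", "46724m44.192s", 4))
```

Fed the numbers quoted above, this comes to roughly 584 years of CPU time
against about 31 seconds of wall time, so the sample fails the plausibility
check by a very wide margin.
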
> However, a quick busyloop in Python seems to report the correct time:
>
>   
>> li21-66:~# cat doit.py 
>> for i in xrange(10000000):
>>  a = i ** 5
>>
>> li21-66:~# time python doit.py
>>
>> real 0m16.600s
>> user 0m16.593s
>> sys  0m0.006s
>>     
> I rebooted the domU, and the problem no longer exists.  It seems to be transient
> in nature, and difficult to isolate.  /proc/stat seems to increment normally:
>
>   
>> li21-66:/proc# cat stat | grep "cpu " && sleep 1 && cat stat | grep "cpu "
>> cpu  3742 0 1560 700180 1326 0 27 1282 0
>> cpu  3742 0 1562 700983 1326 0 27 1282 0
>>     
> I'm not sure where to begin with this one - any thoughts?
>   
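
For comparing two /proc/stat samples like the pair quoted above, a small
Python sketch along these lines may help.  It assumes the aggregate "cpu"
line carries its fields in the order documented in proc(5) (user, nice,
system, idle, iowait, irq, softirq, steal, guest), with values in USER_HZ
ticks.  The function names are illustrative only.

```python
# Field order of the aggregate "cpu" line, as documented in proc(5).
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest")


def parse_cpu_line(line):
    """Turn a 'cpu  a b c ...' line into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("not an aggregate cpu line: %r" % line)
    return dict(zip(FIELDS, (int(p) for p in parts[1:])))


def cpu_delta(before, after):
    """Per-field tick difference between two samples of the cpu line."""
    first = parse_cpu_line(before)
    second = parse_cpu_line(after)
    return {name: second[name] - first[name] for name in first}


if __name__ == "__main__":
    # The two samples quoted above, taken about one second apart.
    before = "cpu  3742 0 1560 700180 1326 0 27 1282 0"
    after = "cpu  3742 0 1562 700983 1326 0 27 1282 0"
    delta = cpu_delta(before, after)
    print(delta, "total ticks:", sum(delta.values()))
```

For the quoted pair this shows only the system and idle counters moving
(by 2 and 803 ticks respectively), which fits the observation that
/proc/stat does not show the runaway values that time(1) does.
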

It would be helpful to identify what kernel version the change of
behaviour started in (ideally a git bisect down to a particular change,
but a pair of versions would be close enough).

I think this is the same problem as
https://bugzilla.kernel.org/show_bug.cgi?id=16314

Thanks,
    J

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

