WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

Re: [Xen-users] Xen system hang or freeze

Some thoughts:

0. Do you have the default behavior, where the guests' independent wallclocks are disabled?

1. I have observed visible performance differences in a VM when its %steal goes above 1%.
It sounds like you have 8 cores.
How many VMs do you have?
What are their weights and caps?

2. The system default of collecting sar data every ten minutes is pretty unhelpful for problem diagnosis. I routinely reduce this interval to five seconds which, at the expense of a lot of disk space, gives a historical dataset that is useful for forensics.
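
Item 2 can also be approached without changing the sysstat schedule. sar derives %steal from the cumulative CPU-time counters in /proc/stat, so a rough Python sketch can diff two snapshots of the aggregate `cpu` line every five seconds. It assumes a Linux kernel that exposes the steal column (the eighth counter after `cpu`); `parse_cpu_line`, `read_cpu_times`, `steal_percent`, `monitor` and `INTERVAL` are hypothetical names, not part of sysstat.

```python
import time
from typing import Iterator, List, Tuple

INTERVAL = 5  # seconds between samples (hypothetical default, per item 2)


def parse_cpu_line(line: str) -> List[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into eight counters:
    user, nice, system, idle, iowait, irq, softirq, steal (clock ticks).
    Later guest/guest_nice columns are ignored because the kernel already
    counts them in user/nice; missing trailing columns are padded with 0."""
    fields = [int(v) for v in line.split()[1:9]]
    return fields + [0] * (8 - len(fields))


def read_cpu_times(path: str = "/proc/stat") -> List[int]:
    """Return the eight counters from the first 'cpu ' line of `path`."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return parse_cpu_line(line)
    raise RuntimeError(f"no aggregate cpu line in {path}")


def steal_percent(prev: List[int], curr: List[int]) -> float:
    """Share of elapsed CPU time stolen by the hypervisor between snapshots."""
    deltas = [c - p for p, c in zip(prev, curr)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total


def monitor(samples: int, interval: float = INTERVAL,
            path: str = "/proc/stat") -> Iterator[Tuple[str, float]]:
    """Yield (timestamp, %steal) once per interval, for `samples` intervals."""
    prev = read_cpu_times(path)
    for _ in range(samples):
        time.sleep(interval)
        curr = read_cpu_times(path)
        yield time.strftime("%H:%M:%S"), steal_percent(prev, curr)
        prev = curr
```

For example, `for stamp, pct in monitor(samples=720): print(stamp, pct)` prints one hour of five-second readings; redirecting that to a file gives the kind of fine-grained history described in item 2.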





On Apr 21, 2009, at 10:10 AM, Nick Anderson wrote:

On Tue, Apr 21, 2009 at 08:30:32AM -0400, Peter Booth wrote:
It would be interesting to know whether sar data was captured during
this time. From this you could track whether there was any process
creation or destruction occurring.
I just had another lockup this weekend.

sar (from the host)
12:35:01 PM       all      0.00      0.00      0.00      0.00      0.01     99.99
12:45:01 PM       all      0.00      0.00      0.00      0.00      0.01     99.99
12:55:01 PM       all      0.00      0.00      0.00      0.00      0.01     99.99
01:05:01 PM       all      0.00      0.00      0.00      0.00      0.01     99.99
01:15:01 PM       all      0.00      0.00      0.00      0.00      0.01     99.99
Average:          all      0.00      0.00      0.00      0.00      0.01     99.98

01:25:53 PM       LINUX RESTART

01:35:02 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
01:45:01 PM       all      0.00      0.00      0.00      0.00      0.01     99.99
01:55:01 PM       all      0.00      0.00      0.00      0.00      0.01     99.99
02:05:01 PM       all      0.00      0.00      0.00      0.00      0.01     99.99


sar -b
11:55:01 AM     12.22      0.90     11.32     12.90    257.89
12:05:01 PM     13.97      0.49     13.48      7.68    331.48
12:15:01 PM     18.88      7.30     11.59    161.74    260.17
12:25:01 PM     14.34      1.10     13.23     16.53    438.73
12:35:01 PM      9.01      0.43      8.58      6.96    208.50
12:45:01 PM      8.47      0.35      8.12      5.23    186.03
12:55:01 PM     10.00      1.09      8.91     19.22    245.17
01:05:01 PM     11.89      1.82     10.06     27.76    279.90
01:15:01 PM     10.06      0.34      9.72      5.23    214.62
Average:        17.55      6.12     11.43    385.87    369.74

01:25:53 PM       LINUX RESTART

01:35:02 PM       tps      rtps      wtps   bread/s   bwrtn/s
01:45:01 PM     19.01      7.19     11.83    113.49    273.91
01:55:01 PM     12.23      2.44      9.79     37.42    239.82
02:05:01 PM     16.89      2.79     14.10     47.93    422.02
02:15:01 PM     17.09      1.92     15.17     26.93    495.01
02:25:01 PM     13.91      3.42     10.49    164.83    282.82
02:35:01 PM     12.47      2.05     10.42     28.45    256.32
02:45:01 PM     13.67      1.81     11.87     31.78    340.39


sar -c
12:45:01 PM      0.02
12:55:01 PM      0.02
01:05:01 PM      0.02
01:15:01 PM      0.02
Average:         0.03

01:25:53 PM       LINUX RESTART

01:35:02 PM    proc/s
01:45:01 PM      0.02
01:55:01 PM      0.02

sar -q
12:55:01 PM         0       147      0.00      0.00      0.00
01:05:01 PM         0       147      0.07      0.03      0.01
01:15:01 PM         0       147      0.00      0.00      0.00
Average:            0       147      0.00      0.00      0.00

01:25:53 PM       LINUX RESTART

01:35:02 PM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
01:45:01 PM         0       147      0.00      0.00      0.00
01:55:01 PM         0       147      0.00      0.00      0.00

sar -r
01:05:01 PM   7312568   1878856     20.44    175416     66532   1044184         0      0.00         0
01:15:01 PM   7311948   1879476     20.45    175416     66544   1044184         0      0.00         0
Average:      7328126   1863298     20.27    175403     67011   1044184         0      0.00         0

01:25:53 PM       LINUX RESTART

01:35:02 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
01:45:01 PM   8620940    570484      6.21     64136     36012   1044184         0      0.00         0
01:55:01 PM   8619824    571600      6.22     64972     36028   1044184         0      0.00         0
02:05:01 PM   8618204    573220      6.24     65800     36040   1044184         0      0.00         0
===============================================================
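
With ten-minute sampling, the host logs above can only bound the hang: the last sample before each LINUX RESTART is roughly the last moment sadc is known to have been running. A small Python sketch can pull that window out of saved sar text automatically. It assumes 12-hour `HH:MM:SS AM|PM` timestamps and that the final sample and the restart fall on the same day; `restart_gaps` and its helpers are hypothetical names.

```python
from datetime import datetime
from typing import List, Optional, Tuple


def _parse_time(stamp: str) -> datetime:
    """Parse a sar timestamp such as '01:25:53 PM' (date part is ignored)."""
    return datetime.strptime(stamp, "%I:%M:%S %p")


def _is_number(text: str) -> bool:
    try:
        float(text)
        return True
    except ValueError:
        return False


def restart_gaps(sar_text: str) -> List[Tuple[str, str, int]]:
    """Return (last sample, restart, gap in seconds) for every LINUX RESTART.

    Data rows are recognised by a numeric last field; header rows end in a
    column name and are skipped, as are 'Average:' rows and blank lines."""
    gaps: List[Tuple[str, str, int]] = []
    last: Optional[str] = None
    for line in sar_text.splitlines():
        parts = line.split()
        if len(parts) < 3 or parts[1] not in ("AM", "PM"):
            continue
        stamp = f"{parts[0]} {parts[1]}"
        if parts[2:4] == ["LINUX", "RESTART"]:
            if last is not None:
                delta = _parse_time(stamp) - _parse_time(last)
                gaps.append((last, stamp, int(delta.total_seconds())))
            last = None
        elif _is_number(parts[-1]):
            last = stamp
    return gaps
```

Applied to the host CPU excerpt above, it would report a window from 01:15:01 PM to 01:25:53 PM (652 seconds), which shows how much a shorter collection interval would narrow the search.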



Perhaps I have missed something, but to me that all looks fine. I
should set up something to log ps. In my guests, however, I see steal
pushed through the roof, and it stays like that for days beforehand.
I've noticed the steal during lockups before, but either I neglected
to look back several days or forgot what I saw. I didn't recall steal
being at 100% as far back as my logs go.

12:55:01 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
01:05:01 PM       all      0.00      0.00      0.00      0.00    100.00      0.00
01:15:01 PM       all      0.00      0.00      0.00      0.00    100.00      0.00
Average:          all      0.00      0.00      0.00      0.00    100.00      0.00

01:27:49 PM       LINUX RESTART

01:35:01 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
01:45:01 PM       all      4.04      0.00      1.80      0.64      0.02     93.50
01:55:01 PM       all      4.10      0.00      1.76      0.31      0.02     93.80
02:05:01 PM       all      5.45      0.00      2.47      0.23      0.02     91.83
02:15:01 PM       all      7.03      0.00      3.22      0.22      0.02     89.51
02:25:01 PM       all      4.82      0.00      2.31      0.18      0.01     92.6
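
Finding out how long a guest's steal has been pinned at 100% means reading back through several days of logs. A small filter over `sar -u` text, for instance the output of `sar -u -f` run against each day's saved data file, can list only the intervals above a threshold. This is a sketch under stated assumptions: it expects the column layout printed above with each row on one line, and `high_steal_rows` and the 1% `STEAL_THRESHOLD` (borrowed from the observation at the top of the thread) are hypothetical.

```python
from typing import List, Tuple

# Hypothetical default: the 1% level at which a slowdown was reportedly noticeable.
STEAL_THRESHOLD = 1.0


def high_steal_rows(sar_u_text: str,
                    threshold: float = STEAL_THRESHOLD) -> List[Tuple[str, float]]:
    """Return (time, %steal) for each 'sar -u' data row above `threshold`.

    Assumes one row per line in the layout
    time AM/PM CPU %user %nice %system %iowait %steal %idle,
    so %steal is the second-to-last field."""
    hits: List[Tuple[str, float]] = []
    for line in sar_u_text.splitlines():
        parts = line.split()
        # Skip headers, 'Average:' rows, restart markers and blank lines.
        if len(parts) != 9 or parts[2] != "all":
            continue
        steal = float(parts[-2])
        if steal > threshold:
            hits.append((f"{parts[0]} {parts[1]}", steal))
    return hits
```

The first and last entries returned for a run of days would show roughly when the steal started and whether it was still present just before the lockup.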




It might also be worth adding a cron entry that appends the output of lsof to a file every N minutes (perhaps with logrotate enabled), to see if you can capture what changed in the running system when this "lockup" occurred.
It would also be worth collecting ps output every minute.
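
The cron suggestion can be sketched as a tiny Python helper that each cron run calls once, appending one timestamped snapshot so the last entries before a hang survive on disk. The function name `append_snapshot` is hypothetical, and the explicit fsync is an added precaution, not something proposed in the thread.

```python
import os
import subprocess
import time
from typing import Sequence


def append_snapshot(command: Sequence[str], logfile: str) -> int:
    """Run `command` once and append a timestamped copy of its stdout to
    `logfile`. Returns the command's exit status."""
    result = subprocess.run(list(command), capture_output=True, text=True)
    with open(logfile, "a") as log:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        log.write(f"===== {stamp} {' '.join(command)}\n")
        log.write(result.stdout)
        # Push the data to disk so the last snapshot before a hang is
        # less likely to be lost in the page cache.
        log.flush()
        os.fsync(log.fileno())
    return result.returncode
```

A hypothetical wrapper script run every minute from an /etc/cron.d entry such as `* * * * * root /usr/bin/python3 /usr/local/sbin/snapshot.py` could call `append_snapshot(["ps", "auxww"], "/var/log/ps-snapshots.log")` and, less often, `append_snapshot(["lsof", "-n"], "/var/log/lsof-snapshots.log")`. The paths are placeholders, and logrotate would still be needed to keep the files bounded.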

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users

--
Nick Anderson <nick@xxxxxxxxxxxx>
http://www.cmdln.org

