
Re: [Xen-users] Xen system hang or freeze



On Tue, Apr 21, 2009 at 08:30:32AM -0400, Peter Booth wrote:
> It would be interesting to know whether sar data was captured during  
> this time. From this you could track whether there was any process  
> creation or destruction occurring.
I just had another lockup this weekend.

Sar (from the host)
12:35:01 PM       all      0.00      0.00      0.00      0.00      0.01     99.99
12:45:01 PM       all      0.00      0.00      0.00      0.00      0.01     99.99
12:55:01 PM       all      0.00      0.00      0.00      0.00      0.01     99.99
01:05:01 PM       all      0.00      0.00      0.00      0.00      0.01     99.99
01:15:01 PM       all      0.00      0.00      0.00      0.00      0.01     99.99
Average:          all      0.00      0.00      0.00      0.00      0.01     99.98

01:25:53 PM       LINUX RESTART

01:35:02 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
01:45:01 PM       all      0.00      0.00      0.00      0.00      0.01     99.99
01:55:01 PM       all      0.00      0.00      0.00      0.00      0.01     99.99
02:05:01 PM       all      0.00      0.00      0.00      0.00      0.01     99.99


sar -b
11:55:01 AM     12.22      0.90     11.32     12.90    257.89
12:05:01 PM     13.97      0.49     13.48      7.68    331.48
12:15:01 PM     18.88      7.30     11.59    161.74    260.17
12:25:01 PM     14.34      1.10     13.23     16.53    438.73
12:35:01 PM      9.01      0.43      8.58      6.96    208.50
12:45:01 PM      8.47      0.35      8.12      5.23    186.03
12:55:01 PM     10.00      1.09      8.91     19.22    245.17
01:05:01 PM     11.89      1.82     10.06     27.76    279.90
01:15:01 PM     10.06      0.34      9.72      5.23    214.62
Average:        17.55      6.12     11.43    385.87    369.74

01:25:53 PM       LINUX RESTART

01:35:02 PM       tps      rtps      wtps   bread/s   bwrtn/s
01:45:01 PM     19.01      7.19     11.83    113.49    273.91
01:55:01 PM     12.23      2.44      9.79     37.42    239.82
02:05:01 PM     16.89      2.79     14.10     47.93    422.02
02:15:01 PM     17.09      1.92     15.17     26.93    495.01
02:25:01 PM     13.91      3.42     10.49    164.83    282.82
02:35:01 PM     12.47      2.05     10.42     28.45    256.32
02:45:01 PM     13.67      1.81     11.87     31.78    340.39


sar -c
12:45:01 PM      0.02
12:55:01 PM      0.02
01:05:01 PM      0.02
01:15:01 PM      0.02
Average:         0.03

01:25:53 PM       LINUX RESTART

01:35:02 PM    proc/s
01:45:01 PM      0.02
01:55:01 PM      0.02

sar -q 
12:55:01 PM         0       147      0.00      0.00      0.00
01:05:01 PM         0       147      0.07      0.03      0.01
01:15:01 PM         0       147      0.00      0.00      0.00
Average:            0       147      0.00      0.00      0.00

01:25:53 PM       LINUX RESTART

01:35:02 PM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
01:45:01 PM         0       147      0.00      0.00      0.00
01:55:01 PM         0       147      0.00      0.00      0.00

sar -r
01:05:01 PM   7312568   1878856     20.44    175416     66532   1044184         0      0.00         0
01:15:01 PM   7311948   1879476     20.45    175416     66544   1044184         0      0.00         0
Average:      7328126   1863298     20.27    175403     67011   1044184         0      0.00         0

01:25:53 PM       LINUX RESTART

01:35:02 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
01:45:01 PM   8620940    570484      6.21     64136     36012   1044184         0      0.00         0
01:55:01 PM   8619824    571600      6.22     64972     36028   1044184         0      0.00         0
02:05:01 PM   8618204    573220      6.24     65800     36040   1044184         0      0.00         0
===============================================================



Now perhaps I have missed something, but to me that all looks just
fine. I should set up something to log ps. But in my guests I see steal
pushed through the roof, and it's like that for days ahead of time. I've
noticed the steal during the lockups before, but either I neglected to
look back several days or I forgot what I saw. I didn't recall steal
being at 100% as far back as my logs go.

12:55:01 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
01:05:01 PM       all      0.00      0.00      0.00      0.00    100.00      0.00
01:15:01 PM       all      0.00      0.00      0.00      0.00    100.00      0.00
Average:          all      0.00      0.00      0.00      0.00    100.00      0.00

01:27:49 PM       LINUX RESTART

01:35:01 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
01:45:01 PM       all      4.04      0.00      1.80      0.64      0.02     93.50
01:55:01 PM       all      4.10      0.00      1.76      0.31      0.02     93.80
02:05:01 PM       all      5.45      0.00      2.47      0.23      0.02     91.83
02:15:01 PM       all      7.03      0.00      3.22      0.22      0.02     89.51
02:25:01 PM       all      4.82      0.00      2.31      0.18      0.01     92.6
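
To find out how far back the guest steal really goes, the saved sar
output could be scanned rather than eyeballed. The following is only a
rough sketch in Python. It assumes each sample sits on one line and that
the columns come in the order shown above (%user %nice %system %iowait
%steal %idle); the function name high_steal and the 50% threshold are
made up for illustration.

```python
import re

# One sar CPU data row: time, AM/PM, CPU label, then six percentages.
# Assumed column order: %user %nice %system %iowait %steal %idle.
ROW = re.compile(
    r"^(\d{2}:\d{2}:\d{2}) (AM|PM)\s+(\S+)((?:\s+\d+\.\d+){6})\s*$"
)


def high_steal(sar_text, threshold=50.0):
    """Return (timestamp, steal) pairs whose %steal is >= threshold."""
    hits = []
    for line in sar_text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            # Header rows, "Average:" rows, restart markers, blank lines.
            continue
        values = [float(v) for v in m.group(4).split()]
        steal = values[4]  # fifth percentage column is %steal
        if steal >= threshold:
            hits.append((m.group(1) + " " + m.group(2), steal))
    return hits


SAMPLE = """\
12:55:01 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
01:05:01 PM       all      0.00      0.00      0.00      0.00    100.00      0.00
01:15:01 PM       all      0.00      0.00      0.00      0.00    100.00      0.00
Average:          all      0.00      0.00      0.00      0.00    100.00      0.00
01:27:49 PM       LINUX RESTART
01:45:01 PM       all      4.04      0.00      1.80      0.64      0.02     93.50
"""

if __name__ == "__main__":
    for when, steal in high_steal(SAMPLE):
        print(when, steal)
```

Run against each day's saved output, the first timestamp it reports
would show roughly when steal started climbing.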




> Might also be worth adding a cron entry to append the output of lsof to a 
> file every N minutes (perhaps with logrotate enabled) to see if you can 
> capture what changed in the running system when this "lockup" occurred?
> Also worth collecting ps output every minute
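
The periodic capture suggested above could look something like the
following. This is only a sketch in Python, not something running here
yet. The log paths, the script location in the cron line, and the
function name snapshot are all hypothetical, and it assumes ps and lsof
are on the PATH of whatever user cron runs it as.

```python
import subprocess
from datetime import datetime


def snapshot(command, log_path):
    """Append one timestamped run of `command` (a list) to log_path."""
    stamp = datetime.now().isoformat(timespec="seconds")
    try:
        result = subprocess.run(command, capture_output=True, text=True)
        body, code = result.stdout, result.returncode
        if code != 0:
            body += result.stderr  # keep the error text next to the output
    except FileNotFoundError as exc:
        # Command is not installed or not on PATH; record that instead.
        body, code = str(exc) + "\n", 127
    with open(log_path, "a") as log:
        log.write("=== %s %s (exit %d) ===\n" % (stamp, " ".join(command), code))
        log.write(body)
    return code


if __name__ == "__main__":
    # Hypothetical log locations; rotate them with logrotate.
    # Example crontab line to run this every minute (path is made up):
    #   * * * * * /usr/bin/python3 /usr/local/bin/snapshot.py
    snapshot(["ps", "auxww"], "/var/log/ps-snapshot.log")
    snapshot(["lsof", "-n", "-P"], "/var/log/lsof-snapshot.log")
```

Each run appends a header line with the time and exit status, so the
last complete entry before a lockup is easy to find afterwards.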

> _______________________________________________
> Xen-users mailing list
> Xen-users@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-users

-- 
Nick Anderson <nick@xxxxxxxxxxxx>
http://www.cmdln.org

