
[Xen-users] DomU hang in run state (Debian Lenny)



Hi all,

I have been having what seems like an ongoing issue with clock syncing in Xen
for quite some time now. It could be that the clock issue is resolved
and I am seeing something else, but the clock issue is throwing me off
the scent.

A number of months ago I was getting "time went backwards" messages on
Xen DomUs. I tested separating the clock (independent_wallclock=1) and
running ntp in both DomU and Dom0. I had bad synchronisation and the
occasional Dom0 kernel panic or just a straight lock-up (no log or
terminal output).

I then moved to clocksource=jiffies and independent_wallclock=0. I have
reasonably well sync'd clocks and no Dom0 hangs, but I am now seeing the
DomUs hang in a run state (seen by xm list) with CPU usage maxed out (xm
top). The DomU is not accessible via the network and the console is
unresponsive (no output after the standard boot messages, which may be a
week or so old). I see no log messages in Dom0 or the DomU. I have
run a script continuously to capture the output of ps and log
anything using more than 30% memory or CPU time. I do not get anything
around the time of the hang. I am also monitoring via munin, and that
just shows the host is dead, with no creep in resource usage. However, the
machines that this happens to are reasonably busy. They mostly run
Apache-based web services (mixed applications), but the problem is not
confined to that setup.
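For anyone wanting to reproduce the kind of ps watcher described above, here is a minimal sketch in Python. It is not the script used on these machines: the function names (`parse_ps`, `snapshot`, `monitor`), the `ps -eo pid,pcpu,pmem,comm` column choice and the log line format are all assumptions for illustration. Only the 30% threshold comes from the post. It calls `os.fsync` after each write so that entries have a better chance of reaching disk before a hard lock-up.

```python
import os
import subprocess
import time
from datetime import datetime

# Threshold from the post: flag anything above 30% CPU or memory.
THRESHOLD = 30.0


def parse_ps(output, threshold=THRESHOLD):
    """Return (pid, pcpu, pmem, command) tuples for processes over threshold.

    Assumes the output of `ps -eo pid,pcpu,pmem,comm`, header line first.
    """
    heavy = []
    for line in output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        pid, pcpu, pmem, comm = fields
        pcpu, pmem = float(pcpu), float(pmem)
        if pcpu > threshold or pmem > threshold:
            heavy.append((int(pid), pcpu, pmem, comm))
    return heavy


def snapshot():
    """Run ps once and return its raw text output."""
    return subprocess.run(
        ["ps", "-eo", "pid,pcpu,pmem,comm"],
        capture_output=True, text=True, check=True,
    ).stdout


def monitor(logfile, interval=10.0, iterations=None):
    """Append heavy processes to logfile every `interval` seconds.

    Runs forever when iterations is None (hypothetical parameters).
    """
    count = 0
    while iterations is None or count < iterations:
        stamp = datetime.now().isoformat(timespec="seconds")
        with open(logfile, "a") as log:
            for pid, pcpu, pmem, comm in parse_ps(snapshot()):
                log.write(f"{stamp} pid={pid} cpu={pcpu} mem={pmem} cmd={comm}\n")
            log.flush()
            os.fsync(log.fileno())  # push entries to disk before a possible hang
        count += 1
        time.sleep(interval)
```

Calling something like `monitor("/var/log/heavy-procs.log")` (a hypothetical path) inside each DomU would leave a timestamped trail. If the trail is empty right up to the hang, as reported here, that points away from a runaway userland process.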

Yesterday, I discovered the option of running clocksource=xen and
independent_wallclock=0 with the ntp.conf option "disable kernel"[1]. I
tried this last night and within a couple of hours one of my Dom0
machines hung with no output, requiring a hard reset. I could not afford
any more downtime on the machines that were experiencing the outages, so I
have reverted to "jiffies", as that seems to be the most stable.

The whole situation is slightly left of ideal and I am at a loss as to
where to go next with this. I have left the ntp.conf option on for the
time being and I am just waiting for the next hang. Can anyone suggest a
course of action which will allow me to consider these machines stable?

Many thanks in advance for any help.

Cheers,

Matt

Versions:
OS Debian Lenny
Xen 3.2
Linux kernel 2.6.26-2-xen-amd64
64-bit hypervisor/kernel with a mix of 64-bit and 32-bit userland DomUs

[1] http://my.opera.com/marcomarongiu/blog/2010/08/18/debugging-ntp-again-part-4-and-last

-- 
 Matthew Baker, UNIX Systems Administrator
 -----------------------------------------------------
 Institute for Learning and Research Technology (ILRT)
 A: University of Bristol,
    8-10 Berkeley Square,
    Bristol.
    BS8 1HH
 W: http://www.ilrt.bris.ac.uk/
 E: matt.baker@xxxxxxxxxx
 T: Berkeley Square
    +44 (0)117 32 14325
 T: Computer Centre
    +44 (0)117 32 17467
 F: 35BB AD51 9892 D694 7664  8BFD 2EF9 BBA4 1FDA 89C3
 -----------------------------------------------------


_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users