Hi all,

I have been having what seems like an ongoing issue with clock syncing
in Xen for quite some time now. It could be that the clock issue is
resolved and I am seeing something else, but the clock issue is throwing
me off the scent.

A number of months ago I was getting "time went backwards" messages on
Xen DomUs. I tested separating the clocks (independent_wallclock=1) and
running ntp in both DomU and Dom0. I had bad synchronisation and the
occasional Dom0 kernel panic or just a straight lock-up (no log or
terminal output).

I then moved to clocksource=jiffies and independent_wallclock=0. I have
reasonably well sync'd clocks and no Dom0 hangs, but I am now seeing the
DomUs hang in a running state (as seen in xm list) with CPU usage maxed
out (xm top). The DomU is not accessible via the network and the console
is unresponsive (no output after the standard boot messages, which may
be a week or so old). I see no log messages in Dom0 or the DomU. I have
run a script continuously to capture the output of ps and log anything
using more than 30% memory or CPU time. I do not get anything around the
time of the hang. I am also monitoring via munin, and that just shows
the host going dead with no creep in resource usage beforehand. However,
the machines that this happens to are reasonably busy. They mostly run
Apache-based web services (mixed applications), but it is not confined
to that setup.
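
For anyone wanting to run the same kind of check, here is a minimal
sketch of such a ps watcher. It is not the exact script in use on these
machines; the function names, the 10 second interval and the log format
are assumptions made for illustration, and it assumes a procps-style ps
that accepts "-eo pid,pcpu,pmem,args".

```python
import subprocess
import time

THRESHOLD = 30.0   # percent; the 30% cut-off mentioned above
INTERVAL = 10      # seconds between samples (assumed value)


def parse_ps(output, threshold=THRESHOLD):
    """Return (pid, cpu, mem, command) tuples for rows over the threshold."""
    hits = []
    for line in output.splitlines()[1:]:      # skip the header row
        fields = line.split(None, 3)          # pid, %cpu, %mem, full command
        if len(fields) < 4:
            continue
        pid, cpu, mem, cmd = fields
        try:
            cpu_pct, mem_pct = float(cpu), float(mem)
        except ValueError:
            continue                          # ignore rows that do not parse
        if cpu_pct > threshold or mem_pct > threshold:
            hits.append((int(pid), cpu_pct, mem_pct, cmd.strip()))
    return hits


def sample():
    """Run ps once and return the rows over the threshold."""
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,pmem,args"],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout)


if __name__ == "__main__":
    while True:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for pid, cpu_pct, mem_pct, cmd in sample():
            # flush so the last lines survive if the guest hangs
            print(f"{stamp} pid={pid} cpu={cpu_pct} mem={mem_pct} {cmd}",
                  flush=True)
        time.sleep(INTERVAL)
```

Since the DomU becomes unreachable when it hangs, sending this output to
a remote syslog host rather than a local file may make the last samples
easier to recover.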

Yesterday, I discovered the option of running clocksource=xen and
independent_wallclock=0 with the ntp.conf option "disable kernel"[1]. I
tried this last night, and within a couple of hours one of my Dom0
machines hung with no output, requiring a hard reset. I could not afford
any more downtime on the machines which were experiencing the outages,
so I have reverted to "jiffies" as that seems to be the most stable.
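
When switching between these combinations it helps to confirm what a
given Dom0 or DomU is actually running with. The sketch below reads the
two values back. The paths are assumptions: the clocksource path is the
usual sysfs location on 2.6 kernels, and the independent_wallclock entry
is only present on Xen-patched kernels; a missing file is reported as
None.

```python
from pathlib import Path

# Assumed locations; adjust if your kernel exposes them elsewhere.
CLOCKSOURCE_PATH = (
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"
)
WALLCLOCK_PATH = "/proc/sys/xen/independent_wallclock"


def read_setting(path):
    """Return the stripped file contents, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def report(clocksource_path=CLOCKSOURCE_PATH, wallclock_path=WALLCLOCK_PATH):
    """Collect the clock-related settings currently in effect."""
    return {
        "clocksource": read_setting(clocksource_path),
        "independent_wallclock": read_setting(wallclock_path),
    }


if __name__ == "__main__":
    for name, value in report().items():
        print(f"{name}={value}")
```

Running it in Dom0 and in each DomU gives a quick record of which
combination was active when a hang occurred.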

The whole situation is slightly left of ideal and I am at a loss as to
where to go next with this. I have left the ntp.conf option on for the
time being and I am just waiting for the next hang. Can anyone suggest a
course of action which will allow me to consider these machines stable?

Many thanks in advance for any help.



OS Debian Lenny
Xen 3.2
Linux kernel 2.6.26-2-xen-amd64
64-bit hypervisor/kernel with a mix of 64-bit and 32-bit userland DomUs


