
[Xen-users] dom0 freezing



Hello everyone,

I have an extremely annoying freeze problem with Xen that I can neither
fix nor even debug. It's a bit of a long story.

I ordered an x86_64-based colo server in the middle of last year to run
Xen and a couple of personal domUs on it. The box kept freezing all the
time. I tried a lot of things to debug it but could never pin the
problem down. This setup is described in
http://thread.gmane.org/gmane.comp.emulators.xen.user/25347/focus=25500
and http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1007 .

Shortly after those mails (in the middle of July), after my hoster had
swapped each and every part in this box, they finally replaced the
VIA-based board with one using an AMD/ATI chipset, and suddenly the box
was rock stable. During the last 10 months I did not have a single
crash. It first ran a self-compiled Xen 3.1.0, was then switched to a
Debian lenny userland and hypervisor, got a self-compiled dom0 kernel
based on Ubuntu Gutsy in January, and the fresh Debian 3.2.1 hypervisor
at the end of May. No problems whatsoever.

A few days ago the box crashed and did not come back online, even after
issuing a hardware reset command. The IP-KVM my hoster connected showed
that the box was waiting for a keypress in the BIOS, with a message
saying that POST had previously been interrupted, possibly due to
overclocking (definitely not in use). After a key was pressed the box
booted fine but crashed within minutes, again dying in the BIOS.
Definitely a hardware defect.
After almost all parts were replaced (CPU, RAM, power supply, fans) the
box no longer crashed in the BIOS, but the dom0 hangs suddenly started
again. The software setup had not been changed since January (the Gutsy
kernel installation), and the box had been rebooted a couple of times
since then for maintenance, so the software should definitely be fine.

I thought that maybe the board was faulty and had it replaced with
another one, an nForce 560-based MSI-K9N NEO-F V3. Still the same
crashes. Except for the hard disk, the hardware has now been completely
replaced.

I tried changing the dom0 kernel to the Ubuntu Hardy 2.6.24-18-xen
distribution kernel, and I tried numerous boot options for the
hypervisor (noacpi, nolapic, watchdog) and for the dom0 kernel (swiotlb;
now trying acpi=off and noapic). The problem is always the same: after
some hours the box freezes. There are no error messages in the log or
on the console, nothing. I still cannot send Ctrl-A three times to the
box through the IP-KVM, so I can't tell whether dom0 or the hypervisor
crashed, but I can tell that nothing whatsoever responds anymore.

Does anyone have any idea how to debug this further? Are there any
options I could try to at least better understand this issue?

svr01:~# dpkg -l | grep xen
ii  libxenstore3.0                       3.2.1-1
ii  linux-image-2.6.24-18-xen            2.6.24-18.32            Linux
ii  xen-hypervisor-3.2-1-amd64           3.2.1-1                 The Xen
ii  xen-tools                            3.9-3                   Tools
ii  xen-utils-3.2-1                      3.2.1-1                 XEN
ii  xen-utils-common                     3.2.0-2                 XEN
ii  xenstore-utils                       3.2.1-1

Bernhard


_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users


 

