
Re: [Xen-users] Xen system hang or freeze



On Fri, Apr 03, 2009 at 03:56:28PM +0100, Paraic Gallagher wrote:
> I am running xen 3.0.3, with CentOS 5.2 based Dom0
> (kernel-xen-2.6.18-92.1.22.el5)
> Recently I have noticed some complete system lockups on a few different
> servers. Neither Dom0 or any of the guests respond to pings, connecting a
> keyboard and monitor to the system only shows a blank screen. Nothing is
> written to logs at time of lockup.

I have seen similar issues with one of my servers. I have yet to nail
down the issue. 

Specs:
Distro: Debian Etch
Kernel: 2.6.18-6-xen-amd64
CPU: 2x Quad-Core AMD Opteron(tm) Processor 2350
Memory: 16G
Disk: 3ware 9650LE with 8 drive Raid6
Xen: 3.2 (from debian repo)

All vms are LVM backed. Not running any HVM guests.

For a while I was seeing "soft lockup on CPU" messages scrolling on the
console and thought that might be the cause. Unfortunately, although the
errors went away after updating the kernel, I have had another lockup
since then.

I've found a fairly consistent pattern, though the timing is not
predictable.

A VM typically goes unresponsive first. If left unchecked for long
enough, the host will lock. If caught in time, I have had limited
success running xm destroy on the domU. Most of the time, running xm
destroy on the domU causes the host to lock immediately, requiring a
hard reboot.
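
Since a domU usually goes quiet before the host does, it may help to
record exactly when each guest stops answering. Below is a minimal
sketch of a heartbeat logger, assuming it runs on a separate machine
(such as the monitoring server), that the guests answer ICMP, and that
ping is the Linux iputils version. The function names, the log file
name and the host names in the usage note are made up for illustration.

```python
import subprocess
import time


def is_up(host, timeout_s=2):
    """Return True if a single ICMP echo to host is answered.

    Assumes Linux iputils ping, where -c is the packet count and
    -W is the reply timeout in seconds.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def transitions(previous, current):
    """Return sorted (host, is_up) pairs for hosts whose state changed."""
    return sorted(
        (host, up) for host, up in current.items() if previous.get(host) != up
    )


def watch(hosts, interval_s=10, logfile="heartbeat.log", probe=is_up, rounds=None):
    """Probe every host each interval and log only the state changes."""
    previous = {}
    done = 0
    with open(logfile, "a") as log:
        while rounds is None or done < rounds:
            current = {host: probe(host) for host in hosts}
            for host, up in transitions(previous, current):
                stamp = time.strftime("%Y-%m-%d %H:%M:%S")
                log.write(f"{stamp} {host} {'UP' if up else 'DOWN'}\n")
            log.flush()  # keep the file current between rounds
            previous = current
            done += 1
            if rounds is None or done < rounds:
                time.sleep(interval_s)
```

Calling something like watch(["dom0.example", "web1.example"]) would
leave a timeline showing which machine dropped first and how long it
took for the others to follow.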

The most recent lockup was a bit different from what I had seen in the
past.

The domU locked up (no output on the domU console). xm destroy locked
dom0. I rebooted with a remote power strip. dom0 and all domUs came
back up. Nothing in the logs, as usual. 10 minutes later dom0 was locked
again. I drove to the datacenter, and about 30-45 minutes after the
lock the machine became responsive again (according to the monitoring
server). I was able to display a website running on a VM. Then the
machine went unresponsive again, not responding to physical console
access either. Another hard reboot and things are OK.

That was the first time I had ever had so many lockups so close
together. Typically the lockups seem to be 1-2 weeks apart.

I have even tried setting up netconsole on dom0 to try to catch kernel
errors, with no success.
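
For anyone who wants to try the same thing: netconsole sends kernel
messages as plain UDP datagrams, so the receiving side can be very
small. Here is a rough sketch of a receiver that timestamps each
message as it arrives. It assumes the target port is 6666 (adjust it to
match the netconsole= parameter actually in use), and the function
names and log file name are made up for illustration. This is not the
setup I used, only one way it could be done.

```python
import socket
import time


def format_record(payload, sender, now):
    """Prefix one netconsole datagram with a UTC receive time and sender."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(now))
    text = payload.decode("utf-8", errors="replace").rstrip("\r\n")
    return f"{stamp} {sender} {text}"


def serve(bind_addr="0.0.0.0", port=6666, logfile="netconsole.log", max_packets=None):
    """Append every received datagram to logfile; runs until interrupted
    unless max_packets is given."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    received = 0
    with sock, open(logfile, "a") as log:
        while max_packets is None or received < max_packets:
            payload, (host, _src_port) = sock.recvfrom(65535)
            log.write(format_record(payload, host, time.time()) + "\n")
            log.flush()  # write each message out as soon as it arrives
            received += 1
```

Running serve() on the log host before the next lockup would at least
show whether dom0 manages to send anything at all before it goes
silent.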


-- 
Nick Anderson <nick@xxxxxxxxxxxx>
http://www.cmdln.org


_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users


 

