
Re: [Xen-users] Xen system hang or freeze


  • To: Martin Fernau <m.fernau@xxxxxxxxxx>, xen-users@xxxxxxxxxxxxxxxxxxx
  • From: Paraic Gallagher <paraic.gallagher@xxxxxxxxx>
  • Date: Mon, 6 Apr 2009 11:56:01 +0100
  • Cc:
  • Delivery-date: Mon, 06 Apr 2009 03:56:49 -0700
  • List-id: Xen user discussion <xen-users.lists.xensource.com>

This problem occurred again this weekend on one of my servers. There was no response to input or to pings to any domain, and only a blank screen when a keyboard and monitor were connected. It had been running for around one week. The system was under no load; it runs a CentOS 5.2 Dom0, one CentOS 5.2 domU, and one RHEL 4.1 domU. No errors were written to syslog around the time of the lockup.

It is a Dell PE 1950 and I had the console redirected to Serial over LAN. I had SysRq enabled on the system and attempted to get some further debugging information using those keys, but the system did not respond. I then hit Ctrl-A to switch the input to Xen, got the screen below, and triggered a crash dump.

(XEN) *** Serial input -> Xen (type 'CTRL-a' three times to switch input to DOM0).
(XEN) 'h' pressed -> showing installed handlers
(XEN)  key '%' (ascii '25') => Trap to xendbg
(XEN)  key 'C' (ascii '43') => trigger a crashdump
(XEN)  key 'H' (ascii '48') => dump heap info
(XEN)  key 'N' (ascii '4e') => NMI statistics
(XEN)  key 'R' (ascii '52') => reboot machine
(XEN)  key 'a' (ascii '61') => dump timer queues
(XEN)  key 'd' (ascii '64') => dump registers
(XEN)  key 'h' (ascii '68') => show this message
(XEN)  key 'i' (ascii '69') => dump interrupt bindings
(XEN)  key 'm' (ascii '6d') => memory info
(XEN)  key 'n' (ascii '6e') => trigger an NMI
(XEN)  key 'q' (ascii '71') => dump domain (and guest debug) info
(XEN)  key 'r' (ascii '72') => dump run queues
(XEN)  key 't' (ascii '74') => display multi-cpu clock info
(XEN)  key 'u' (ascii '75') => dump numa info
(XEN)  key 'z' (ascii '7a') => print ioapic info

Does this mean the hypervisor is still active but all guests, including Dom0, are hosed?

Is there something of value to look for in the Xen menu?
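
For anyone capturing the same listing over the serial console, here is a minimal Python sketch that turns the handler lines into a key-to-description table, so that captures from different hosts can be compared. It assumes the line format shown above; the names `parse_handlers`, `HANDLER_RE` and `SAMPLE` are made up for illustration and are not part of Xen. It also checks that the "ascii" column is hexadecimal (for example, '%' is 0x25).

```python
import re

# Matches lines such as: (XEN)  key 'C' (ascii '43') => trigger a crashdump
# (format assumed from the console output quoted in this message)
HANDLER_RE = re.compile(
    r"^\(XEN\)\s+key '(.)' \(ascii '([0-9a-fA-F]+)'\) => (.*)$"
)


def parse_handlers(log_text):
    """Return {key: description} for each handler line in a captured log."""
    handlers = {}
    for line in log_text.splitlines():
        match = HANDLER_RE.match(line.strip())
        if match is None:
            continue  # not a handler line, e.g. the "'h' pressed" banner
        key, code, description = match.groups()
        # The 'ascii' column is a hexadecimal character code.
        if int(code, 16) != ord(key):
            raise ValueError("unexpected code %r for key %r" % (code, key))
        handlers[key] = description.strip()
    return handlers


# A few lines copied from the listing above.
SAMPLE = """\
(XEN) 'h' pressed -> showing installed handlers
(XEN)  key '%' (ascii '25') => Trap to xendbg
(XEN)  key 'C' (ascii '43') => trigger a crashdump
(XEN)  key 'q' (ascii '71') => dump domain (and guest debug) info
(XEN)  key 'r' (ascii '72') => dump run queues
"""

if __name__ == "__main__":
    for key, description in parse_handlers(SAMPLE).items():
        print(key, "->", description)
```

This only organises what the hypervisor prints; it does not say which handler output is relevant to the hang.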

In this thread, three people have reported repeated system lockups on various
hardware, with no real warning or logging information, and no solution other than a hard
reset of the system.

Is anyone aware of a bug ID for this problem, or should a bug be raised?
Is there some other information I can provide from my setup which would be useful to diagnose the problem?

regards,
Paraic.


2009/4/6 Martin Fernau <m.fernau@xxxxxxxxxx>
By "stock xen 2.6.18.8 kernel", do you mean the original kernel from
"http://www.xen.org/download/"? I currently use the xen-kernel 2.6.18-r12
from my distro, so I could give it a try...

How did you notice these kernel oops and/or soft IRQ lockups? I'm not
able to find _any_ abnormal events on my system, as all logfiles are clean.
There must be a way to debug this...

The only USB device I currently have attached to my dom0 is a Smart-UPS
system. I don't know if this really could kill the whole machine, as the
traffic between dom0 and this UPS should be very low.

We must find a way to diagnose these lockups. Is there any debug-logging
functionality we could enable in Xen to start tracking down this problem?

I'm afraid that these lockups could become a knockout criterion against using
Xen on professional servers in the future...

On Sunday, 5 April 2009 22:33:36, thomas morgan wrote:
> Over the last year, I've experienced a couple of sources of lockups.
>
> The first was resolved by going to the stock xen 2.6.18.8 kernel
> compiled from source (had been using the Debian etch kernel; found
> commentary online describing the same symptoms on Ubuntu, Redhat, and
> CentOS though, each with their distro-specific kernel).
>
> This one tended to result in kernel oops messages (soft IRQ lockups, as
> I recall). The lockup would start with a domU and within a few minutes
> would kill the dom0 too. The fastest way to trigger this one was to
> create and shut down domUs, although I don't recall that being the
> only way.
>
> The second, with the stock kernel, was an errant USB hub attached to a
> xen host. Removing the hub resolved the issue. These were complete,
> sudden lockups of the dom0 and all domUs -- basically everything.
> Higher traffic over the USB port would trigger this lockup.
>
> So, for those who haven't tried the stock xen kernel, and are able to
> try it (based on driver support, etc.), it might help.
>
> --t


_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users
