
Re: [Xen-devel] Detecting deadlocks with hypervisor..

From: Ewan Mellor
To: Thileepan Subramaniam CC: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] Detecting deadlocks with hypervisor..
Date: Sun, 19 Mar 2006 13:17:35 +0000

On Sat, Mar 18, 2006 at 06:14:09PM -0800, Thileepan Subramaniam wrote:

> Hello,
> I am trying to see if the hypervisor can be used to detect deadlocks in the > guest VMs. My goal is to detect if a guest OS is deadlocked, and if it is, > then create a clone of the deadlocked OS without the locking condition, and
> letting the clone run. While the clone runs I am hoping to generate some
> hints that could tell me what caused the deadlock.
> I simulated a deadlock/hang situation in a guest OS (by loading a badly
> written module to the kernel) and when the guest OS kernel was hanging, I
> ran "xm save" from Dom-0. But this command waits forever.
> I tried to follow the flow of the .py files (XendCheckpoint.py etc.). These
> seem to be called when I run 'xm save'. But beyond a point I am not sure
> what the python scripts do. I also see some libxc files such as
> xc_linux_save.c, but I am not sure who is using it (Dom-0, Xen, or the
> XenU). Can someone help me by explaining what happens behind the scenes
> when "xm save" is called? Is there any good documentation explaining which
> actions are done by which layers (e.g. the Python layer, the C layer, etc.)?

xc_save, the executable, calls xc_linux_save, the libxc function.  Depending
on whether this is a live or non-live save, some preparatory work is done (see
xc_linux_save for details).  The Python layer is then called back, requesting
that the domain be suspended.  This request is passed through to the guest by
writing /local/domain/<domid>/control/shutdown = suspend in the store.  This
is seen by the guest (a watch fires inside reboot.c), and then the guest
suspends itself.  This is probably where you are falling down -- if the guest
kernel is completely deadlocked, it is going to struggle to suspend itself.

If a suspend completes correctly, Xend will see it (another watch will fire),
and xc_linux_save will be free to complete the save.
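The handshake described above can be sketched as a toy model in Python.  The dict stands in for the xenstore, and the function names (tools_request_suspend, guest_watch) are illustrative, not the real Xen APIs; the point is only to show why a deadlocked guest stalls the save:

```python
import threading

# Toy model of the "xm save" suspend handshake.  A dict stands in
# for the xenstore; the names here are illustrative, not real Xen APIs.
store = {}
suspended = threading.Event()

def tools_request_suspend(domid):
    """Dom-0/xend side: ask the guest to suspend via the store."""
    store["/local/domain/%d/control/shutdown" % domid] = "suspend"

def guest_watch(domid):
    """Guest side: the watch (as in reboot.c) sees the request and the
    guest suspends itself.  A deadlocked kernel never runs this code,
    which is why xc_linux_save blocks forever."""
    key = "/local/domain/%d/control/shutdown" % domid
    if store.get(key) == "suspend":
        suspended.set()  # guest has suspended; xend's watch fires next

tools_request_suspend(5)
guest_watch(5)  # in a hung guest this step never happens
assert suspended.is_set()
```

With a healthy guest the event is set and the save proceeds; if you comment out the guest_watch() call, the wait never completes, which mirrors the hang seen with "xm save" on a deadlocked domain.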

So I went and experimented with this: basically, I changed XendCheckpoint.py NOT to wait for the guest to shut down, and I also changed xc_linux_save() to proceed with the save without waiting (essentially, suspend_and_state() returns 0 instead of retrying repeatedly).  With this I am able to save a deadlocked kernel smoothly.
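The change described above amounts to bounding the suspend wait instead of retrying forever.  A minimal sketch, assuming hypothetical stand-ins (check_suspended, suspend_and_state are not the real libxc functions, which are in C):

```python
import time

def check_suspended():
    """Stand-in for polling the guest's suspend acknowledgement.
    A deadlocked guest never acknowledges, so this always fails here."""
    return False

def suspend_and_state(max_wait=0.5, poll=0.1):
    """Retry for a bounded time, then give up and let the caller
    proceed with the save anyway, as the modification above does."""
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        if check_suspended():
            return 1  # guest suspended cleanly
        time.sleep(poll)
    return 0  # timed out: save the domain in whatever state it is in

rc = suspend_and_state()
```

Note that saving without a clean suspend means the guest never quiesced its frontend devices or canonicalised its page tables, which is one plausible reason the restore below then fails.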

But when I try restore, I get this error message:
Error: /usr/lib/xen/bin/xc_restore 10 19 5 34816 1 2 failed

And the log says,
[2006-03-24 13:48:42 xend] DEBUG (XendCheckpoint:152) [xc_restore]: /usr/lib/xen/bin/xc_restore 10 19 5 34816 1 2
[2006-03-24 13:48:42 xend] ERROR (XendCheckpoint:231) xc_linux_restore start: max_pfn = 8800
[2006-03-24 13:48:42 xend] ERROR (XendCheckpoint:231) Increased domain reservation by 22000KB
[2006-03-24 13:48:42 xend] ERROR (XendCheckpoint:231) Reloading memory pages: 0%
[2006-03-24 13:48:54 xend] ERROR (XendCheckpoint:231) Received all pages (0 races)
[2006-03-24 13:48:54 xend] ERROR (XendCheckpoint:231) Failed to pin batch of 22 page tables: 22
[2006-03-24 13:48:54 xend] ERROR (XendCheckpoint:231) Restore exit with rc=1

Any clue how I can overcome this and restore the kernel to its previous state (i.e., the deadlocked state)?


> Also, does it seem viable to clone a copy of a deadlocked guest OS in the
> first place?

If you have a byte-for-byte copy of a deadlocked guest, even if you could
suspend it, surely it will be deadlocked when it is resumed.  How do you
intend to break the deadlock, and how is it easier to do that from outside
than it is to perform deadlock detection in the guest?


Xen-devel mailing list
