
RE: [Xen-users] Live migration problem


  • To: "Steven Hand" <Steven.Hand@xxxxxxxxxxxx>
  • From: "Cole, Ray" <Ray_Cole@xxxxxxx>
  • Date: Wed, 31 Aug 2005 16:14:16 -0500
  • Cc: xen-users@xxxxxxxxxxxxxxxxxxx
  • Delivery-date: Wed, 31 Aug 2005 21:12:23 +0000
  • List-id: Xen user discussion <xen-users.lists.xensource.com>
  • Thread-index: AcWpDai6SqD29VhtRVaUse4xGUd/MAAfhBzwAACzjQAACf6zgAEmIwAgAAXoiiAAAi4F0A==
  • Thread-topic: [Xen-users] Live migration problem

I think I have it fixed, but I'm not sure why :-)

I modified reboot.c's shutdown_handler routine to NOT call 
ctrl_if_send_response().  This appears to make live migration rock solid on my 
machines.  It appears to me that if the xenU kernel attempts to respond to the 
suspend command, it risks locking up.  I have very little knowledge of the Xen 
code, but it seems to me that if it works when the response is removed, then 
either nobody is expecting a response on the other end of the conversation or 
a response is already being sent from somewhere else.  I realize commenting 
this out also means no response is sent for SYSRQ commands and such, so this 
is by no means a proper 'fix', but I think I may have found the root cause of 
the problem I've been having, where live migration periodically gives me 
errors saying it cannot suspend.

I've now performed a live migration about 14 times without it failing with 
this change in place.

Is this enough information for someone to figure out what the real cure should 
be?  I'm starting to think that shutdown_handler should not call 
ctrl_if_send_response if it is a suspend request and no previous suspend 
request was pending, else call ctrl_if_send_response.  But I'd just be guessing.
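
To make that guess concrete, here is a minimal user-space sketch of the 
branching I have in mind.  It is only a model, not the real reboot.c: the enum, 
the suspend_pending flag, and send_response_stub() are hypothetical stand-ins, 
and the real ctrl_if_send_response() and schedule_work() calls are only 
represented by a counter and a comment.

```c
#include <stdbool.h>

/* Hypothetical stand-ins: the names echo the message above, but these
 * types and signatures are assumptions, not the real Xen 2.0 definitions. */
enum shutdown_subtype {
    SUBTYPE_POWEROFF,
    SUBTYPE_REBOOT,
    SUBTYPE_SUSPEND,
    SUBTYPE_SYSRQ
};

static bool suspend_pending = false; /* set once a suspend has been queued */
static int responses_sent = 0;       /* counts calls to the stub below */

/* Stub standing in for ctrl_if_send_response(); it only counts calls. */
static void send_response_stub(void)
{
    responses_sent++;
}

/* Model of the guessed rule: skip the response only for a suspend request
 * that arrives while no earlier suspend is pending; respond otherwise. */
static void shutdown_handler_sketch(enum shutdown_subtype subtype)
{
    if (subtype == SUBTYPE_SUSPEND && !suspend_pending) {
        suspend_pending = true; /* the real handler would schedule the work here */
        return;                 /* no response for the first suspend request */
    }
    send_response_stub();
}
```

In this model a first suspend request sends nothing, while a repeated suspend 
or a SYSRQ request still gets a response.  Whether that is what the other end 
actually expects is exactly the part I'm unsure about.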

-- Ray

-----Original Message-----
From: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
[mailto:xen-users-bounces@xxxxxxxxxxxxxxxxxxx]On Behalf Of Cole, Ray
Sent: Wednesday, August 31, 2005 3:25 PM
To: Steven Hand
Cc: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: RE: [Xen-users] Live migration problem


Looks like the suspend message is received in the shutdown handler.  
schedule_work is called to schedule the work but, sporadically, that work is 
never executed.  It is as if schedule_work doesn't really schedule it or it is 
unable to get executed.

-----Original Message-----
From: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
[mailto:xen-users-bounces@xxxxxxxxxxxxxxxxxxx]On Behalf Of Cole, Ray
Sent: Wednesday, August 31, 2005 12:41 PM
To: Steven Hand
Cc: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: RE: [Xen-users] Live migration problem


I decided to put some printk's into reboot.c's __do_suspend.  During a 
"good" live migration run I see the printk's show up on the console.  In a 
bad one I see that __do_suspend never gets called :-(

I'll continue to follow it up the chain to see if it never gets the message to 
suspend at all or if something is going bad between getting the message and 
suspending.

I'm running xen-2.0-testing with the xen-2.0 2.6.11.12-xenU kernel BTW.

-- Ray

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users




 

