
RE: [Xen-users] Live migration problem


  • To: "Steven Hand" <Steven.Hand@xxxxxxxxxxxx>
  • From: "Cole, Ray" <Ray_Cole@xxxxxxx>
  • Date: Wed, 31 Aug 2005 12:40:45 -0500
  • Cc: xen-users@xxxxxxxxxxxxxxxxxxx
  • Delivery-date: Wed, 31 Aug 2005 17:38:49 +0000
  • List-id: Xen user discussion <xen-users.lists.xensource.com>
  • Thread-index: AcWpDai6SqD29VhtRVaUse4xGUd/MAAfhBzwAACzjQAACf6zgAEmIwAg
  • Thread-topic: [Xen-users] Live migration problem

I decided to put some printk's into reboot.c's __do_suspend.  During a 
"good" live migration run I see the printk's show up on the console.  In the 
bad one, __do_suspend never gets called :-(

I'll continue to follow it up the chain to see whether the domain never gets 
the suspend message at all, or whether something goes bad between receiving 
the message and actually suspending.
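
For reference, the chain I'm probing looks roughly like this (a from-memory 
sketch of the 2.0-era control-interface plumbing in reboot.c with my printk 
markers added; the exact message and function names may differ from the real 
tree):

    /* Sketch only: the suspend request from xend arrives as a control-
     * interface message and gets dispatched to __do_suspend.  A printk at
     * each hop shows whether the message is never delivered (the handler
     * stays silent) or is delivered and then lost before the suspend.
     * schedule_suspend() stands in for however the real tree defers the
     * work to __do_suspend. */
    static void shutdown_handler(ctrl_msg_t *msg, unsigned long id)
    {
        printk(KERN_ALERT "shutdown_handler: subtype %d\n", msg->subtype);
        if (msg->subtype == CMSG_SHUTDOWN_SUSPEND)
            schedule_suspend();              /* ends up in __do_suspend */
    }

    static int __do_suspend(void *ignore)
    {
        printk(KERN_ALERT "__do_suspend: entered\n");
        /* ... existing device quiesce / suspend-record / hypercall ... */
        printk(KERN_ALERT "__do_suspend: resumed\n");
        return 0;
    }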

I'm running xen-2.0-testing with the xen-2.0 2.6.11.12-xenU kernel BTW.

-- Ray


-----Original Message-----
From: Cole, Ray 
Sent: Thursday, August 25, 2005 3:52 PM
To: Steven Hand
Cc: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: RE: [Xen-users] Live migration problem


Here is a little more information - not sure if it is related or not - but on 
Domain 0 I am getting an oops.  It does not coincide with the problem below, 
but it seems to indicate something isn't going right :-)  This is with the 
2.6.11.12-xen0 kernel for Domain 0.  It makes me wonder if something similar 
is happening in my guest OS 2.6.11.12-xenU kernel as well.

VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice day...
Unable to handle kernel NULL pointer dereference at virtual address 0000003c
 printing eip:
c016c304
*pde = ma 00000000 pa 55555000
Oops: 0000 [#1]
PREEMPT 
Modules linked in: agpgart
CPU:    0
EIP:    0061:[<c016c304>]    Not tainted VLI
EFLAGS: 00011286   (2.6.11.12-xen0) 
EIP is at clear_inode+0x4c/0xbd
eax: 00000000   ebx: c16eebc4   ecx: 00000000   edx: c16eebc4
esi: c16eecd0   edi: c31fae00   ebp: c0574000   esp: c0575e9c
ds: 007b   es: 007b   ss: 0069
Process kswapd0 (pid: 110, threadinfo=c0574000 task=c0565a00)
Stack: c16eebc4 c16eebc4 c0574000 c016d3f6 c16eebc4 00000000 00000000 c16efd9c 
       c16eebc4 0000007b c016a83c c16eebc4 c16eebc4 c0574000 00000000 00000083 
       00000000 c109fa00 c016acf9 00000080 c013f4bc 00000080 000000d0 00002e0e 
Call Trace:
 [<c016d3f6>] generic_forget_inode+0x14a/0x16e
 [<c016a83c>] prune_dcache+0x1eb/0x226
 [<c016acf9>] shrink_dcache_memory+0x1f/0x45
 [<c013f4bc>] shrink_slab+0x10c/0x16f
 [<c0140aed>] balance_pgdat+0x265/0x3ac
 [<c0140cea>] kswapd+0xb6/0xe5
 [<c012e048>] autoremove_wake_function+0x0/0x4b
 [<c0108be6>] ret_from_fork+0x6/0x1c
 [<c012e048>] autoremove_wake_function+0x0/0x4b
 [<c0140c34>] kswapd+0x0/0xe5
 [<c0106eb1>] kernel_thread_helper+0x5/0xb
Code: 00 00 a8 10 75 02 0f 0b a8 20 74 02 0f 0b 8b 83 0c 01 00 00 8d b3 0c 01 00 00 a8 08 75 38 8b 83 94 00 00 00 85 c0 74 0a 8b 40 24 <8b> 50 3c 85 d2 75 60 8b 83 f4 00 00 00 85 c0 75 4c 8b b3 f8 00 
 <6>device vif3.0 entered promiscuous mode


-----Original Message-----
From: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
[mailto:xen-users-bounces@xxxxxxxxxxxxxxxxxxx]On Behalf Of Cole, Ray
Sent: Thursday, August 25, 2005 11:05 AM
To: Steven Hand
Cc: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: RE: [Xen-users] Live migration problem


Same thing happens when I have a process running that is not NFS-intensive.

Is there some additional debugging information I can enable to provide more 
information?

-- Ray

-----Original Message-----
From: Cole, Ray 
Sent: Thursday, August 25, 2005 10:48 AM
To: 'Steven Hand'
Cc: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: RE: [Xen-users] Live migration problem


I installed 2.0-testing.  The 2.6.12-xen0/U kernel in 2.0-testing that I pulled 
this morning seems to panic right up front, so I went back to the 
2.6.11.12-xen0/U kernel.

This combination still causes the suspend to fail with the same 'Retry suspend 
domain' messages.

What I'm running in the domain does a ton of NFS access (a build that uses 
all networked resources).  Not sure if that comes into play or not.  I'll try 
making it do something less NFS-intensive and see if that works better.

-- Ray

-----Original Message-----
From: Steven Hand [mailto:Steven.Hand@xxxxxxxxxxxx]
Sent: Wednesday, August 24, 2005 7:41 PM
To: Cole, Ray
Cc: xen-users@xxxxxxxxxxxxxxxxxxx; Steven.Hand@xxxxxxxxxxxx
Subject: Re: [Xen-users] Live migration problem

The latter - although there was a bug earlier whereby, even though the
message (an 'xfr_vm_suspend' message to xend) was correctly delivered,
it could get ignored by xend.  This is fixed in 2.0-testing; the patch is
small, though, so you could just try it on a 2.0 tree if you don't want
to upgrade your kernels.
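
To make that failure mode concrete, here is a toy illustration of a request 
that is delivered but silently ignored (this is not the xend code - xend is 
Python - and the names and the state-guard cause are invented for the 
example):

    #include <stdio.h>

    enum state { STATE_RUNNING, STATE_MIGRATING };

    /* Bug pattern: the suspend request is delivered, but a wrong state
     * guard drops it without acting on it or reporting an error. */
    static void on_vm_suspend(enum state s)
    {
        if (s != STATE_MIGRATING)
            return;                      /* silently ignored */
        printf("suspending domain\n");
    }

    int main(void)
    {
        on_vm_suspend(STATE_RUNNING);    /* delivered, ignored */
        on_vm_suspend(STATE_MIGRATING);  /* delivered, acted on */
        return 0;
    }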

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users

