
[Xen-users] Live migration OOPS



Hi all,

I am running Debian Squeeze with the backported kernel 3.2.0-0.bpo.4-amd64. The cluster is managed by Ganeti.

Hardware:

Source:
Dell R510, 64 GB RAM

Destination:
Dell R720, 192 GB RAM

When live migrating VMs from the older machine to the newer one, I get a kernel oops in the guest, and the VM does not appear to recover after being restored on the destination.

It looks similar to 


If I start the VM on the smaller host and migrate it to the larger one, it fails.
If I start the VM on the larger host and migrate it to the smaller one, it works. If I then migrate it back to the larger host, that works too.
If I remove memory from the newer machine so that it has 64 GB, I can live migrate VMs to it.
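
The pattern in these three tests can be written down as a simple pre-migration check. This is only a sketch of the observed behaviour, not a confirmed cause. It assumes the `total_memory` line (in MiB) printed by `xm info` has been captured from each host; the sample strings and function names are hypothetical.

```python
import re

def total_memory_mib(xm_info_text):
    """Return the total_memory value (MiB) from `xm info` style output."""
    match = re.search(r"^total_memory\s*:\s*(\d+)\s*$", xm_info_text, re.MULTILINE)
    if match is None:
        raise ValueError("no total_memory line found")
    return int(match.group(1))

def is_risky_migration(boot_host_info, dest_host_info):
    """True when the destination has more RAM than the host the VM was booted on.

    This only encodes the pattern seen in the tests above (assumption):
    a VM booted on the smaller host failed when moved to the larger one.
    """
    return total_memory_mib(dest_host_info) > total_memory_mib(boot_host_info)

# Hypothetical captured output; the numbers are made up for illustration.
small_host = "host : r510\ntotal_memory           : 65526\nfree_memory : 1024\n"
large_host = "host : r720\ntotal_memory           : 196598\nfree_memory : 2048\n"

print(is_risky_migration(small_host, large_host))
print(is_risky_migration(large_host, small_host))
```

The first call covers the failing case (booted on the 64 GB host, migrated to the 192 GB host); the second covers the direction that worked.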

Can anyone advise a solution? I have seen that there are possible fixes for this, but it is not clear to me from the threads what the current status of the patches is, or which versions of Xen and the Linux kernel they are expected to work with.


Many thanks,

Matt

Output on the console of the VM is as follows:

[4760249.894618] BUG: unable to handle kernel paging request at 00007f08a0b23414
[4760249.894618] IP: [<ffffffff810077c9>] xen_setup_mfn_list_list+0x2d/0x4b
[4760249.894618] PGD 0 
[4760249.894618] Oops: 0002 [#1] SMP 
[4760249.894618] CPU 0 
[4760249.894618] Modules linked in: autofs4 nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc loop coretemp crc32c_intel ghash_clmulni_intel evdev aesni_intel aes_x86_64 snd_pcm snd_page_alloc snd_timer snd aes_generic soundcore cryptd pcspkr ext3 jbd mbcache dm_mod xen_netfront xen_blkfront
[4760249.894618] 
[4760249.894618] Pid: 6, comm: migration/0 Not tainted 3.2.0-0.bpo.4-amd64 #1 Debian 3.2.41-2+deb7u2~bpo60+1  
[4760249.894618] RIP: e030:[<ffffffff810077c9>]  [<ffffffff810077c9>] xen_setup_mfn_list_list+0x2d/0x4b
[4760249.894618] RSP: e02b:ffff88001e9dbd70  EFLAGS: 00010002
[4760249.894618] RAX: 0000000001805694 RBX: ffffffffff57a000 RCX: ffffffff81815000
[4760249.894618] RDX: 000000000000000c RSI: ffff88001e9d4e60 RDI: 0000000001805694
[4760249.894618] RBP: 0000000000000000 R08: 0000000000000000 R09: 80000000ba109063
[4760249.894618] R10: 0000000000007ff0 R11: 000000000000036d R12: 0000000000000003
[4760249.894618] R13: ffff88001ea13da4 R14: ffff88001ea13d01 R15: ffff88001e9d4e60
[4760249.894618] FS:  00007fe3e858e700(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000
[4760249.894618] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[4760249.894618] CR2: 00007f08a0b23414 CR3: 000000000ca9f000 CR4: 0000000000002660
[4760249.894618] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[4760249.894618] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[4760249.894618] Process migration/0 (pid: 6, threadinfo ffff88001e9da000, task ffff88001e9d4e60)
[4760249.894618] Stack:
[4760249.894618]  0000000000000000 ffffffff81007292 ffff88001ea13e10 ffffffff812323b4
[4760249.894618]  ffffffff81815000 ffffffff81232389 ffff880000000002 ffff88001e9da010
[4760249.894618]  ffff88001e9d0201 ffff88001ea13d80 ffff88001fc002b8 ffffffff8108da5f
[4760249.894618] Call Trace:
[4760249.894618]  [<ffffffff81007292>] ? xen_arch_post_suspend+0xd/0x97
[4760249.894618]  [<ffffffff812323b4>] ? xen_post_suspend+0x9/0x14
[4760249.894618]  [<ffffffff81232389>] ? xen_suspend+0x68/0x8a
[4760249.894618]  [<ffffffff8108da5f>] ? stop_machine_cpu_stop+0x84/0xc0
[4760249.894618]  [<ffffffff8108d9db>] ? stop_one_cpu_nowait+0x39/0x39
[4760249.894618]  [<ffffffff8108d7cf>] ? cpu_stopper_thread+0xef/0x191
[4760249.894618]  [<ffffffff810463a2>] ? finish_task_switch+0x53/0xc7
[4760249.894618]  [<ffffffff8136761c>] ? __schedule+0x5a0/0x5cd
[4760249.894618]  [<ffffffff8108d6e0>] ? res_counter_charge+0xbf/0xbf
[4760249.894618]  [<ffffffff8108d6e0>] ? res_counter_charge+0xbf/0xbf
[4760249.894618]  [<ffffffff8106371d>] ? kthread+0x7a/0x82
[4760249.894618]  [<ffffffff81370274>] ? kernel_thread_helper+0x4/0x10
[4760249.894618]  [<ffffffff8136e333>] ? int_ret_from_sys_call+0x7/0x1b
[4760249.894618]  [<ffffffff81368e7c>] ? retint_restore_args+0x5/0x6
[4760249.894618]  [<ffffffff81370270>] ? gs_change+0x13/0x13
[4760249.894618] Code: 8b 1d e4 6f 60 00 48 81 fb 20 a1 73 81 75 04 0f 0b eb fe 48 8b 3d 08 56 73 00 e8 77 b8 02 00 48 c1 e8 0c 48 89 c7 e8 7e fe ff ff <48> 89 83 18 0c 00 00 48 8b 15 69 54 68 00 48 8b 05 aa 6f 60 00 
[4760249.894618] RIP  [<ffffffff810077c9>] xen_setup_mfn_list_list+0x2d/0x4b
[4760249.894618]  RSP <ffff88001e9dbd70>
[4760249.894618] CR2: 00007f08a0b23414
[4760249.894618] ---[ end trace 0fcf6cf0a1d1efdd ]---
[4760249.894618] ------------[ cut here ]------------
[4760249.894618] WARNING: at /build/buildd-linux_3.2.41-2+deb7u2~bpo60+1-amd64-mnypfK/linux-3.2.41/kernel/time/timekeeping.c:298 ktime_get_ts+0x27/0x85()
[4760249.894618] Modules linked in: autofs4 nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc loop coretemp crc32c_intel ghash_clmulni_intel evdev aesni_intel aes_x86_64 snd_pcm snd_page_alloc snd_timer snd aes_generic soundcore cryptd pcspkr ext3 jbd mbcache dm_mod xen_netfront xen_blkfront
[4760249.894618] Pid: 6, comm: migration/0 Tainted: G      D      3.2.0-0.bpo.4-amd64 #1 Debian 3.2.41-2+deb7u2~bpo60+1
[4760249.894618] Call Trace:
[4760249.894618]  [<ffffffff8104996b>] ? warn_slowpath_common+0x78/0x8c
[4760249.894618]  [<ffffffff8106b2eb>] ? ktime_get_ts+0x27/0x85
[4760249.894618]  [<ffffffff81081d56>] ? do_acct_process+0x89/0x3bc
[4760249.894618]  [<ffffffff810820ed>] ? acct_process+0x64/0x7d
[4760249.894618]  [<ffffffff8104cf53>] ? do_exit+0x265/0x799
[4760249.894618]  [<ffffffff81366e89>] ? printk+0x40/0x47
[4760249.894618]  [<ffffffff81368add>] ? _raw_spin_unlock_irqrestore+0x10/0x11
[4760249.894618]  [<ffffffff81049ea6>] ? kmsg_dump+0x53/0xef
[4760249.894618]  [<ffffffff81368add>] ? _raw_spin_unlock_irqrestore+0x10/0x11
[4760249.894618]  [<ffffffff81369a09>] ? oops_end+0xb1/0xb6
[4760249.894618]  [<ffffffff8102fdad>] ? no_context+0x1ff/0x20c
[4760249.894618]  [<ffffffff8136bb6a>] ? do_page_fault+0x1ad/0x34c
[4760249.894618]  [<ffffffff81007605>] ? get_phys_to_machine+0x16/0x58
[4760249.894618]  [<ffffffff810046e2>] ? pte_pfn_to_mfn+0x23/0x74
[4760249.894618]  [<ffffffff810047b0>] ? xen_make_pte+0x7d/0x7f
[4760249.894618]  [<ffffffff810044f5>] ? __raw_callee_save_xen_make_pte+0x11/0x1e
[4760249.894618]  [<ffffffff810042c9>] ? xen_mc_flush+0x12b/0x158
[4760249.894618]  [<ffffffff813690f5>] ? page_fault+0x25/0x30
[4760249.894618]  [<ffffffff810077c9>] ? xen_setup_mfn_list_list+0x2d/0x4b
[4760249.894618]  [<ffffffff81007292>] ? xen_arch_post_suspend+0xd/0x97
[4760249.894618]  [<ffffffff812323b4>] ? xen_post_suspend+0x9/0x14
[4760249.894618]  [<ffffffff81232389>] ? xen_suspend+0x68/0x8a
[4760249.894618]  [<ffffffff8108da5f>] ? stop_machine_cpu_stop+0x84/0xc0
[4760249.894618]  [<ffffffff8108d9db>] ? stop_one_cpu_nowait+0x39/0x39
[4760249.894618]  [<ffffffff8108d7cf>] ? cpu_stopper_thread+0xef/0x191
[4760249.894618]  [<ffffffff810463a2>] ? finish_task_switch+0x53/0xc7
[4760249.894618]  [<ffffffff8136761c>] ? __schedule+0x5a0/0x5cd
[4760249.894618]  [<ffffffff8108d6e0>] ? res_counter_charge+0xbf/0xbf
[4760249.894618]  [<ffffffff8108d6e0>] ? res_counter_charge+0xbf/0xbf
[4760249.894618]  [<ffffffff8106371d>] ? kthread+0x7a/0x82
[4760249.894618]  [<ffffffff81370274>] ? kernel_thread_helper+0x4/0x10
[4760249.894618]  [<ffffffff8136e333>] ? int_ret_from_sys_call+0x7/0x1b
[4760249.894618]  [<ffffffff81368e7c>] ? retint_restore_args+0x5/0x6
[4760249.894618]  [<ffffffff81370270>] ? gs_change+0x13/0x13
[4760249.894618] ---[ end trace 0fcf6cf0a1d1efde ]---
[4760249.894618] ------------[ cut here ]------------
[4760249.894618] WARNING: at /build/buildd-linux_3.2.41-2+deb7u2~bpo60+1-amd64-mnypfK/linux-3.2.41/kernel/time/timekeeping.c:265 ktime_get+0x1e/0x88()
[4760249.894618] Modules linked in: autofs4 nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc loop coretemp crc32c_intel ghash_clmulni_intel evdev aesni_intel aes_x86_64 snd_pcm snd_page_alloc snd_timer snd aes_generic soundcore cryptd pcspkr ext3 jbd mbcache dm_mod xen_netfront xen_blkfront
[4760249.894618] Pid: 6, comm: migration/0 Tainted: G      D W    3.2.0-0.bpo.4-amd64 #1 Debian 3.2.41-2+deb7u2~bpo60+1
[4760249.894618] Call Trace:
[4760249.894618]  [<ffffffff8104996b>] ? warn_slowpath_common+0x78/0x8c
[4760249.894618]  [<ffffffff8106b367>] ? ktime_get+0x1e/0x88
[4760249.894618]  [<ffffffffa003a8f2>] ? start_this_handle+0x16c/0x30d [jbd]
[4760249.894618]  [<ffffffffa003abe6>] ? journal_start+0x94/0xc3 [jbd]
[4760249.894618]  [<ffffffffa004dd1d>] ? ext3_dirty_inode+0x25/0x78 [ext3]
[4760249.894618]  [<ffffffff81124fad>] ? __mark_inode_dirty+0x22/0x1a7
[4760249.894618]  [<ffffffff81119558>] ? file_update_time+0xd4/0xff
[4760249.894618]  [<ffffffff810bde74>] ? __generic_file_aio_write+0x15b/0x277
[4760249.894618]  [<ffffffff81049b42>] ? __call_console_drivers+0x75/0x86
[4760249.894618]  [<ffffffff81368cc8>] ? _raw_spin_lock_irqsave+0x11/0x2f
[4760249.894618]  [<ffffffff81368add>] ? _raw_spin_unlock_irqrestore+0x10/0x11
[4760249.894618]  [<ffffffff810bdfef>] ? generic_file_aio_write+0x5f/0xb3
[4760249.894618]  [<ffffffff811066e2>] ? do_sync_write+0xba/0xf3
[4760249.894618]  [<ffffffff8100fcd2>] ? dump_trace+0x236/0x245
[4760249.894618]  [<ffffffff8106b2eb>] ? ktime_get_ts+0x27/0x85
[4760249.894618]  [<ffffffff8102d1b2>] ? pvclock_clocksource_read+0x46/0xb4
[4760249.894618]  [<ffffffff8106a696>] ? timekeeping_get_ns+0xd/0x2a
[4760249.894618]  [<ffffffff81082047>] ? do_acct_process+0x37a/0x3bc
[4760249.894618]  [<ffffffff810820ed>] ? acct_process+0x64/0x7d
[4760249.894618]  [<ffffffff8104cf53>] ? do_exit+0x265/0x799
[4760249.894618]  [<ffffffff81366e89>] ? printk+0x40/0x47
[4760249.894618]  [<ffffffff81368add>] ? _raw_spin_unlock_irqrestore+0x10/0x11
[4760249.894618]  [<ffffffff81049ea6>] ? kmsg_dump+0x53/0xef
[4760249.894618]  [<ffffffff81368add>] ? _raw_spin_unlock_irqrestore+0x10/0x11
[4760249.894618]  [<ffffffff81369a09>] ? oops_end+0xb1/0xb6
[4760249.894618]  [<ffffffff8102fdad>] ? no_context+0x1ff/0x20c
[4760249.894618]  [<ffffffff8136bb6a>] ? do_page_fault+0x1ad/0x34c
[4760249.894618]  [<ffffffff81007605>] ? get_phys_to_machine+0x16/0x58
[4760249.894618]  [<ffffffff810046e2>] ? pte_pfn_to_mfn+0x23/0x74
[4760249.894618]  [<ffffffff810047b0>] ? xen_make_pte+0x7d/0x7f
[4760249.894618]  [<ffffffff810044f5>] ? __raw_callee_save_xen_make_pte+0x11/0x1e
[4760249.894618]  [<ffffffff810042c9>] ? xen_mc_flush+0x12b/0x158
[4760249.894618]  [<ffffffff813690f5>] ? page_fault+0x25/0x30
[4760249.894618]  [<ffffffff810077c9>] ? xen_setup_mfn_list_list+0x2d/0x4b
[4760249.894618]  [<ffffffff81007292>] ? xen_arch_post_suspend+0xd/0x97
[4760249.894618]  [<ffffffff812323b4>] ? xen_post_suspend+0x9/0x14
[4760249.894618]  [<ffffffff81232389>] ? xen_suspend+0x68/0x8a
[4760249.894618]  [<ffffffff8108da5f>] ? stop_machine_cpu_stop+0x84/0xc0
[4760249.894618]  [<ffffffff8108d9db>] ? stop_one_cpu_nowait+0x39/0x39
[4760249.894618]  [<ffffffff8108d7cf>] ? cpu_stopper_thread+0xef/0x191
[4760249.894618]  [<ffffffff810463a2>] ? finish_task_switch+0x53/0xc7
[4760249.894618]  [<ffffffff8136761c>] ? __schedule+0x5a0/0x5cd
[4760249.894618]  [<ffffffff8108d6e0>] ? res_counter_charge+0xbf/0xbf
[4760249.894618]  [<ffffffff8108d6e0>] ? res_counter_charge+0xbf/0xbf
[4760249.894618]  [<ffffffff8106371d>] ? kthread+0x7a/0x82
[4760249.894618]  [<ffffffff81370274>] ? kernel_thread_helper+0x4/0x10
[4760249.894618]  [<ffffffff8136e333>] ? int_ret_from_sys_call+0x7/0x1b
[4760249.894618]  [<ffffffff81368e7c>] ? retint_restore_args+0x5/0x6
[4760249.894618]  [<ffffffff81370270>] ? gs_change+0x13/0x13
[4760249.894618] ---[ end trace 0fcf6cf0a1d1efdf ]---
[4760249.894618] ------------[ cut here ]------------
[4760249.894618] WARNING: at /build/buildd-linux_3.2.41-2+deb7u2~bpo60+1-amd64-mnypfK/linux-3.2.41/kernel/time/timekeeping.c:265 ktime_get+0x1e/0x88()
[4760249.894618] Modules linked in: autofs4 nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc loop coretemp crc32c_intel ghash_clmulni_intel evdev aesni_intel aes_x86_64 snd_pcm snd_page_alloc snd_timer snd aes_generic soundcore cryptd pcspkr ext3 jbd mbcache dm_mod xen_netfront xen_blkfront
[4760249.894618] Pid: 0, comm: swapper/0 Tainted: G      D W    3.2.0-0.bpo.4-amd64 #1 Debian 3.2.41-2+deb7u2~bpo60+1
[4760249.894618] Call Trace:
[4760249.894618]  [<ffffffff8104996b>] ? warn_slowpath_common+0x78/0x8c
[4760249.894618]  [<ffffffff8106b367>] ? ktime_get+0x1e/0x88
[4760249.894618]  [<ffffffff81070f4c>] ? tick_nohz_stop_sched_tick+0x66/0x332
[4760249.894618]  [<ffffffff8100ddaf>] ? cpu_idle+0x7c/0xef
[4760249.894618]  [<ffffffff816abc48>] ? start_kernel+0x3c7/0x3d2
[4760249.894618]  [<ffffffff816ad746>] ? xen_start_kernel+0x415/0x41a
[4760249.894618] ---[ end trace 0fcf6cf0a1d1efe0 ]---
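
For anyone skimming the trace, the key fields of the first oops can be pulled out mechanically. A minimal sketch follows, assuming the standard x86 page-fault error-code bits (bit 0 = page present, bit 1 = write, bit 2 = user mode) and a 48-bit virtual address space. The function name and the `USER_SPACE_END` constant are my own, not from any tool.

```python
import re

# Top of the user-space half of the address space with 48-bit virtual addresses.
USER_SPACE_END = 0x0000800000000000

def summarize_oops(log_text):
    """Pull the faulting symbol, address and error-code bits out of an oops."""
    address = int(re.search(r"paging request at ([0-9a-f]+)", log_text).group(1), 16)
    symbol = re.search(r"IP: \[<[0-9a-f]+>\] (\w+)\+0x", log_text).group(1)
    code = int(re.search(r"Oops: ([0-9a-f]+)", log_text).group(1), 16)
    return {
        "symbol": symbol,
        "address": address,
        "user_range": address < USER_SPACE_END,
        "page_present": bool(code & 0x1),  # bit 0: 0 = page not present
        "write": bool(code & 0x2),         # bit 1: 1 = write access
        "user_mode": bool(code & 0x4),     # bit 2: 0 = fault raised in kernel mode
    }

# Three lines of the first oops, copied from the console output above.
sample = """\
[4760249.894618] BUG: unable to handle kernel paging request at 00007f08a0b23414
[4760249.894618] IP: [<ffffffff810077c9>] xen_setup_mfn_list_list+0x2d/0x4b
[4760249.894618] Oops: 0002 [#1] SMP
"""
summary = summarize_oops(sample)
print(summary)
```

For this trace the summary reports a kernel-mode write, inside xen_setup_mfn_list_list, to a not-present page at an address in the user-space range.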


--
Matthew Baker :: Unix/Security Team Lead
Infrastructure, Systems and Operations @University of Bristol
Team email: it-sysops@xxxxxxxxxxxxx
Tel: +44(0)117 3317467
Add: Uni of Bristol, Computer Centre, Tyndal Ave, Bristol. BS8 1UD


