Xen project Mailing List

[Xen-devel] DomU crash during migration when suspending source domain

From: "Graham, Simon" <Simon.Graham@xxxxxxxxxxx>

Date: Tue, 13 Feb 2007 22:42:15 -0500

Delivery-date: Tue, 13 Feb 2007 19:41:37 -0800

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Thread-index: AcdP6h4+HveIAzruQ3+gt7NQNapEGw==

Thread-topic: DomU crash during migration when suspending source domain

I've just run into an odd DomU crash doing live migration of a 4-VCPU domain (with 3.0.4, but the code looks the same in 2.6.18/unstable to me). The actual panic is attached at the end of this message, but the bottom line is that the code in cache_remove_shared_cpu_map (in arch/i386/kernel/cpu/intel_cacheinfo.c) is attempting to clean up the cache info for a processor that does not yet have this info set up: the code is dereferencing a pointer in the cpuid4_info[] array, and looking at the dump I can see that this pointer is NULL.

My working theory is that we attempted the migration waaay early, before the array of cache info pointers had been initialized for all processors. It would be relatively easy to protect against this by checking for NULL, but I'm not sure whether that is the correct solution; if anyone is familiar with this code and can comment on an appropriate fix, I'd be grateful.

One thing I'm really not sure about is the timing of marking the CPUs up relative to the 'Initializing CPU#n' messages (see console output below). I can see that the four VCPUs are set up in the cpu_sys_devices array (which is populated by the code that prints those messages), but the array of cache info structures only has an entry for VCPU 0:

crash> cpu_sys_devices
cpu_sys_devices = $3 = {0xc0464448, 0xc046448c, 0xc04644d0, 0xc0464514, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
crash> cpuid4_info
cpuid4_info = $4 = {0xc7971180, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}

Any suggestions for appropriate fixes here?

Simon

--- console output ---
Enabling SMP...
Initializing CPU#3
Initializing CPU#2
Initializing CPU#1
eth0: no IPv6 routers present
Unable to handle kernel NULL pointer dereference at virtual address 00000010
 printing eip:
c010dd3a
0204a000 -> *pde = 00000001:0d8ec001
06a9c000 -> *pme = 00000000:00000000
Oops: 0000 [#1] SMP
Modules linked in: ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core binfmt_misc dm_mirror dm_mod bnx2 ata_piix libata mptscsih mptfc mptspi mptsas mptscsi scsi_mod mptbase
CPU: 0
EIP: 0061:[<c010dd3a>] Tainted: GF VLI
EFLAGS: 00010202 (2.6.16.29-xen #1)
EIP is at cache_remove_shared_cpu_map+0x1a/0x90
eax: 00000000 ebx: 00000001 ecx: 00000001 edx: 00000000
esi: 00000000 edi: 00000010 ebp: c3913f14 esp: c3913f08
ds: 007b es: 007b ss: 0069
Process suspend (pid: 4038, threadinfo=c3912000 task=c2244570)
Stack: <0>00000001 00000001 00000000 c3913f28 c010e3ba 00000007 00000001 00000007
       c3913f34 c010e425 c03bd804 c3913f48 c012fae8 ffffffea 00000001 c568c570
       c3913f7c c013b889 c3913fc0 00000002 00000001 00000000 00000003 00000000
Call Trace:
 [<c0105401>] show_stack_log_lvl+0xa1/0xe0
 [<c01055f1>] show_registers+0x181/0x200
 [<c0105810>] die+0x100/0x1a0
 [<c01156f6>] do_page_fault+0x3c6/0x8b1
 [<c0105067>] error_code+0x2b/0x30
 [<c010e3ba>] cache_remove_dev+0x2a/0x60
 [<c010e425>] cacheinfo_cpu_callback+0x35/0x40
 [<c012fae8>] notifier_call_chain+0x18/0x40
 [<c013b889>] cpu_down+0x139/0x260
 [<c028bc9f>] smp_suspend+0x7f/0x100
 [<c028ca80>] __do_suspend+0x40/0x180
 [<c0136a06>] kthread+0x96/0xe0
 [<c0102e95>] kernel_thread_helper+0x5/0x10
Code: 0c 5b 5e 5f 5d c3 8d 74 26 00 8d bc 27 00 00 00 00 55 89 e5 57 56 89 d6 53 89 c3 8d 04 92 8b 14 9d 20 4d 46 c0 8d 04 82 8d 78 10 <8b> 40 10 ba 20 00 00 00 85 c0 74 03 0f bc d0 83 fa 21 b9 20 00

-and-

crash> bt
PID: 4038 TASK: c2244570 CPU: 0 COMMAND: "suspend"
 #0 [c3913ddc] xen_panic_event at c010a527
 #1 [c3913df8] notifier_call_chain at c012fae6
 #2 [c3913e0c] panic at c0120b16
 #3 [c3913e20] die at c0105866
 #4 [c3913e6c] do_page_fault at c01156f1
 #5 [c3913ed0] error_code (via page_fault) at c0105065
    EAX: 00000000 EBX: 00000001 ECX: 00000001 EDX: 00000000 EBP: c3913f14
    DS: 007b ESI: 00000000 ES: 007b EDI: 00000010
    CS: 0061 EIP: c010dd3a ERR: ffffffff EFLAGS: 00010202
 #6 [c3913f04] cache_remove_shared_cpu_map at c010dd3a
 #7 [c3913f18] cache_remove_dev at c010e3b5
 #8 [c3913f2c] cacheinfo_cpu_callback at c010e420
 #9 [c3913f38] notifier_call_chain at c012fae6
#10 [c3913f4c] cpu_down at c013b884
#11 [c3913f80] smp_suspend at c028bc9a
#12 [c3913f98] __do_suspend at c028ca7b
#13 [c3913fc4] kthread at c0136a03
#14 [c3913fe8] kernel_thread_helper at c0102e93
crash>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
