Re: [Xen-devel] [BUG] XEN crash and double fault when doing cpu online/offline
On 1/8/20 3:50 PM, Jürgen Groß wrote:
> On 08.01.20 06:50, Tao Xu wrote:
>> Hi,
>>
>> When I use xen-hptool cpu-offline/cpu-online to take the CPUs in a socket offline and bring them back online with the following script:
>>
>> for((j=48;j<=95;j++));
>> do
>> xen-hptool cpu-offline $j
>> done
>> for((j=48;j<=95;j++));
>> do
>> xen-hptool cpu-online $j
>> done
>>
>> Xen crashes when the CPUs come back online. I am using upstream Xen (0dd92688) and have tried for many days; it still crashes. If I only offline/online CPUs 48~59, Xen does not crash. The bug reproduces when most of the CPUs in a socket are taken offline and brought back online. Interestingly, Xen does not crash either when we use the following script:
>>
>> for((j=48;j<=95;j++));
>> do
>> xen-hptool cpu-offline $j
>> xen-hptool cpu-online $j
>> done
>>
>> Is there a bug in sched_credit2? The crash message is as follows:
>>
>> (XEN) Adding cpu 77 to runqueue 1
>> (XEN) Adding cpu 78 to runqueue 1
>> (XEN) Adding cpu 79 to runqueue 1
>> (XEN) Adding cpu 80 to runqueue 1
>> (X(ENXE) N) *** DOUBLE FAULT ***
>> (XEN) Assertion 'debug->cpu == smp_processor_id()' failed at spinlock.c:88
>> (XEN) ----[ Xen-4.14-unstable x86_64 debug=y Not tainted ]----
>> (XEN) Debugging connection not set up.
>> (XEN) CPU: 48
>> (XEN) ----[ Xen-4.14-unstable x86_64 debug=y Not tainted ]----
>> (XEN) CPU: 0
>> (XEN) RIP: e008:[<ffff82d080240bfc>] _spin_unlock+0x40/0x42
>
> So the original problem causes a double fault, but spinlock debugging causes a subsequent panic.
>
> Can you please retry the tests with the attached patch? It should result in diagnostic data related to the real problem.
>
> Juergen

Hi Juergen,

After applying your patch, the spinlock assertion still fires, and the address ffff82d0bffce880 is not in xen-syms.

(XEN) Adding cpu 78 to runqueue 1
(XEN) *** DOUBLE FAULT ***
(XEN) ----[ Xen-4.14-unstable x86_64 debug=y Not tainted ]----
(XEN) CPU: 49
(XEN) RIP: e008:[<ffff82d0bffce880>] ffff82d0bffce880
(XEN) RFLAGS: 0000000000010012 CONTEXT: hypervisor
(XEN) rax: 0000000000000018 rbx: 00000adda6074720 rcx: ffffffff8100130a
(XEN) rdx: ffffc90041114e40 rsi: 000000000000003b rdi: 0000000000000008
(XEN) rbp: 000000000000003b rsp: ffffc90041114e28 r8: 00000adda5f86678
(XEN) r9: 00000040bb3e6121 r10: 00000040bb2f1ee1 r11: 0000000000000212
(XEN) r12: ffff88fcdbcd7140 r13: ffff88fcdbcde438 r14: ffff88fcdbcde478
(XEN) r15: ffff88fcdbcde4b8 cr0: 0000000080050033 cr4: 00000000003426e0
(XEN) cr3: 0000002391e02000 cr2: ffffc90041114e18
(XEN) fsb: 0000000000000000 gsb: ffff88fcdbcc0000 gss: 0000000000000000
(XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen code around <ffff82d0bffce880> (ffff82d0bffce880):
(XEN) 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
(XEN) Current stack base ffffc90041110000 differs from expected ffff837e77190000
(XEN) Valid stack range: ffffc90041116000-ffffc90041118000, sp=ffffc90041114e28, tss.rsp0=ffff837e77197fa0
(XEN) No stack overflow detected. Skipping stack trace.
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 49:
(XEN) DOUBLE FAULT -- system shutdown
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) Debugging connection not set up.
(XEN)( XEN) *** DOUBLE FAULT ***
(XEN) Assertion 'atomic_read(&spin_debug) > 0 || debug->cpu == smp_processor_id()' failed at spinlock.c:88
(XEN) ----[ Xen-4.14-unstable x86_64 debug=y Not tainted ]----
(XEN) Debugging connection not set up.
(XEN) CPU: 52
(XEN) ----[ Xen-4.14-unstable x86_64 debug=y Not tainted ]----
(XEN) CPU: 50
(XEN) RIP: e008:[<ffff82d080240c06>] _spin_unlock+0x4a/0x4c
(XEN) RFLAGS: 0000000000050002 CONTEXT: hypervisor (d0v1)
(XEN) rax: ffff837e77017fff rbx: 0000000000040046 rcx: 0000000000000000
(XEN) rdx: 0000000000000034 rsi: 0000000000040046 rdi: ffff82d080819860
(XEN) rbp: ffff837e77010d38 rsp: ffff837e77010d38 r8: 0000000000000000
(XEN) r9: 0000000000000004 r10: 0000000000000001 r11: 0000000000000000
(XEN) r12: ffff82d08044d284 r13: 0000000000000010 r14: ffff82d08044d284
(XEN) r15: ffff82d0808197e0 cr0: 0000000080050033 cr4: 00000000003426e0
(XEN) cr3: 000000200e60a000 cr2: ffffc9004007cbb8
(XEN) fsb: 0000000000000000 gsb: ffff88fcdae40000 gss: 0000000000000000
(XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen code around <ffff82d080240c06> (_spin_unlock+0x4a/0x4c):
(XEN) 7f 00 00 3b 50 c1 74 dc <0f> 0b 55 48 89 e5 e8 ab ff ff ff fb 5d c3 55 48
(XEN) Xen stack trace from rsp=ffff837e77010d38:
(XEN) ffff837e77010d50 ffff82d080240c21 0000000000000020 ffff837e77010da8
(XEN) ffff82d080252eb8 0000000d8f512778 0000000000040046 ffff82d080819860
(XEN) 0000001000000000 0000000000000006 ffff82d08044d27e ffff82d08093e700
(XEN) 0000000000040086 ffff837e77010e58 ffff837e77010db8 ffff82d08024fe4b
(XEN) ffff837e77010dd8 ffff82d08024fe87 0000000000000000 ffff83201081d3a0
(XEN) ffff837e77010e40 ffff82d08024feec 44e4a2a937cfbed7 a22ad7391a19609e
(XEN) d4f7a456dec5cb24 ffff837e77010e20 ffff82d080240b77 ffff82d080819718
(XEN) ffff82d0804564bf ffff83201081d3a0 ffff837e77010e98 0000000000040086
(XEN) ffff82d08093e714 ffff837e77010e88 ffff82d0802503f9 ffff82d08044d27e
(XEN) ffff82d08093e700 ffff837e77010f58 0000000000000032 0000000000000000
(XEN) ffff837e77017fff 0000000000000000 ffff837e77010ee0 ffff82d080250511
(XEN) ffff837e00000008 ffff837e77010ef0 ffff837e77010eb0 ffff837e77017fff
(XEN) 0000000000040046 0000000000000032 ffff82d080819701 0000000000000000
(XEN) 0000000000000000 ffff837e77010f48 ffff82d080382f2a ffff82d080389c66
(XEN) ffff82d080389c72 ffff82d080389c66 ffff82d080389c72 ffff82d080389c66
(XEN) ffff82d080389c72 ffff82d080389c66 ffff82d080389c72 0000000000000000
(XEN) 0000000000000000 0000000000000000 00007c8188fef087 ffff82d080389cc7
(XEN) ffff88fcc9036f00 ffffc9004007cde8 000000000002ad80 ffff88fcc97d5d00
(XEN) 0000000000000002 ffffc9004007cc50 0000000000000286 0000000000000014
(XEN) 0000000000000400 0000000000000014 0000000000000017 ffffffff810012eb
(XEN) Xen call trace:
(XEN) [<ffff82d080240c06>] R _spin_unlock+0x4a/0x4c
(XEN) [<ffff82d080240c21>] F _spin_unlock_irqrestore+0xd/0x24
(XEN) [<ffff82d080252eb8>] F serial_puts+0x131/0x141
(XEN) [<ffff82d08024fe4b>] F console_serial_puts+0x28/0x2a
(XEN) [<ffff82d08024fe87>] F drivers/char/console.c#__putstr+0x3a/0x8b
(XEN) [<ffff82d08024feec>] F drivers/char/console.c#printk_start_of_line+0x14/0x17b
(XEN) [<ffff82d0802503f9>] F drivers/char/console.c#vprintk_common+0x8d/0x158
(XEN) [<ffff82d080250511>] F printk+0x4d/0x4f
(XEN) [<ffff82d080382f2a>] F do_double_fault+0x2b/0x82
(XEN) [<ffff82d080389cc7>] F double_fault+0x107/0x110
(XEN)
(XEN) RIP: e008:[<ffff82d0bffcba00>]
(XEN)
(XEN) **************************************** ffff82d0bffcba00
(XEN) Panic on CPU 50:
(XEN) RFLAGS: 0000000000010006
(XEN) Assertion 'atomic_read(&spin_debug) > 0 || debug->cpu == smp_processor_id()' failed at spinlock.c:88 CONTEXT: hypervisor
(XEN) ****************************************
(XEN)
(XEN) rax: 0000000000000020 rbx: ffff88fcdb52ad80 rcx: ffffffff8100140a
(XEN) Reboot in five seconds...
(XEN) rdx: ffff88fcc98e6628 rsi: ffffc900408f8d24 rdi: 0000000000000004
(XEN) Debugging connection not set up.
(XEN) rbp: ffff88fcbe39cb80 rsp: ffffc900408f8d08 r8: ffff88fcca4068f0
(XEN) r9: ffff88fcca4069a0 r10: 0000000000000000 r11: 0000000000000206
(XEN) r12: 0000000000000004 r13: ffffc900408f8d80 r14: ffff88fcbe39d2fc
(XEN) r15: ffff88fcdb52ad80 cr0: 0000000080050033 cr4: 00000000003426e0
(XEN) cr3: 000000200e60a000 cr2: ffffc900408f8cf8
(XEN) fsb: 0000000000000000 gsb: ffff88fcdb100000 gss: 0000000000000000
(XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen code around <ffff82d0bffcba00> (ffff82d0bffcba00):
(XEN) 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
(XEN) Current stack base ffffc900408f8000 differs from expected ffff837e77038000
(XEN) Valid stack range: ffffc900408fe000-ffffc90040900000, sp=ffffc900408f8d08, tss.rsp0=ffff837e7703ffa0
(XEN) No stack overflow detected. Skipping stack trace.
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 52:
(XEN) DOUBLE FAULT -- system shutdown
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) Debugging connection not set up.
(XEN) Debugging connection not set up.
(XEN) ----[ Xen-4.14-unstable x86_64 debug=y Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<0000000067b4cb2d>] 0000000067b4cb2d
(XEN) RFLAGS: 0000000000010206 CONTEXT: hypervisor
(XEN) rax: 0000000000000000 rbx: ffff830059027a60 rcx: 0000000067c50000
(XEN) rdx: 0000000000000000 rsi: 00000000003526e0 rdi: ffff830059027a40
(XEN) rbp: ffff830059027b68 rsp: ffff8300590279a0 r8: ffff830059027a60
(XEN) r9: ffff830059027a40 r10: 0000000067b4e1b8 r11: 0101010101010101
(XEN) r12: 00000000fffffffe r13: 0000000000000000 r14: 0000000000000065
(XEN) r15: 0000000000000000 cr0: 0000000080050033 cr4: 00000000003526e0
(XEN) cr3: 000000203fe4e000 cr2: 0000000067c50010
(XEN) fsb: 0000000000000000 gsb: ffff88fcdb280000 gss: 0000000000000000
(XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen code around <0000000067b4cb2d> (0000000067b4cb2d):
(XEN) 6b c0 10 48 8b 4c 24 20 <48> 8b 44 01 10 48 89 44 24 28 48 8b 44 24 28 48
(XEN) Xen stack trace from rsp=ffff8300590279a0:
(XEN) ffff8300590279b0 ffff8300590279c8 ffff82d080240cd5 ffff82d0802510eb
(XEN) 0000000067c50000 ffff830059027a00 0000000000000206 0000000067b4bf3c
(XEN) ffff830059027a60 ffff82d0808197e0 ffff830059027aa8 0000000000000000
(XEN) 000000203fe4e000 0000000067b4b590 ffff830059027ae0 00000000000000f1
(XEN) ffff830059027a30 ffff82d080240ba5 ffff830059027a98 0000000067aeb54b
(XEN) ffff82d080389845 ffff832010000424 ffff830059027c68 0000000400000000
(XEN) 00000000000fa000 67c5000000000200 0000000000000000 0000000067aeb8d7
(XEN) 0000000000000000 ffff830059027fff 0000000000000000 00007cffa6fd8537
(XEN) 0000000000000000 0000000067aeb6ae 00000000000000fb ffff82d080808aa0
(XEN) 00000000003526e0 ffff830059027b20 0000000000000000 0000000067aeb476
(XEN) ffff830000000000 ffff830059027b40 0000000059014000 0000000000000000
(XEN) ffff830059027b30 ffff82d0803867c4 0000000000000000 ffff82d080386ac8
(XEN) 0000000000000000 00000000fffffffe ffff830059027b68 ffff82d080386a99
(XEN) 0000000059014000 000000000000e010 0000000000000000 00000000000000fb
(XEN) ffffffffffffffff ffff830059027bb8 ffff82d0802a4964 0000138880389851
(XEN) 000082d080389845 0000000000000000 0000000000000000 00000000000000fb
(XEN) ffff830059027c68 00000000000000fb 0000000000000000 ffff830059027bc8
(XEN) ffff82d0802a4a91 ffff830059027be0 ffff82d080240a08 0000000000000000
(XEN) ffff830059027bf0 ffff82d0802a5136 ffff830059027c58 ffff82d0802858bf
(XEN) ffff82d080389845 ffff82d080389851 0000000000000000 8000000080389851
(XEN) Xen call trace:
(XEN) [<0000000067b4cb2d>] R 0000000067b4cb2d
(XEN) [<ffff8300590279b0>] S ffff8300590279b0
(XEN) [<ffff82d0802a4964>] F machine_restart+0x168/0x28a
(XEN) [<ffff82d0802a4a91>] F send_IPI_mask+0/0xc
(XEN) [<ffff82d080240a08>] F smp_call_function_interrupt+0xa8/0xac
(XEN) [<ffff82d0802a5136>] F call_function_interrupt+0x20/0x34
(XEN) [<ffff82d0802858bf>] F do_IRQ+0x148/0x6d4
(XEN) [<ffff82d0803898ba>] F common_interrupt+0x10a/0x120
(XEN) [<ffff82d080253645>] F cpufreq_add_cpu+0xbc/0x5cf
(XEN) [<ffff82d080253da9>] F drivers/cpufreq/cpufreq.c#cpu_callback+0x27/0x32
(XEN) [<ffff82d0802242c0>] F notifier_call_chain+0x6b/0x96
(XEN) [<ffff82d080200f95>] F common/cpu.c#cpu_notifier_call_chain+0x1b/0x33
(XEN) [<ffff82d080201215>] F cpu_up+0xa8/0xe5
(XEN) [<ffff82d0802a8185>] F cpu_up_helper+0xf/0xa5
(XEN) [<ffff82d080205d5d>] F common/domain.c#continue_hypercall_tasklet_handler+0x4c/0xb9
(XEN) [<ffff82d080242de5>] F common/tasklet.c#do_tasklet_work+0x76/0xa9
(XEN) [<ffff82d0802430c6>] F do_tasklet+0x58/0x8a
(XEN) [<ffff82d080275545>] F arch/x86/domain.c#idle_loop+0x40/0x9b
(XEN)
(XEN) Pagetable walk from 0000000067c50010:
(XEN) L4[0x000] = 000000203fe4d063 ffffffffffffffff
(XEN) L3[0x001] = 000000005900d063 ffffffffffffffff
(XEN) L2[0x13e] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 0000000067c50010
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) Debugging connection not set up.
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
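
For anyone trying to reproduce this, the two sequences from the original report can be put side by side in one script. This is only a sketch: the variables HPTOOL, FIRST_CPU and LAST_CPU and the function names batch_cycle and per_cpu_cycle are illustrative and not part of xen-hptool; the only real commands used are the xen-hptool cpu-offline and cpu-online invocations quoted in the report. Setting HPTOOL=echo turns it into a dry run that just prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the two hotplug sequences described in the report.
# HPTOOL, FIRST_CPU and LAST_CPU are hypothetical knobs added for illustration;
# set HPTOOL=echo to print the commands instead of running them.
HPTOOL="${HPTOOL:-xen-hptool}"
FIRST_CPU="${FIRST_CPU:-48}"
LAST_CPU="${LAST_CPU:-95}"

# Offline the whole range first, then online the whole range.
# This is the ordering reported to crash Xen on re-online.
batch_cycle() {
    local j
    for ((j = FIRST_CPU; j <= LAST_CPU; j++)); do
        "$HPTOOL" cpu-offline "$j" || return 1
    done
    for ((j = FIRST_CPU; j <= LAST_CPU; j++)); do
        "$HPTOOL" cpu-online "$j" || return 1
    done
}

# Offline and immediately re-online one CPU at a time.
# This is the ordering reported NOT to crash.
per_cpu_cycle() {
    local j
    for ((j = FIRST_CPU; j <= LAST_CPU; j++)); do
        "$HPTOOL" cpu-offline "$j" || return 1
        "$HPTOOL" cpu-online "$j" || return 1
    done
}

# Run nothing unless a mode is given explicitly.
case "${1:-}" in
    batch)   batch_cycle ;;
    per-cpu) per_cpu_cycle ;;
esac
```

Saved under any file name, it would be run with "batch" or "per-cpu" as its only argument. Narrowing FIRST_CPU and LAST_CPU (for example to 48 and 59, the range reported not to crash) may help find how many CPUs have to be offline at once before the failure appears.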