Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
On 16.07.19 17:45, Sergey Dyasli wrote:
> On 05/07/2019 14:17, Sergey Dyasli wrote:
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.482686] Watchdog timer detects that CPU30 is stuck!
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.514180] ----[ Xen-4.13.0-8.0.6-d x86_64 debug=y Not tainted ]----
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.552070] CPU: 30
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.565281] RIP: e008:[<ffff82d0802406fc>] sched_context_switched+0xaf/0x101
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.601232] RFLAGS: 0000000000000202 CONTEXT: hypervisor
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.629998] rax: 0000000000000002 rbx: ffff83202782e880 rcx: 000000000000001e
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.669651] rdx: ffff83202782e904 rsi: ffff832027823000 rdi: ffff832027823000
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.706560] rbp: ffff83403cab7d20 rsp: ffff83403cab7d00 r8: 0000000000000000
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.743258] r9: 0000000000000000 r10: 0200200200200200 r11: 0100100100100100
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.779940] r12: ffff832027823000 r13: ffff832027823000 r14: ffff83202782e7b0
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.816849] r15: ffff83202782e880 cr0: 000000008005003b cr4: 00000000000426e0
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.854125] cr3: 00000000bd8a1000 cr2: 000000001851b798
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.881483] fsb: 0000000000000000 gsb: 0000000000000000 gss: 0000000000000000
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.918309] ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.952619] Xen code around <ffff82d0802406fc> (sched_context_switched+0xaf/0x101):
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.990277] 00 00 eb 18 f3 90 8b 02 <85> c0 75 f8 eb 0e 49 8b 7e 30 48 85 ff 74 05 e8
>> [2019-07-05 00:37:16 UTC] (XEN) [24908.032393] Xen stack trace from rsp=ffff83403cab7d00:
>> [2019-07-05 00:37:16 UTC] (XEN) [24908.061298] ffff832027823000 ffff832027823000 0000000000000000 ffff83202782e880
>> [2019-07-05 00:37:16 UTC] (XEN) [24908.098529] ffff83403cab7d60 ffff82d0802407c0 0000000000000082 ffff83202782e7c8
>> [2019-07-05 00:37:16 UTC] (XEN) [24908.135622] 000000000000001e ffff83202782e7c8 000000000000001e ffff82d080602628
>> [2019-07-05 00:37:16 UTC] (XEN) [24908.172671] ffff83403cab7dc0 ffff82d080240d83 000000000000df99 000000000000001e
>> [2019-07-05 00:37:16 UTC] (XEN) [24908.210212] ffff832027823000 000016a62dc8c6bc 000000fc00000000 000000000000001e
>> [2019-07-05 00:37:16 UTC] (XEN) [24908.247181] ffff83202782e7c8 ffff82d080602628 ffff82d0805da460 000000000000001e
>> [2019-07-05 00:37:16 UTC] (XEN) [24908.284279] ffff83403cab7e60 ffff82d080240ea4 00000002802aecc5 ffff832027823000
>> [2019-07-05 00:37:16 UTC] (XEN) [24908.321128] ffff83202782e7b0 ffff83202782e880 ffff83403cab7e10 ffff82d080273b4e
>> [2019-07-05 00:37:16 UTC] (XEN) [24908.358308] ffff83403cab7e10 ffff82d080242f7f ffff83403cab7e60 ffff82d08024663a
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.395662] ffff83403cab7ea0 ffff82d0802ec32a ffff8340000000ff ffff82d0805bc880
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.432376] ffff82d0805bb980 ffffffffffffffff ffff83403cab7fff 000000000000001e
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.469812] ffff83403cab7e90 ffff82d080242575 0000000000000f00 ffff82d0805bb980
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.508373] 000000000000001e ffff82d0806026f0 ffff83403cab7ea0 ffff82d0802425ca
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.549856] ffff83403cab7ef0 ffff82d08027a601 ffff82d080242575 0000001e7ffde000
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.588022] ffff832027823000 ffff832027823000 ffff83127ffde000 ffff83203ffe5000
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.625217] 000000000000001e ffff831204092000 ffff83403cab7d78 00000000ffffffed
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.662932] ffffffff81800000 0000000000000000 ffffffff81800000 0000000000000000
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.703246] ffffffff818f4580 ffff880039118848 00000e6a3c4b2698 00000000148900db
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.743671] 0000000000000000 ffffffff8101e650 ffffffff8185c3e0 0000000000000000
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.781927] 0000000000000000 0000000000000000 0000beef0000beef ffffffff81054eb2
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.820986] Xen call trace:
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.836789] [<ffff82d0802406fc>] sched_context_switched+0xaf/0x101
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.869916] [<ffff82d0802407c0>] schedule.c#sched_context_switch+0x72/0x151
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.907384] [<ffff82d080240d83>] schedule.c#sched_slave+0x2a3/0x2b2
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.941241] [<ffff82d080240ea4>] schedule.c#schedule+0x112/0x2a1
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.973939] [<ffff82d080242575>] softirq.c#__do_softirq+0x85/0x90
>> [2019-07-05 00:37:17 UTC] (XEN) [24909.007101] [<ffff82d0802425ca>] do_softirq+0x13/0x15
>> [2019-07-05 00:37:17 UTC] (XEN) [24909.035971] [<ffff82d08027a601>] domain.c#idle_loop+0xad/0xc0
>> [2019-07-05 00:37:17 UTC] (XEN) [24909.070546]
>> [2019-07-05 00:37:17 UTC] (XEN) [24909.080286] CPU0 @ e008:ffff82d0802431ba (stop_machine.c#stopmachine_wait_state+0x1a/0x24)
>> [2019-07-05 00:37:17 UTC] (XEN) [24909.122896] CPU1 @ e008:ffff82d0802406f8 (sched_context_switched+0xab/0x101)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.159518] CPU3 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.199607] CPU2 @ e008:ffff82d0802406fc (sched_context_switched+0xaf/0x101)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.235773] CPU5 @ e008:ffff82d0802431f4 (stop_machine.c#stopmachine_action+0x30/0xa0)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.276039] CPU4 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.312371] CPU7 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.352930] CPU6 @ e008:ffff82d0802406fc (sched_context_switched+0xaf/0x101)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.388928] CPU8 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.424664] CPU9 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.465376] CPU10 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.507449] CPU11 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.544703] CPU13 @ e008:ffff82d0802431f2 (stop_machine.c#stopmachine_action+0x2e/0xa0)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.588884] CPU12 @ e008:ffff82d0802406fc (sched_context_switched+0xaf/0x101)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.625781] CPU15 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.666649] CPU14 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.703396] CPU17 @ e008:ffff82d0802431f4 (stop_machine.c#stopmachine_action+0x30/0xa0)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.744089] CPU16 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.781117] CPU23 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.821692] CPU22 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.858139] CPU27 @ e008:ffff82d0802431f4 (stop_machine.c#stopmachine_action+0x30/0xa0)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.898704] CPU26 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
>> [2019-07-05 00:37:19 UTC] (XEN) [24909.936069] CPU19 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
>> [2019-07-05 00:37:19 UTC] (XEN) [24909.977291] CPU18 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
>> [2019-07-05 00:37:19 UTC] (XEN) [24910.014078] CPU31 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
>> [2019-07-05 00:37:19 UTC] (XEN) [24910.055692] CPU21 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
>> [2019-07-05 00:37:19 UTC] (XEN) [24910.100486] CPU24 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
>> [2019-07-05 00:37:19 UTC] (XEN) [24910.136824] CPU25 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
>> [2019-07-05 00:37:19 UTC] (XEN) [24910.177529] CPU29 @ e008:ffff82d0802431f4 (stop_machine.c#stopmachine_action+0x30/0xa0)
>> [2019-07-05 00:37:19 UTC] (XEN) [24910.218420] CPU28 @ e008:ffff82d0802406fc (sched_context_switched+0xaf/0x101)
>> [2019-07-05 00:37:19 UTC] (XEN) [24910.255219] CPU20 @ e008:ffff82d0802406fc (sched_context_switched+0xaf/0x101)
>> [2019-07-05 00:37:19 UTC] (XEN) [24910.292152]
>> [2019-07-05 00:37:19 UTC] (XEN) [24910.301667] ****************************************
>> [2019-07-05 00:37:19 UTC] (XEN) [24910.327892] Panic on CPU 30:
>> [2019-07-05 00:37:19 UTC] (XEN) [24910.344165] FATAL TRAP: vector = 2 (nmi)
>> [2019-07-05 00:37:19 UTC] (XEN) [24910.365476] [error_code=0000]
>> [2019-07-05 00:37:19 UTC] (XEN) [24910.382509] ****************************************
>> [2019-07-05 00:37:19 UTC] (XEN) [24910.408547]
>> [2019-07-05 00:37:19 UTC] (XEN) [24910.418129] Reboot in five seconds...
>
> On a closer look, the second crash happens when you try to shut down
> the host ("poweroff" in my case).

And that was just another bug: the scheduler is still active when trying
to enter ACPI deep sleep states. As non-boot cpus are being taken down
via tasklets, this results in syncing problems when one cpu of a
sched_resource is already down and the other is waiting for it to
finish scheduling...

Replacing the common scheduling softirq handler with one doing only
tasklet scheduling in that case makes it work again.

Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
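
The fix described in the reply (swapping the scheduling softirq handler for one that only runs tasklets while the host is going down) might be pictured with the toy model below. This is only a sketch under stated assumptions, not the code from the patch series: the handler table and every name in it (`softirq_handlers`, `schedule_full`, `schedule_tasklets_only`, `scheduler_set_active`, `queue_tasklet`, `raise_schedule_softirq`) are hypothetical stand-ins, and the "full" path merely counts invocations where the real scheduler would synchronise with the other cpus of the sched_resource.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all names here are hypothetical, not Xen's actual API. */
enum { SCHEDULE_SOFTIRQ, NR_SOFTIRQS };
typedef void (*softirq_handler)(void);

softirq_handler softirq_handlers[NR_SOFTIRQS];

int full_schedule_runs;  /* how often the normal scheduling path ran */
int tasklet_runs;        /* how many tasklets have been executed */
int pending_tasklets;    /* tasklets queued but not yet executed */

/* Drain the (simulated) tasklet queue. */
static void run_pending_tasklets(void)
{
    while (pending_tasklets > 0) {
        pending_tasklets--;
        tasklet_runs++;
    }
}

/* Normal handler: in a real scheduler this is where a cpu would wait
 * for its siblings in the same sched_resource. Here it only counts. */
static void schedule_full(void)
{
    full_schedule_runs++;
    run_pending_tasklets();
}

/* Reduced handler: no cross-cpu synchronisation, tasklets only. */
static void schedule_tasklets_only(void)
{
    run_pending_tasklets();
}

/* Select which handler serves the scheduling softirq. */
void scheduler_set_active(bool active)
{
    softirq_handlers[SCHEDULE_SOFTIRQ] =
        active ? schedule_full : schedule_tasklets_only;
}

void queue_tasklet(void)
{
    pending_tasklets++;
}

/* Simulate the softirq being raised and handled immediately. */
void raise_schedule_softirq(void)
{
    assert(softirq_handlers[SCHEDULE_SOFTIRQ] != NULL);
    softirq_handlers[SCHEDULE_SOFTIRQ]();
}
```

In this model, calling `scheduler_set_active(false)` before the non-boot cpus are taken down means the tasklets that offline them still run, while the path that would wait on an already-offline sibling is never entered.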