Re: [Xen-devel] [PATCH] Fix scheduler crash after s3 resume
On 23.01.2013 16:51, Tomasz Wroblewski wrote:

Hi all,

This was also discussed earlier, for example here:
http://xen.markmail.org/thread/iqvkylp3mclmsnbw

Changeset 25079:d5ccb2d1dbd1 (Introduce system_state variable) added a global variable which, among other things, is used on suspend to prevent disabling the cpu scheduler, breaking vcpu affinities, and removing the cpu from its cpupool. However, it missed one place where the cpu is removed from the cpupool's valid cpus mask, in smpboot.c, __cpu_disable(), line 840:

cpumask_clear_cpu(cpu, cpupool0->cpu_valid);

This causes the vcpu in the default pool to be considered inactive, and the following assertion is violated in sched_credit.c soon after resume transitions out of xen, causing a platform reboot:

(XEN) Finishing wakeup from ACPI S3 state.
(XEN) Enabling non-boot CPUs ...
(XEN) Assertion '!cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus)' failed at sched_credit.c:507
(XEN) ----[ Xen-4.3-unstable x86_64 debug=y Tainted: C ]----
(XEN) CPU: 1
(XEN) RIP: e008:[<ffff82c480119e9e>] _csched_cpu_pick+0x155/0x5fd
(XEN) RFLAGS: 0000000000010202 CONTEXT: hypervisor
(XEN) rax: 0000000000000001 rbx: 0000000000000008 rcx: 0000000000000008
(XEN) rdx: 00000000000000ff rsi: 0000000000000008 rdi: 0000000000000000
(XEN) rbp: ffff83011415fdd8 rsp: ffff83011415fcf8 r8: 0000000000000000
(XEN) r9: 000000000000003e r10: 00000008f3de731f r11: ffffea0000063800
(XEN) r12: ffff82c480261720 r13: ffff830137b4d950 r14: ffff830137beb010
(XEN) r15: ffff82c480261720 cr0: 0000000080050033 cr4: 00000000000026f0
(XEN) cr3: 000000013c17d000 cr2: ffff8800ac6ef8f0
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
(XEN) Xen stack trace from rsp=ffff83011415fcf8:
(XEN) 00000000000af257 0000000800000001 ffff8300ba4fd000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000002 ffff8800ac6ef8f0
(XEN) 0000000800000000 00000001318e0025 0000000000000087 ffff83011415fd68
(XEN) ffff82c480124f79 ffff83011415fd98 ffff83011415fda8 00007fda88d1e790
(XEN) ffff8800ac6ef8f0 00000001318e0025 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000146 ffff830137b4d940
(XEN) 0000000000000001 ffff830137b4d950 ffff830137beb010 ffff82c480261720
(XEN) ffff83011415fe48 ffff82c48011a51b 0002000e00000007 ffffffff81009071
(XEN) 000000000000e033 ffff83013a805360 ffff880002bb3c28 000000000000e02b
(XEN) e4d87248e7ca5f52 ffff830102ae2200 0000000000000001 ffff82c48011a356
(XEN) 00000008efa1f543 00007fda88d1e790 ffff83011415fe78 ffff82c48012748f
(XEN) 0000000000000002 ffff830137beb028 ffff830102ae2200 ffff830137beb8d0
(XEN) ffff83011415fec8 ffff82c48012758b ffff830114150000 ffff8800ac6ef8f0
(XEN) 80100000ae86d065 ffff82c4802e0080 ffff82c4802e0000 ffff830114158000
(XEN) ffffffffffffffff 00007fda88d1e790 ffff83011415fef8 ffff82c480124b4e
(XEN) ffff8300ba4fd000 ffffea0000063800 00000001318e0025 ffff8800ac6ef8f0
(XEN) ffff83011415ff08 ffff82c480124bb4 00007cfeebea00c7 ffff82c480226a71
(XEN) 00007fda88d1e790 ffff8800ac6ef8f0 00000001318e0025 ffffea0000063800
(XEN) ffff880002bb3c78 00000001318e0025 ffffea0000063800 0000000000000146
(XEN) 00003ffffffff000 ffffea0002b1bbf0 0000000000000000 00000001318e0025
(XEN) Xen call trace:
(XEN) [<ffff82c480119e9e>] _csched_cpu_pick+0x155/0x5fd
(XEN) [<ffff82c48011a51b>] csched_tick+0x1c5/0x342
(XEN) [<ffff82c48012748f>] execute_timer+0x4e/0x6c
(XEN) [<ffff82c48012758b>] timer_softirq_action+0xde/0x206
(XEN) [<ffff82c480124b4e>] __do_softirq+0x8e/0x99
(XEN) [<ffff82c480124bb4>] do_softirq+0x13/0x15
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) Assertion '!cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus)' failed at sched_credit.c:507
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...

The reason for the above is that the "cpus" cpumask is empty, as it is the logical AND of the cpupool's valid cpus (from which the cpu was removed) and the cpu affinity mask.
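
To make the empty-mask reasoning concrete, here is a minimal model in plain C. It is only a sketch and not Xen code: the toy_* names, the single 64-bit word standing in for a cpumask, and the two-state enum are all assumptions made for illustration, and the vcpu is assumed to be pinned to CPU 1 so that the intersection really becomes empty. The guarded variant shows what skipping the clear during suspend would change; it is not the attached patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins, NOT Xen's real types: a cpumask is one 64-bit word. */
typedef uint64_t toy_cpumask;

/* Hypothetical two-state model of the global system state. */
enum toy_system_state { TOY_STATE_active, TOY_STATE_suspend };

struct toy_pool {
    toy_cpumask cpu_valid;   /* CPUs the pool may schedule on */
};

/* Models the unconditional clear the report points at in __cpu_disable(). */
void toy_cpu_disable(struct toy_pool *pool, unsigned int cpu)
{
    pool->cpu_valid &= ~(UINT64_C(1) << cpu);
}

/* Variant that leaves cpu_valid untouched while the system is suspending. */
void toy_cpu_disable_guarded(struct toy_pool *pool, unsigned int cpu,
                             enum toy_system_state state)
{
    if (state != TOY_STATE_suspend)
        toy_cpu_disable(pool, cpu);
}

/* Same shape as the failed assertion: mask non-empty and cpu is a member. */
bool toy_pick_invariant(const struct toy_pool *pool, toy_cpumask affinity,
                        unsigned int cpu)
{
    toy_cpumask cpus = pool->cpu_valid & affinity;   /* logical AND */
    return cpus != 0 && (cpus & (UINT64_C(1) << cpu)) != 0;
}

/* Walks through both cases for a vcpu pinned to CPU 1 only. */
void toy_self_check(void)
{
    const toy_cpumask affinity = UINT64_C(1) << 1;

    /* CPUs 0 and 1 valid; the unconditional clear empties the AND. */
    struct toy_pool plain = { .cpu_valid = 0x3 };
    toy_cpu_disable(&plain, 1);
    assert(!toy_pick_invariant(&plain, affinity, 1));

    /* With the suspend check, cpu_valid survives and the invariant holds. */
    struct toy_pool guarded = { .cpu_valid = 0x3 };
    toy_cpu_disable_guarded(&guarded, 1, TOY_STATE_suspend);
    assert(toy_pick_invariant(&guarded, affinity, 1));
}
```

Calling toy_self_check() from a test program exercises both paths. In the unguarded case the pool's valid mask no longer contains CPU 1, so its intersection with the affinity mask is zero, which corresponds to the '!cpumask_empty(&cpus)' half of the assertion failing.
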
Attached patch follows the spirit of changeset 25079:d5ccb2d1dbd1 (which blocked removal of the cpu from the cpupool in cpupool.c) by also blocking its removal from the cpupool's valid cpumask. So cpu affinities are still preserved across suspend/resume, and the scheduler does not need to be disabled, as per the original intent (I think).

Would welcome comments.

Signed-off-by: Tomasz Wroblewski <tomasz.wroblewski@xxxxxxxxxx>

Acked-by: Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>

Commit message:
Fix s3 resume regression (crash in scheduler) after c-s 25079:d5ccb2d1dbd1 by also blocking removal of the cpu from the cpupool's cpu_valid mask - in the spirit of mentioned c-s.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

--
Juergen Gross                 Principal Developer Operating Systems
PBG PDG ES&S SWE OS6          Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions  e-mail: juergen.gross@xxxxxxxxxxxxxx
Domagkstr. 28                 Internet: ts.fujitsu.com
D-80807 Muenchen              Company details: ts.fujitsu.com/imprint.html