
Re: [Xen-devel] [BUG] unable to shutdown (page fault in mwait_idle()/do_dbs_timer()/__find_next_bit()) (fwd)



Hello.

On Tue, 9 Jan 2018, Jan Beulich wrote:
On 08.01.18 at 17:07, <martin@xxxxxxxxx> wrote:
On Mon, 8 Jan 2018, Jan Beulich wrote:
On 07.01.18 at 13:34, <martin@xxxxxxxxx> wrote:
(XEN) ----[ Xen-4.10.0-vgpu  x86_64  debug=n   Not tainted ]----

The -vgpu tag makes me wonder whether you have any patches in
your tree on top of plain 4.10.0 (or 4.10-staging). Also the debug=n
above ...

4.10.0 + 11 patches to make nvidia/vgpu work
(https://github.com/xenserver/xen-4.7.pg).
debug=n because of Xen's modified debug build process.

(XEN)    [<ffff82d08026ae60>] __find_next_bit+0x10/0x80
(XEN)    [<ffff82d080253180>] cpufreq_ondemand.c#do_dbs_timer+0x160/0x220
(XEN)    [<ffff82d0802c7c0e>] mwait-idle.c#mwait_idle+0x23e/0x340
(XEN)    [<ffff82d08026fa56>] domain.c#idle_loop+0x86/0xc0

... makes this call trace unreliable. But even with a reliable call
trace, analysis of the crash would be helped if you made
available the xen-syms (or xen.efi, depending on how you boot)
somewhere.

xen-syms - http://www.uschovna.cz/en/zasilka/UDP5LVE2679CGBIS-4YV/

Thanks. Looks to be a race between a timer in the governor and
the CPUs being brought down. In general the governor is supposed
to be disabled in the course of CPUs being brought down, so first
of all I wonder whether you have some daemon running which
sends management requests to the CPUfreq driver in Xen. Such a
daemon should of course be disabled by the system shutdown
scripts. Otherwise please try the attached debugging patch;
maybe we can see something from its output.
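The following is a minimal Python sketch of the suspected race. It is a toy model, not Xen code, and every name in it (`Policy`, `find_next_bit`, `dbs_timer`, `cpufreq_del_cpu`, `NullDeref`) is hypothetical. It assumes that the governor's sampling timer walks a per-policy CPU mask with a find_next_bit-style loop, and that the CPU-offline path can clear that mask before the timer is stopped. Under that assumption, a late timer run reads through a NULL mask, which would be consistent with the fault at linear address 0 in the logs. The `guarded` flag shows one conceivable mitigation: skip the sample once the policy is gone. The actual fix in Xen may well differ, for example by stopping the timer before the policy is torn down.

```python
# Hypothetical, simplified model of the suspected race; NOT Xen source code.
# All names here are illustrative assumptions, not Xen's real functions.
from dataclasses import dataclass
from typing import List, Optional

NR_CPUS = 24  # the machine in the logs has CPU0..CPU23


class NullDeref(Exception):
    """Stands in for the fatal page fault at linear address 0."""


def find_next_bit(bitmap: Optional[int], size: int, offset: int) -> int:
    """Return the next set bit at or after `offset`, or `size` if none.

    A None bitmap models a NULL mask pointer: reading it "faults".
    """
    if bitmap is None:
        raise NullDeref("read from address 0")
    for bit in range(offset, size):
        if (bitmap >> bit) & 1:
            return bit
    return size


@dataclass
class Policy:
    cpus: Optional[int]  # CPUs sharing this policy; None once torn down


def dbs_timer(policy: Policy, guarded: bool) -> List[int]:
    """Sampling-timer body: visit every CPU in the policy's mask."""
    if guarded and policy.cpus is None:
        return []  # policy already torn down: skip this sample
    visited = []
    cpu = find_next_bit(policy.cpus, NR_CPUS, 0)
    while cpu < NR_CPUS:
        visited.append(cpu)
        cpu = find_next_bit(policy.cpus, NR_CPUS, cpu + 1)
    return visited


def cpufreq_del_cpu(policy: Policy) -> None:
    """CPU-offline path: tears the policy down but leaves its timer armed."""
    policy.cpus = None


# Replay the shutdown ordering: timer fires, CPU is removed, timer fires again.
p = Policy(cpus=0b1011)
print(dbs_timer(p, guarded=False))  # [0, 1, 3]
cpufreq_del_cpu(p)
print(dbs_timer(p, guarded=True))   # []
try:
    dbs_timer(p, guarded=False)
except NullDeref as e:
    print("crash:", e)  # crash: read from address 0
```

Note that the model is deterministic. The real bug depends on timing, which fits the logs: one shutdown went through cleanly while the other two panicked on different CPUs (23 and 20).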

I suppose nothing should be running, because the Dom0 kernel has already shut down (see the last two messages from the Dom0 kernel). Or how can I check this?

Patch applied.
- no "dbs:" in output (grep "dbs:" ...)
- examples of shutdown output (1x OK + 2x fail):

-----------------------------------------------------

[  632.439402] ACPI: Preparing to enter system sleep state S5
[  632.486728] reboot: Power down
(XEN) Preparing system for ACPI S5 state.
(XEN) Disabling non-boot CPUs ...
(XEN) cpufreq: del CPU1 (1,ffaaab,1,2)
(XEN) Broke affinity for irq 140
(XEN) cpufreq: del CPU2 (1,4,1,4)
(XEN) Broke affinity for irq 139
(XEN) cpufreq: del CPU3 (1,ffaaa9,1,8)
(XEN) Broke affinity for irq 83
(XEN) cpufreq: del CPU4 (1,10,1,10)
(XEN) Broke affinity for irq 137
(XEN) cpufreq: del CPU5 (1,ffaaa1,1,20)
(XEN) cpufreq: del CPU6 (1,40,1,40)
(XEN) Broke affinity for irq 141
(XEN) cpufreq: del CPU7 (1,ffaa81,1,80)
(XEN) cpufreq: del CPU8 (1,100,1,100)
(XEN) cpufreq: del CPU9 (1,ffaa01,1,200)
(XEN) cpufreq: del CPU10 (1,400,1,400)
(XEN) cpufreq: del CPU11 (1,ffa801,1,800)
(XEN) cpufreq: del CPU12 (1,1000,1,1000)
(XEN) cpufreq: del CPU13 (1,ffa001,1,2000)
(XEN) cpufreq: del CPU14 (1,4000,1,4000)
(XEN) cpufreq: del CPU15 (1,ff8001,1,8000)
(XEN) cpufreq: del CPU16 (1,ff0001,1,10000)
(XEN) cpufreq: del CPU17 (1,fe0001,1,20000)
(XEN) cpufreq: del CPU18 (1,fc0001,1,40000)
(XEN) cpufreq: del CPU19 (1,f80001,1,80000)
(XEN) cpufreq: del CPU20 (1,f00001,1,100000)
(XEN) cpufreq: del CPU21 (1,e00001,1,200000)
(XEN) cpufreq: del CPU22 (1,c00001,1,400000)
(XEN) cpufreq: del CPU23 (1,800001,1,800000)
(XEN) Broke affinity for irq 72
(XEN) cpufreq: del CPU0 (1,1,1,1)
(XEN) Entering ACPI S5 state.

-----------------------------------------------------------

[  669.171396] ACPI: Preparing to enter system sleep state S5
[  669.218637] reboot: Power down
(XEN) Preparing system for ACPI S5 state.
(XEN) Disabling non-boot CPUs ...
(XEN) cpufreq: del CPU1 (1,ffaaab,1,2)
(XEN) Broke affinity for irq 138
(XEN) cpufreq: del CPU2 (1,4,1,4)
(XEN) Broke affinity for irq 141
(XEN) cpufreq: del CPU3 (1,ffaaa9,1,8)
(XEN) cpufreq: del CPU4 (1,10,1,10)
(XEN) cpufreq: del CPU5 (1,ffaaa1,1,20)
(XEN) Broke affinity for irq 140
(XEN) cpufreq: del CPU6 (1,40,1,40)
(XEN) Broke affinity for irq 139
(XEN) cpufreq: del CPU7 (1,ffaa81,1,80)
(XEN) Broke affinity for irq 137
(XEN) cpufreq: del CPU8 (1,100,1,100)
(XEN) cpufreq: del CPU9 (1,ffaa01,1,200)
(XEN) cpufreq: del CPU10 (1,400,1,400)
(XEN) cpufreq: del CPU11 (1,ffa801,1,800)
(XEN) cpufreq: del CPU12 (1,1000,1,1000)
(XEN) cpufreq: del CPU13 (1,ffa001,1,2000)
(XEN) cpufreq: del CPU14 (1,4000,1,4000)
(XEN) cpufreq: del CPU15 (1,ff8001,1,8000)
(XEN) cpufreq: del CPU16 (1,ff0001,1,10000)
(XEN) cpufreq: del CPU17 (1,fe0001,1,20000)
(XEN) cpufreq: del CPU18 (1,fc0001,1,40000)
(XEN) cpufreq: del CPU19 (1,f80001,1,80000)
(XEN) cpufreq: del CPU20 (1,f00001,1,100000)
(XEN) cpufreq: del CPU21 (1,e00001,1,200000)
(XEN) cpufreq: del CPU22 (1,c00001,1,400000)
(XEN) cpufreq: del CPU23 (1,800001,1,800000)
(XEN) ----[ Xen-4.10.0-vgpu  x86_64  debug=n   Not tainted ]----
(XEN) CPU:    23
(XEN) RIP:    e008:[<ffff82d08026aed0>] __find_next_bit+0x10/0x80
(XEN) RFLAGS: 0000000000010206   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: ffff830879db0400   rcx: 0000000000000018
(XEN) rdx: 0000000000000018   rsi: 0000000000000018   rdi: 0000000000000000
(XEN) rbp: 00000000061c6652   rsp: ffff83104eaafdd8   r8:  0000000000000018
(XEN) r9:  ffff830879db6d70   r10: ffff830879db28e8   r11: 0000009df890a1e7
(XEN) r12: 0000000000000000   r13: ffff8308788cef80   r14: ffff82d0805614e0
(XEN) r15: 0000000000000017   cr0: 000000008005003b   cr4: 00000000001526e0
(XEN) cr3: 000000007da2f000   cr2: 0000000000000000
(XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen code around <ffff82d08026aed0> (__find_next_bit+0x10/0x80):
(XEN)  e1 3f 48 8d 3c c7 74 25 <4c> 8b 0f 41 b8 40 00 00 00 41 29 c8 49 d3 e9 49
(XEN) Xen stack trace from rsp=ffff83104eaafdd8:
(XEN)    ffff82d0802531f0 0000000000000017 ffff830800000018 ffff82d080577380
(XEN)    00200f0879db6d98 0000009dd4bdccf5 0000000000000004 ffff830879db6e40
(XEN)    ffff82d08054ac80 0000009dd4bdccf5 0000000000000017 0000000000000017
(XEN)    ffff82d0802c7c7e 0000000000000d43 0000009dcec82f1b ffff830879db6ef8
(XEN)    0000002000000008 000001cf00000390 0000000000000000 0000000000000000
(XEN)    0000001900000001 ffff82e028c4b300 ffff82000007ffff ffff82d080552c80
(XEN)    ffff82d08054b800 ffff82d0805771f0 0000000000000017 0000000000000017
(XEN)    ffff82d0805614e0 ffff82d080420e80 ffff82d08026fac6 0000000000000000
(XEN)    ffff83104eaaffff ffff83007ddf1000 ffff83007ddf1000 ffff83007ddf1000
(XEN)    ffff830879db0180 ffff830879db0188 0000009dcec71067 ffff82d0805614e0
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000017 ffff83007ddf1000 00000037f9839080
(XEN)    00000000001526e0
(XEN) Xen call trace:
(XEN)    [<ffff82d08026aed0>] __find_next_bit+0x10/0x80
(XEN)    [<ffff82d0802531f0>] cpufreq_ondemand.c#do_dbs_timer+0x160/0x220
(XEN)    [<ffff82d0802c7c7e>] mwait-idle.c#mwait_idle+0x23e/0x340
(XEN)    [<ffff82d08026fac6>] domain.c#idle_loop+0x86/0xc0
(XEN)
(XEN) Pagetable walk from 0000000000000000:
(XEN)  L4[0x000] = 000000087ffeb063 ffffffffffffffff
(XEN)  L3[0x000] = 000000087ffea063 ffffffffffffffff
(XEN)  L2[0x000] = 000000087ffe9063 ffffffffffffffff
(XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 23:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 0000000000000000
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.

-------------------------------------------------------------

[  305.965633] ACPI: Preparing to enter system sleep state S5
[  306.012876] reboot: Power down
(XEN) Preparing system for ACPI S5 state.
(XEN) Disabling non-boot CPUs ...
(XEN) cpufreq: del CPU1 (1,ffaaab,1,2)
(XEN) Broke affinity for irq 83
(XEN) cpufreq: del CPU2 (1,4,1,4)
(XEN) Broke affinity for irq 138
(XEN) cpufreq: del CPU3 (1,ffaaa9,1,8)
(XEN) Broke affinity for irq 137
(XEN) cpufreq: del CPU4 (1,10,1,10)
(XEN) cpufreq: del CPU5 (1,ffaaa1,1,20)
(XEN) Broke affinity for irq 140
(XEN) cpufreq: del CPU6 (1,40,1,40)
(XEN) Broke affinity for irq 139
(XEN) cpufreq: del CPU7 (1,ffaa81,1,80)
(XEN) cpufreq: del CPU8 (1,100,1,100)
(XEN) cpufreq: del CPU9 (1,ffaa01,1,200)
(XEN) cpufreq: del CPU10 (1,400,1,400)
(XEN) cpufreq: del CPU11 (1,ffa801,1,800)
(XEN) cpufreq: del CPU12 (1,1000,1,1000)
(XEN) cpufreq: del CPU13 (1,ffa001,1,2000)
(XEN) cpufreq: del CPU14 (1,4000,1,4000)
(XEN) cpufreq: del CPU15 (1,ff8001,1,8000)
(XEN) cpufreq: del CPU16 (1,ff0001,1,10000)
(XEN) cpufreq: del CPU17 (1,fe0001,1,20000)
(XEN) cpufreq: del CPU18 (1,fc0001,1,40000)
(XEN) cpufreq: del CPU19 (1,f80001,1,80000)
(XEN) cpufreq: del CPU20 (1,f00001,1,100000)
(XEN) ----[ Xen-4.10.0-vgpu  x86_64  debug=n   Not tainted ]----
(XEN) CPU:    20
(XEN) RIP:    e008:[<ffff82d08026aed0>] __find_next_bit+0x10/0x80
(XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: ffff830879dbc400   rcx: 0000000000000015
(XEN) rdx: 0000000000000015   rsi: 0000000000000018   rdi: 0000000000000000
(XEN) rbp: 00000000061bd8a8   rsp: ffff83104ead7dd8   r8:  0000000000000018
(XEN) r9:  ffff83104eaea670   r10: ffff83104eaeade8   r11: 0000004974cb66fb
(XEN) r12: 0000000000000000   r13: ffff8308788cfb20   r14: ffff82d0805614e0
(XEN) r15: 0000000000000014   cr0: 000000008005003b   cr4: 00000000001526e0
(XEN) cr3: 000000007da2f000   cr2: 0000000000000000
(XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen code around <ffff82d08026aed0> (__find_next_bit+0x10/0x80):
(XEN)  e1 3f 48 8d 3c c7 74 25 <4c> 8b 0f 41 b8 40 00 00 00 41 29 c8 49 d3 e9 49
(XEN) Xen stack trace from rsp=ffff83104ead7dd8:
(XEN)    ffff82d0802531f0 0000000000000014 0000000000000018 ffff82d080577380
(XEN)    00200f084eaea698 00000049474d9c83 0000000000000004 ffff83104eaea960
(XEN)    ffff82d08054ac80 00000049474d9c83 0000000000000014 0000000000000014
(XEN)    ffff82d0802c7c7e ffff830879dbc300 000000494157ac5e ffff83104eaeaa18
(XEN)    0000002000000008 0000035f00000464 0000000000000000 0000000000000000
(XEN)    0000001900000001 ffff82e028c4b530 ffff82000007ffff ffff82d080552c80
(XEN)    ffff82d08054b680 ffff82d0805771f0 0000000000000014 0000000000000014
(XEN)    ffff82d0805614e0 ffff82d080420e80 ffff82d08026fac6 0000000000000000
(XEN)    ffff83104ead7fff ffff83007ddf4000 ffff83007ddf4000 ffff83007ddf4000
(XEN)    ffff830879dbc180 ffff830879dbc188 000000494156fa81 ffff82d0805614e0
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000014 ffff83007ddf4000 00000037f9845080
(XEN)    00000000001526e0
(XEN) Xen call trace:
(XEN)    [<ffff82d08026aed0>] __find_next_bit+0x10/0x80
(XEN)    [<ffff82d0802531f0>] cpufreq_ondemand.c#do_dbs_timer+0x160/0x220
(XEN)    [<ffff82d0802c7c7e>] mwait-idle.c#mwait_idle+0x23e/0x340
(XEN)    [<ffff82d08026fac6>] domain.c#idle_loop+0x86/0xc0
(XEN)
(XEN) Pagetable walk from 0000000000000000:
(XEN)  L4[0x000] = 000000087ffeb063 ffffffffffffffff
(XEN)  L3[0x000] = 000000087ffea063 ffffffffffffffff
(XEN)  L2[0x000] = 000000087ffe9063 ffffffffffffffff
(XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 20:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 0000000000000000
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
