Re: [Xen-devel] [BUG] unable to shutdown (page fault in mwait_idle()/do_dbs_timer()/__find_next_bit()) (fwd)


On Mon, 8 Jan 2018, Jan Beulich wrote:

On 07.01.18 at 13:34, <martin@xxxxxxxxx> wrote:
(XEN) ----[ Xen-4.10.0-vgpu  x86_64  debug=n   Not tainted ]----

The -vgpu tag makes me wonder whether you have any patches in
your tree on top of plain 4.10.0 (or 4.10-staging). Also the debug=n
above ...

4.10.0 + 11 patches to make nvidia/vgpu work (https://github.com/xenserver/xen-4.7.pg).
debug=n because of Xen's modified debug build process.

(XEN)    [<ffff82d08026ae60>] __find_next_bit+0x10/0x80
(XEN)    [<ffff82d080253180>] cpufreq_ondemand.c#do_dbs_timer+0x160/0x220
(XEN)    [<ffff82d0802c7c0e>] mwait-idle.c#mwait_idle+0x23e/0x340
(XEN)    [<ffff82d08026fa56>] domain.c#idle_loop+0x86/0xc0

... makes this call trace unreliable. But even with a reliable call
trace, analysis of the crash would be helped if you made
available the xen-syms (or xen.efi, depending on how you boot).

xen-syms - http://www.uschovna.cz/en/zasilka/UDP5LVE2679CGBIS-4YV/

Finally, there being (as you say) a 10% probability of the crash -
have you been able to connect its occurrence to anything that
the system was doing prior to the shutdown/reboot attempt?

The same start/stop sequence each time, scripted from an external system (IPMI+ssh).

Actually - is this a problem with shutdown _only_, or also with

Tested again: start Xen and dom0, start an HVM domain (Fedora 24 + passthrough Quadro P2000), shut down the HVM domain, then power off or reboot.
5 failures in 10 tries with "poweroff" ("(XEN) Preparing system for ACPI S5 ...").
No failures in 10 tries with "reboot" ("(XEN) Hardware Dom0 shutdown: rebooting ...").

Thanks, Martin

PS: Looking back through my serial console logs, I did not find this problem with Xen 4.5.0, 4.5.2, 4.5.3, 4.7.0 or 4.8.0 (other versions untested).

