
Re: [Xen-devel] [BUG] unable to shutdown (page fault in mwait_idle()/do_dbs_timer()/__find_next_bit()) (fwd)



Hello.

Console output attached (7x OK, 2x FAILED).

Thanks, M.C.

On Thu, 1 Mar 2018, Jan Beulich wrote:

On 09.01.18 at 23:51, <martin@xxxxxxxxx> wrote:
Hello.

I'm sorry for taking so long to get back to this.

On Tue, 9 Jan 2018, Jan Beulich wrote:
On 08.01.18 at 17:07, <martin@xxxxxxxxx> wrote:
On Mon, 8 Jan 2018, Jan Beulich wrote:
On 07.01.18 at 13:34, <martin@xxxxxxxxx> wrote:
(XEN) ----[ Xen-4.10.0-vgpu  x86_64  debug=n   Not tainted ]----

The -vgpu tag makes me wonder whether you have any patches in
your tree on top of plain 4.10.0 (or 4.10-staging). Also the debug=n
above ...

4.10.0 + 11 patches to make nvidia/vgpu work
(https://github.com/xenserver/xen-4.7.pg).
debug=n is due to Xen's modified debug build process.

(XEN)    [<ffff82d08026ae60>] __find_next_bit+0x10/0x80
(XEN)    [<ffff82d080253180>] cpufreq_ondemand.c#do_dbs_timer+0x160/0x220
(XEN)    [<ffff82d0802c7c0e>] mwait-idle.c#mwait_idle+0x23e/0x340
(XEN)    [<ffff82d08026fa56>] domain.c#idle_loop+0x86/0xc0

... makes this call trace unreliable. But even with a reliable call
trace, analysis of the crash would be helped if you made
available the xen-syms (or xen.efi, depending on how you boot)
somewhere.

xen-syms - http://www.uschovna.cz/en/zasilka/UDP5LVE2679CGBIS-4YV/

Thanks. Looks to be a race between a timer in the governor and
the CPUs being brought down. In general the governor is supposed
to be disabled in the course of CPUs being brought down, so first
of all I wonder whether you're having some daemon in use which
sends management requests to the CPUfreq driver in Xen. Such a
daemon should of course be disabled by the system shutdown
scripts. Otherwise please try the attached debugging patch -
maybe we can see something from its output.

I suppose nothing should be running any more, because the Dom0 kernel has
already ended (see the last two messages from the dom0 kernel). Or how can I check it?

Patch applied.
- no "dbs:" lines in the output (grep "dbs:" ...)

Okay, this sufficiently proves that nothing tries to re-enable a
governor after it was brought down along with the CPUs.

- examples of shutdown output (1x OK + 2x fail):

-----------------------------------------------------

[  632.439402] ACPI: Preparing to enter system sleep state S5
[  632.486728] reboot: Power down
(XEN) Preparing system for ACPI S5 state.
(XEN) Disabling non-boot CPUs ...
(XEN) cpufreq: del CPU1 (1,ffaaab,1,2)
(XEN) Broke affinity for irq 140
(XEN) cpufreq: del CPU2 (1,4,1,4)
(XEN) Broke affinity for irq 139
(XEN) cpufreq: del CPU3 (1,ffaaa9,1,8)
(XEN) Broke affinity for irq 83
(XEN) cpufreq: del CPU4 (1,10,1,10)
(XEN) Broke affinity for irq 137
(XEN) cpufreq: del CPU5 (1,ffaaa1,1,20)
(XEN) cpufreq: del CPU6 (1,40,1,40)
(XEN) Broke affinity for irq 141
(XEN) cpufreq: del CPU7 (1,ffaa81,1,80)
(XEN) cpufreq: del CPU8 (1,100,1,100)
(XEN) cpufreq: del CPU9 (1,ffaa01,1,200)
(XEN) cpufreq: del CPU10 (1,400,1,400)
(XEN) cpufreq: del CPU11 (1,ffa801,1,800)
(XEN) cpufreq: del CPU12 (1,1000,1,1000)
(XEN) cpufreq: del CPU13 (1,ffa001,1,2000)
(XEN) cpufreq: del CPU14 (1,4000,1,4000)
(XEN) cpufreq: del CPU15 (1,ff8001,1,8000)
(XEN) cpufreq: del CPU16 (1,ff0001,1,10000)
(XEN) cpufreq: del CPU17 (1,fe0001,1,20000)
(XEN) cpufreq: del CPU18 (1,fc0001,1,40000)
(XEN) cpufreq: del CPU19 (1,f80001,1,80000)
(XEN) cpufreq: del CPU20 (1,f00001,1,100000)
(XEN) cpufreq: del CPU21 (1,e00001,1,200000)
(XEN) cpufreq: del CPU22 (1,c00001,1,400000)
(XEN) cpufreq: del CPU23 (1,800001,1,800000)
(XEN) Broke affinity for irq 72
(XEN) cpufreq: del CPU0 (1,1,1,1)
(XEN) Entering ACPI S5 state.

So this, despite being the working example, already shows a very
odd arrangement: all odd-numbered CPUs _and_ all even-numbered
ones from 16 up share a domain. The typical case would be for all
sibling threads and/or cores to share a domain, iirc.
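
For reference, the four fields in these "cpufreq: del" lines come from
the debug printk in the patch below: hw_all, the cpufreq_dom->map bits,
perf->domain_info.num_processors, and the policy->cpus bits, with both
masks printed in hex. A minimal standalone decoder for the map word,
using the value from the CPU1 line above (illustrative user-space code,
not Xen code):

/* Decode the cpufreq_dom->map word from a "cpufreq: del CPUn (...)" line. */
#include <stdio.h>

int main(void)
{
    unsigned long map = 0xffaaab;   /* from: cpufreq: del CPU1 (1,ffaaab,1,2) */

    printf("CPUs in this _PSD domain:");
    for (unsigned int cpu = 0; cpu < 24; cpu++)
        if (map & (1UL << cpu))
            printf(" %u", cpu);
    putchar('\n');
    /* Prints: 0 1 3 5 7 9 11 13 15 16 17 18 19 20 21 22 23, i.e. CPU0,
     * every odd-numbered CPU, and all CPUs from 16 up -- the odd
     * arrangement described above. */
    return 0;
}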

(XEN) cpufreq: del CPU23 (1,800001,1,800000)
(XEN) ----[ Xen-4.10.0-vgpu  x86_64  debug=n   Not tainted ]----
(XEN) CPU:    23

So at the time the debug message gets printed, policy->cpus is
still valid. With hw_all set, next thing is the call
__cpufreq_governor(policy, CPUFREQ_GOV_STOP), which
ought to lead to cpufreq_governor_dbs()'s
"case CPUFREQ_GOV_STOP", which in turn calls dbs_timer_exit().
That function kills the timer that causes the crash. kill_timer(), in
turn, makes sure the timer being killed does not currently have
its handler executed on any CPU. IOW I can't spot any race, yet
there quite obviously is one in your case. All I can do at the
moment is hand you an extended debugging patch.
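
As an aside, that guarantee can be modeled in miniature. The toy
user-space program below (not Xen code; only the names in the comments
refer to the real functions) demonstrates the property kill_timer() is
supposed to provide, namely that once the kill returns, no handler work
can still be in flight:

/* Toy model of the kill_timer() guarantee described above: stopping the
 * timer must wait out a handler already running on another CPU.  The
 * atomics stand in for Xen's per-timer bookkeeping. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static atomic_bool enable = true;       /* dbs_info->enable stand-in */
static atomic_bool in_handler = false;  /* "handler currently running" */

static void *timer_thread(void *arg)    /* periodic do_dbs_timer() stand-in */
{
    (void)arg;
    for ( ; ; )
    {
        atomic_store(&in_handler, true);
        if ( !atomic_load(&enable) )    /* re-check under the flag, so the */
        {                               /* kill below cannot miss us       */
            atomic_store(&in_handler, false);
            return NULL;
        }
        /* ... dbs_check_cpu() work would happen here ... */
        atomic_store(&in_handler, false);
        usleep(1000);                   /* set_timer() re-arm stand-in */
    }
}

static void kill_timer_model(void)      /* dbs_timer_exit()/kill_timer() */
{
    atomic_store(&enable, false);       /* forbid new invocations */
    while ( atomic_load(&in_handler) )  /* wait out a handler in flight */
        ;
}

int main(void)
{
    pthread_t t;

    pthread_create(&t, NULL, timer_thread, NULL);
    usleep(10000);
    kill_timer_model();                 /* from here on, no handler work runs */
    pthread_join(t, NULL);
    puts("torn down with no handler in flight");
    return 0;
}

(Compile with gcc -pthread.)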

Jan

--- unstable.orig/xen/drivers/cpufreq/cpufreq.c 2017-09-12 12:39:58.310556379 +0200
+++ unstable/xen/drivers/cpufreq/cpufreq.c      2018-03-01 16:26:55.071367162 +0100
@@ -190,6 +190,7 @@ int cpufreq_add_cpu(unsigned int cpu)
            break;
        }
    }
+printk("CPU%u: dom%u (%u,%u)\n", cpu, dom, domexist, hw_all);//temp

    if (!domexist) {
        cpufreq_dom = xzalloc(struct cpufreq_dom);
@@ -352,6 +353,8 @@ int cpufreq_del_cpu(unsigned int cpu)

    /* for HW_ALL, stop gov for each core of the _PSD domain */
    /* for SW_ALL & SW_ANY, stop gov for the 1st core of the _PSD domain */
+printk("cpufreq: del CPU%u (%u,%lx,%lu,%lx)\n", cpu,//temp
+       hw_all, cpufreq_dom->map->bits[0], perf->domain_info.num_processors, policy->cpus->bits[0]);//temp
    if (hw_all || (cpumask_weight(cpufreq_dom->map) ==
                   perf->domain_info.num_processors))
        __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
--- unstable.orig/xen/drivers/cpufreq/cpufreq_ondemand.c        2017-09-12 12:39:58.310556379 +0200
+++ unstable/xen/drivers/cpufreq/cpufreq_ondemand.c     2018-03-01 16:21:41.829296179 +0100
@@ -179,6 +179,9 @@ static void do_dbs_timer(void *dbs)
    if (!dbs_info->enable)
        return;

+if(system_state > SYS_STATE_active) {//temp
+ printk("dbs: check CPU%u\n", dbs_info->cpu);
+}
    dbs_check_cpu(dbs_info);

    set_timer(&per_cpu(dbs_timer, dbs_info->cpu),
@@ -218,6 +221,9 @@ int cpufreq_governor_dbs(struct cpufreq_

    switch (event) {
    case CPUFREQ_GOV_START:
+if(system_state > SYS_STATE_active) {//temp
+ printk("dbs: start CPU%u [%pS]\n", cpu, __builtin_return_address(0));
+}
        if ((!cpu_online(cpu)) || (!policy->cur))
            return -EINVAL;

@@ -273,12 +279,18 @@ int cpufreq_governor_dbs(struct cpufreq_
        break;

    case CPUFREQ_GOV_STOP:
+if(system_state > SYS_STATE_active) {//temp
+ printk("dbs: stop CPU%u (%u,%d)\n", cpu, this_dbs_info->cpu, 
this_dbs_info->enable);
+}
        if ( !this_dbs_info->enable )
            /* Already not enabled */
            break;

        dbs_timer_exit(this_dbs_info);
        dbs_enable--;
+if(system_state > SYS_STATE_active) {//temp
+ printk("dbs: stopped CPU%u (%d,%u)\n", cpu, per_cpu(dbs_timer, cpu).status, 
per_cpu(dbs_timer, cpu).cpu);
+}

        break;


Attachment: debug_output_20180308.zip
Description: Zip archive

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

