[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [BUG] [SOLVED] unable to shutdown (page fault in mwait_idle()/do_dbs_timer()/__find_next_bit())



Hello.

It seems to be solved by your patch !

Probably example, that crashed previously:
(XEN) cpufreq: del CPU11 (1,ffa801,1,800)
(XEN) dbs: check CPU12
(XEN) dbs: stop CPU11 (11,1)
(XEN) dbs: check CPU11
(XEN) dbs: stopped CPU11 (2,65535)

I tested over night ~100x xen shutdown without crash (without patch every ~4x 
shutdown crashed).
Attached xen console output for all (sucessful) tests (also added 
"sync_console" to xen startup to handle serial output data loose).

Thanks, Martin

On Thu, 8 Mar 2018, Jan Beulich wrote:

On 08.03.18 at 11:26, <martin@xxxxxxxxx> wrote:
Console output added (7x OK, 2x FAILED).

Thanks. You've chopped off some of the messages though. I think
I've spotted the issue nevertheless - would you please give the
patch below a try?

Jan

cpufreq/ondemand: fix race while offlining CPU

Offlining a CPU involves stopping the cpufreq governor. The on-demand
governor will kill the timer before letting generic code proceed, but
since that generally isn't happening on the subject CPU,
cpufreq_dbs_timer_resume() may run in parallel. If that managed to
invoke the timer handler, that handler needs to run to completion before
dbs_timer_exit() may safely exit.

Make the "stoppable" field a tristate, changing it from +1 to -1 around
the timer function invocation, and make dbs_timer_exit() wait for it to
become non-negative (still writing zero if it's +1).

Also adjust coding style in cpufreq_dbs_timer_resume().

Reported-by: Martin Cerveny <martin@xxxxxxxxx>
Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>

--- unstable.orig/xen/drivers/cpufreq/cpufreq_ondemand.c
+++ unstable/xen/drivers/cpufreq/cpufreq_ondemand.c
@@ -204,7 +204,14 @@ static void dbs_timer_init(struct cpu_db
static void dbs_timer_exit(struct cpu_dbs_info_s *dbs_info)
{
    dbs_info->enable = 0;
-    dbs_info->stoppable = 0;
+
+    /*
+     * The timer function may be running (from cpufreq_dbs_timer_resume) -
+     * wait for it to complete.
+     */
+    while ( cmpxchg(&dbs_info->stoppable, 1, 0) < 0 )
+        cpu_relax();
+
    kill_timer(&per_cpu(dbs_timer, dbs_info->cpu));
}

@@ -369,23 +376,22 @@ void cpufreq_dbs_timer_suspend(void)

void cpufreq_dbs_timer_resume(void)
{
-    int cpu;
-    struct timer* t;
-    s_time_t now;
-
-    cpu = smp_processor_id();
+    unsigned int cpu = smp_processor_id();
+    int8_t *stoppable = &per_cpu(cpu_dbs_info, cpu).stoppable;

-    if ( per_cpu(cpu_dbs_info,cpu).stoppable )
+    if ( *stoppable )
    {
-        now = NOW();
-        t = &per_cpu(dbs_timer, cpu);
-        if (t->expires <= now)
+        s_time_t now = NOW();
+        struct timer *t = &per_cpu(dbs_timer, cpu);
+
+        if ( t->expires <= now )
        {
+            if ( !cmpxchg(stoppable, 1, -1) )
+                return;
            t->function(t->data);
+            (void)cmpxchg(stoppable, -1, 1);
        }
        else
-        {
-            set_timer(t, align_timer(now , dbs_tuners_ins.sampling_rate));
-        }
+            set_timer(t, align_timer(now, dbs_tuners_ins.sampling_rate));
    }
}
--- unstable.orig/xen/include/acpi/cpufreq/cpufreq.h
+++ unstable/xen/include/acpi/cpufreq/cpufreq.h
@@ -225,8 +225,8 @@ struct cpu_dbs_info_s {
    struct cpufreq_frequency_table *freq_table;
    int cpu;
    unsigned int enable:1;
-    unsigned int stoppable:1;
    unsigned int turbo_enabled:1;
+    int8_t stoppable;
};

int cpufreq_governor_dbs(struct cpufreq_policy *policy, unsigned int event);



Attachment: output_after_patch.zip
Description: Zip archive

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.