
[Xen-changelog] [xen master] sched: reorganize cpu_disable_scheduler()



commit 4862065a2df3849456e3fdebb480ced038f1c2c1
Author:     Dario Faggioli <dario.faggioli@xxxxxxxxxx>
AuthorDate: Fri Jul 24 11:26:34 2015 +0200
Commit:     Jan Beulich <jbeulich@xxxxxxxx>
CommitDate: Fri Jul 24 11:26:34 2015 +0200

    sched: reorganize cpu_disable_scheduler()
    
    The function is called both when we want to remove a cpu
    from a cpupool, and during cpu teardown, for suspend or
    shutdown. If, however, the boot cpu (cpu 0, in most cases)
    is not present in the default cpupool, Xen crashes like
    this during suspend or shutdown:
    
      root@Zhaman:~# xl cpupool-cpu-remove Pool-0 0
      root@Zhaman:~# shutdown -h now
      (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
      ...
      (XEN) Xen call trace:
      (XEN)    [<ffff82d0801238de>] _csched_cpu_pick+0x156/0x61f
      (XEN)    [<ffff82d080123db5>] csched_cpu_pick+0xe/0x10
      (XEN)    [<ffff82d08012de3c>] vcpu_migrate+0x18e/0x321
      (XEN)    [<ffff82d08012e4f8>] cpu_disable_scheduler+0x1cf/0x2ac
      (XEN)    [<ffff82d08018bb8d>] __cpu_disable+0x313/0x36e
      (XEN)    [<ffff82d080101424>] take_cpu_down+0x34/0x3b
      (XEN)    [<ffff82d08013097a>] stopmachine_action+0x70/0x99
      (XEN)    [<ffff82d0801325f0>] do_tasklet_work+0x78/0xab
      (XEN)    [<ffff82d080132926>] do_tasklet+0x5e/0x8a
      (XEN)    [<ffff82d08016478c>] idle_loop+0x56/0x6b
      (XEN)
      (XEN)
      (XEN) ****************************************
      (XEN) Panic on CPU 15:
      (XEN) Assertion 'cpu < nr_cpu_ids' failed at 
...URCES/xen/xen/xen.git/xen/include/xen/cpumask.h:97
      (XEN) ****************************************
    
    There are also problems when we try to suspend or shut down
    with a cpupool configured with just one cpu (regardless, in
    this case, of whether that is the boot cpu or not):
    
      root@Zhaman:~# xl create /etc/xen/test.cfg
      root@Zhaman:~# xl cpupool-migrate test Pool-1
      root@Zhaman:~# xl cpupool-list -c
      Name               CPU list
      Pool-0             0,1,2,3,4,5,6,7,8,9,10,11,13,14,15
      Pool-1             12
      root@Zhaman:~# shutdown -h now
      (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
      (XEN) CPU:    12
      ...
      (XEN) Xen call trace:
      (XEN)    [<ffff82d08018bb91>] __cpu_disable+0x317/0x36e
      (XEN)    [<ffff82d080101424>] take_cpu_down+0x34/0x3b
      (XEN)    [<ffff82d08013097a>] stopmachine_action+0x70/0x99
      (XEN)    [<ffff82d0801325f0>] do_tasklet_work+0x78/0xab
      (XEN)    [<ffff82d080132926>] do_tasklet+0x5e/0x8a
      (XEN)    [<ffff82d08016478c>] idle_loop+0x56/0x6b
      (XEN)
      (XEN)
      (XEN) ****************************************
      (XEN) Panic on CPU 12:
      (XEN) Xen BUG at smpboot.c:895
      (XEN) ****************************************
    
    In both cases, the problem is that the scheduler is unable
    to:
     - move all the vcpus to the boot cpu (as the boot cpu is
       not in the cpupool), in the former;
     - move the vcpus away from a cpu at all (as that is the
       only cpu in the cpupool), in the latter.
    
    The solution is to distinguish, inside cpu_disable_scheduler(),
    between the two cases of cpupool manipulation and teardown. For
    cpupool manipulation, it is correct to ask the scheduler to
    take an action, as pathological situations (like there being
    no cpu left in the pool to which vcpus could be sent) are
    already taken care of (i.e., forbidden!). For suspend and
    shutdown, we don't want the scheduler to be involved at all:
    the final goal is pretty simple, "send all the vcpus to the
    boot cpu ASAP", so we just go for it.
    
    Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
    Acked-by: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
    Release-acked-by: Wei Liu <wei.liu2@xxxxxxxxxx>
---
 xen/common/schedule.c |  104 +++++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 88 insertions(+), 16 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index df8c1d0..df64268 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -454,9 +454,10 @@ void vcpu_unblock(struct vcpu *v)
  * Do the actual movement of a vcpu from old to new CPU. Locks for *both*
  * CPUs needs to have been taken already when calling this!
  */
-static void vcpu_move(struct vcpu *v, unsigned int old_cpu,
-                      unsigned int new_cpu)
+static void vcpu_move_locked(struct vcpu *v, unsigned int new_cpu)
 {
+    unsigned int old_cpu = v->processor;
+
     /*
      * Transfer urgency status to new CPU before switching CPUs, as
      * once the switch occurs, v->is_urgent is no longer protected by
@@ -478,6 +479,33 @@ static void vcpu_move(struct vcpu *v, unsigned int old_cpu,
         v->processor = new_cpu;
 }
 
+/*
+ * Move a vcpu from its current processor to a target new processor,
+ * without asking the scheduler to do any placement. This is intended
+ * to be called from special contexts, where things are quiet
+ * enough that no contention is expected (e.g., during shutdown
+ * or software suspend, such as ACPI S3).
+ */
+static void vcpu_move_nosched(struct vcpu *v, unsigned int new_cpu)
+{
+    unsigned long flags;
+    spinlock_t *lock, *new_lock;
+
+    ASSERT(system_state == SYS_STATE_suspend);
+    ASSERT(!vcpu_runnable(v) && (atomic_read(&v->pause_count) ||
+                                 atomic_read(&v->domain->pause_count)));
+
+    lock = per_cpu(schedule_data, v->processor).schedule_lock;
+    new_lock = per_cpu(schedule_data, new_cpu).schedule_lock;
+
+    sched_spin_lock_double(lock, new_lock, &flags);
+    ASSERT(new_cpu != v->processor);
+    vcpu_move_locked(v, new_cpu);
+    sched_spin_unlock_double(lock, new_lock, flags);
+
+    sched_move_irqs(v);
+}
+
 static void vcpu_migrate(struct vcpu *v)
 {
     unsigned long flags;
@@ -542,7 +570,7 @@ static void vcpu_migrate(struct vcpu *v)
         return;
     }
 
-    vcpu_move(v, old_cpu, new_cpu);
+    vcpu_move_locked(v, new_cpu);
 
     sched_spin_unlock_double(old_lock, new_lock, flags);
 
@@ -615,7 +643,8 @@ int cpu_disable_scheduler(unsigned int cpu)
     struct vcpu *v;
     struct cpupool *c;
     cpumask_t online_affinity;
-    int    ret = 0;
+    unsigned int new_cpu;
+    int ret = 0;
 
     c = per_cpu(cpupool, cpu);
     if ( c == NULL )
@@ -644,25 +673,68 @@ int cpu_disable_scheduler(unsigned int cpu)
                 cpumask_setall(v->cpu_hard_affinity);
             }
 
-            if ( v->processor == cpu )
+            if ( v->processor != cpu )
             {
-                set_bit(_VPF_migrating, &v->pause_flags);
+                /* The vcpu is not on this cpu, so we can move on. */
                 vcpu_schedule_unlock_irqrestore(lock, flags, v);
-                vcpu_sleep_nosync(v);
-                vcpu_migrate(v);
+                continue;
+            }
+
+            /* If it is on this cpu, we must send it away. */
+            if ( unlikely(system_state == SYS_STATE_suspend) )
+            {
+                vcpu_schedule_unlock_irqrestore(lock, flags, v);
+
+                /*
+                 * If we are doing a shutdown/suspend, it is not necessary
+                 * to ask the scheduler to chime in. In fact:
+                 *  * there is no reason for it: the end result we are after
+                 *    is just 'all the vcpus on the boot pcpu, and no vcpu
+                 *    anywhere else', so let's just go for it;
+                 *  * it's wrong, for cpupools with only non-boot pcpus, as
+                 *    the scheduler would always fail to send the vcpus away
+                 *    from the last online (non-boot) pcpu!
+                 *
+                 * Therefore, in the shutdown/suspend case, we just pick
+                 * one (still) online pcpu. Note that, at this stage, all
+                 * domains (including dom0) have been paused already, so we
+                 * do not expect any vcpu activity at all.
+                 */
+                cpumask_andnot(&online_affinity, &cpu_online_map,
+                               cpumask_of(cpu));
+                BUG_ON(cpumask_empty(&online_affinity));
+                /*
+                 * As the boot cpu is usually pcpu #0, using cpumask_first()
+                 * makes us converge more quickly.
+                 */
+                new_cpu = cpumask_first(&online_affinity);
+                vcpu_move_nosched(v, new_cpu);
             }
             else
+            {
+                /*
+                 * OTOH, if the system is still live, and we are here because
+                 * we are doing some cpupool manipulations:
+                 *  * we want to call the scheduler, and let it re-evaluate
+                 *    the placement of the vcpu, taking into account the new
+                 *    cpupool configuration;
+                 *  * the scheduler will always find a suitable solution, or
+                 *    things would have failed before getting in here.
+                 */
+                set_bit(_VPF_migrating, &v->pause_flags);
                 vcpu_schedule_unlock_irqrestore(lock, flags, v);
+                vcpu_sleep_nosync(v);
+                vcpu_migrate(v);
 
-            /*
-             * A vcpu active in the hypervisor will not be migratable.
-             * The caller should try again after releasing and reaquiring
-             * all locks.
-             */
-            if ( v->processor == cpu )
-                ret = -EAGAIN;
+                /*
+                 * The only caveat, in this case, is that a vcpu active in
+                 * the hypervisor will not be migratable. When that happens,
+                 * the caller should try again after releasing and
+                 * reacquiring all locks.
+                 */
+                if ( v->processor == cpu )
+                    ret = -EAGAIN;
+            }
         }
-
         domain_update_node_affinity(d);
     }
 
--
generated by git-patchbot for /home/xen/git/xen.git#master

_______________________________________________
Xen-changelog mailing list
Xen-changelog@xxxxxxxxxxxxx
http://lists.xensource.com/xen-changelog


 

