
Re: [Xen-devel] [PATCH 3/4] xen: sched: reorganize cpu_disable_scheduler()



On Fri, Jul 3, 2015 at 4:49 PM, Dario Faggioli
<dario.faggioli@xxxxxxxxxx> wrote:
> The function is called both when we want to remove a cpu
> from a cpupool, and during cpu teardown, for suspend or
> shutdown. If, however, the boot cpu (cpu 0, most of the
> times) is not present in the default cpupool, during
> suspend or shutdown, Xen crashes like this:
>
>   root@Zhaman:~# xl cpupool-cpu-remove Pool-0 0
>   root@Zhaman:~# shutdown -h now
>   (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
>   ...
>   (XEN) Xen call trace:
>   (XEN)    [<ffff82d0801238de>] _csched_cpu_pick+0x156/0x61f
>   (XEN)    [<ffff82d080123db5>] csched_cpu_pick+0xe/0x10
>   (XEN)    [<ffff82d08012de3c>] vcpu_migrate+0x18e/0x321
>   (XEN)    [<ffff82d08012e4f8>] cpu_disable_scheduler+0x1cf/0x2ac
>   (XEN)    [<ffff82d08018bb8d>] __cpu_disable+0x313/0x36e
>   (XEN)    [<ffff82d080101424>] take_cpu_down+0x34/0x3b
>   (XEN)    [<ffff82d08013097a>] stopmachine_action+0x70/0x99
>   (XEN)    [<ffff82d0801325f0>] do_tasklet_work+0x78/0xab
>   (XEN)    [<ffff82d080132926>] do_tasklet+0x5e/0x8a
>   (XEN)    [<ffff82d08016478c>] idle_loop+0x56/0x6b
>   (XEN)
>   (XEN)
>   (XEN) ****************************************
>   (XEN) Panic on CPU 15:
>   (XEN) Assertion 'cpu < nr_cpu_ids' failed at ...URCES/xen/xen/xen.git/xen/include/xen/cpumask.h:97
>   (XEN) ****************************************
>
> There also are problems when we try to suspend or shutdown
> with a cpupool configured with just one cpu (no matter, in
> this case, whether that is the boot cpu or not):
>
>   root@Zhaman:~# xl create /etc/xen/test.cfg
>   root@Zhaman:~# xl cpupool-migrate test Pool-1
>   root@Zhaman:~# xl cpupool-list -c
>   Name               CPU list
>   Pool-0             0,1,2,3,4,5,6,7,8,9,10,11,13,14,15
>   Pool-1             12
>   root@Zhaman:~# shutdown -h now
>   (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
>   (XEN) CPU:    12
>   ...
>   (XEN) Xen call trace:
>   (XEN)    [<ffff82d08018bb91>] __cpu_disable+0x317/0x36e
>   (XEN)    [<ffff82d080101424>] take_cpu_down+0x34/0x3b
>   (XEN)    [<ffff82d08013097a>] stopmachine_action+0x70/0x99
>   (XEN)    [<ffff82d0801325f0>] do_tasklet_work+0x78/0xab
>   (XEN)    [<ffff82d080132926>] do_tasklet+0x5e/0x8a
>   (XEN)    [<ffff82d08016478c>] idle_loop+0x56/0x6b
>   (XEN)
>   (XEN)
>   (XEN) ****************************************
>   (XEN) Panic on CPU 12:
>   (XEN) Xen BUG at smpboot.c:895
>   (XEN) ****************************************
>
> In both cases, the problem is the scheduler not being able
> to:
>  - move all the vcpus to the boot cpu (as the boot cpu is
>    not in the cpupool), in the former;
>  - move the vcpus away from a cpu at all (as that is the
>    only one cpu in the cpupool), in the latter.
>
> The solution is to distinguish, inside cpu_disable_scheduler(),
> between the two cases of cpupool manipulation and teardown. For
> cpupool manipulation, it is correct to ask the scheduler to
> take an action, as pathological situations (like there not
> being any cpu in the pool to send vcpus to) are taken care
> of (i.e., forbidden!) already. For suspend and shutdown,
> we don't want the scheduler to be involved at all, as the
> final goal is pretty simple: "send all the vcpus to the
> boot cpu ASAP", so we just go for it.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
> ---
> Cc: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
> Cc: Juergen Gross <jgross@xxxxxxxx>
> ---
>  xen/common/schedule.c |  109 ++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 93 insertions(+), 16 deletions(-)
>
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index e83c666..eac8804 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -455,8 +455,8 @@ void vcpu_unblock(struct vcpu *v)
>   * Do the actual movement of a vcpu from old to new CPU. Locks for *both*
>   * CPUs need to have been taken already when calling this!
>   */
> -static void vcpu_move(struct vcpu *v, unsigned int old_cpu,
> -                      unsigned int new_cpu)
> +static void _vcpu_move(struct vcpu *v, unsigned int old_cpu,
> +                       unsigned int new_cpu)
>  {
>      /*
>       * Transfer urgency status to new CPU before switching CPUs, as
> @@ -479,6 +479,35 @@ static void vcpu_move(struct vcpu *v, unsigned int old_cpu,
>          v->processor = new_cpu;
>  }
>
> +static void vcpu_move(struct vcpu *v, unsigned int new_cpu)

Not quite a fan of the naming scheme here.  Perhaps vcpu_move and
vcpu_move_locked?

Actually -- looking at this again, is there a reason to pass old_cpu
into _vcpu_move?  It looks like old_cpu should always be equal to
v->processor.  That would make the function prototypes the same.

> +{
> +    unsigned long flags;
> +    unsigned int cpu = v->processor;
> +    spinlock_t *lock, *new_lock;
> +
> +    /*
> +     * When here, the vcpu should be ready for being moved. This means:
> +     *  - both its original and target processor must be quiet;
> +     *  - it must not be marked as currently running;
> +     *  - the proper flag must be set (i.e., no one must have had any
> +     *    chance to reset it).
> +     */
> +    ASSERT(is_idle_vcpu(curr_on_cpu(cpu)) &&
> +           is_idle_vcpu(curr_on_cpu(new_cpu)));
> +    ASSERT(!v->is_running && test_bit(_VPF_migrating, &v->pause_flags));
> +
> +    lock = per_cpu(schedule_data, v->processor).schedule_lock;
> +    new_lock = per_cpu(schedule_data, new_cpu).schedule_lock;
> +
> +    sched_spin_lock_double(lock, new_lock, &flags);
> +    ASSERT(new_cpu != v->processor);

Don't we need to do the "schedule lock dance" here as well --
double-check to make sure that v->processor hasn't changed since the
time we grabbed the lock?

> +    _vcpu_move(v, cpu, new_cpu);
> +    sched_spin_unlock_double(lock, new_lock, flags);
> +
> +    sched_move_irqs(v);
> +    vcpu_wake(v);
> +}
> +
>  static void vcpu_migrate(struct vcpu *v)
>  {
>      unsigned long flags;
> @@ -543,7 +572,7 @@ static void vcpu_migrate(struct vcpu *v)
>          return;
>      }
>
> -    vcpu_move(v, old_cpu, new_cpu);
> +    _vcpu_move(v, old_cpu, new_cpu);
>
>      sched_spin_unlock_double(old_lock, new_lock, flags);
>
> @@ -616,7 +645,8 @@ int cpu_disable_scheduler(unsigned int cpu)
>      struct vcpu *v;
>      struct cpupool *c;
>      cpumask_t online_affinity;
> -    int    ret = 0;
> +    unsigned int new_cpu;
> +    int ret = 0;
>
>      c = per_cpu(cpupool, cpu);
>      if ( c == NULL )
> @@ -645,25 +675,72 @@ int cpu_disable_scheduler(unsigned int cpu)
>                  cpumask_setall(v->cpu_hard_affinity);
>              }
>
> -            if ( v->processor == cpu )
> +            if ( v->processor != cpu )
>              {
> -                set_bit(_VPF_migrating, &v->pause_flags);
> +                /* This vcpu is not on cpu, so we can move on. */
>                  vcpu_schedule_unlock_irqrestore(lock, flags, v);
> -                vcpu_sleep_nosync(v);
> -                vcpu_migrate(v);
> +                continue;
>              }
> -            else
> -                vcpu_schedule_unlock_irqrestore(lock, flags, v);
>
>              /*
> -             * A vcpu active in the hypervisor will not be migratable.
> -             * The caller should try again after releasing and reaquiring
> -             * all locks.
> +             * If we're here, it means that the vcpu is on cpu. Let's see how
> +             * it's best to send it away, depending on whether we are shutting
> +             * down/suspending, or doing cpupool manipulations.
>               */
> -            if ( v->processor == cpu )
> -                ret = -EAGAIN;
> -        }
> +            set_bit(_VPF_migrating, &v->pause_flags);
> +            vcpu_schedule_unlock_irqrestore(lock, flags, v);
> +            vcpu_sleep_nosync(v);

I don't quite understand this.  By calling _nosync(), you're not
guaranteed that the vcpu has actually stopped running when you call
vcpu_move() down below; and yet inside vcpu_move(), you assert
!v->is_running.

So either at this point, when doing suspend, the vcpu has already been
paused; in which case this is unnecessary; or it hasn't been paused,
in which case we're potentially racing the IPI / deschedule, and will
trip over the ASSERT if we "win".

Did I miss something?  (I'm perfectly ready to believe that I have...)

 -George

> +
> +            /*
> +             * In case of shutdown/suspend, it is not necessary to ask the
> +             * scheduler to chime in. In fact:
> +             *  * there is no reason for it: the end result we are after is
> +             *    just 'all the vcpus on the boot pcpu, and no vcpu anywhere
> +             *    else', so let's just go for it;
> +             *  * it's wrong, when dealing with a cpupool with only non-boot pcpus,
> +             *    as the scheduler will always fail to send the vcpus away
> +             *    from the last online (non boot) pcpu!
> +             *
> +             * Therefore, in the shutdown/suspend case, let's just pick one
> +             * of the still online pcpus, and send everyone there. Ideally,
> +             * we would pick up the boot pcpu directly, but we don't know
> +             * which one it is.
> +             *
> +             * OTOH, if the system is still live, and we are here because of
> +             * cpupool manipulations:
> +             *  * it would be best to call the scheduler, as that would serve
> +             *    as a re-evaluation of the placement of the vcpu, taking into
> +             *    account the modified cpupool configuration;
> +             *  * the scheduler will always find a suitable solution, or
> +             *    things would have failed before getting in here.
> +             *
> +             * Therefore, in the cpupool manipulation case, let's just ask the
> +             * scheduler to do its job, via calling vcpu_migrate().
> +             */
> +            if ( unlikely(system_state == SYS_STATE_suspend) )
> +            {
> +                /*
> +                 * The boot pcpu is, usually, pcpu #0, so using cpumask_first()
> +                 * actually helps us to achieve our ultimate goal quicker.
> +                 */
> +                cpumask_andnot(&online_affinity, &cpu_online_map, cpumask_of(cpu));
> +                new_cpu = cpumask_first(&online_affinity);
> +                vcpu_move(v, new_cpu);
> +            }
> +            else
> +            {
> +                /*
> +                 * The only caveat, in this case, is that if the vcpu is active
> +                 * in the hypervisor, it won't be migratable. In this case,
> +                 * the caller should try again after releasing and reacquiring
> +                 * all locks.
> +                 */
> +                vcpu_migrate(v);
>
> +                if ( v->processor == cpu )
> +                    ret = -EAGAIN;
> +            }
> +        }
>          domain_update_node_affinity(d);
>      }
>
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel

