Re: [Xen-devel] crash in csched_load_balance after xl vcpu-pin
> On Apr 12, 2018, at 6:25 PM, Dario Faggioli <dfaggioli@xxxxxxxx> wrote:
> 
> On Thu, 2018-04-12 at 17:38 +0200, Dario Faggioli wrote:
>> On Thu, 2018-04-12 at 15:15 +0200, Dario Faggioli wrote:
>>> On Thu, 2018-04-12 at 14:45 +0200, Olaf Hering wrote:
>>>> 
>>>> dies after the first iteration.
>>>> 
>>>> BUG_ON(!test_bit(_VPF_migrating, &prev->pause_flags));
>>>> 
>> 
>> Update. I replaced this:
>> 
> Olaf, new patch! :-)
> 
> FTR, a previous version of this (where I was not printing
> smp_processor_id() and prev->is_running), produced the output that I am
> attaching below.
> 
> Looks to me like, while on the crashing CPU, we are here [*]:
> 
> void context_saved(struct vcpu *prev)
> {
>     ...
>     if ( unlikely(prev->pause_flags & VPF_migrating) )
>     {
>         unsigned long flags;
>         spinlock_t *lock = vcpu_schedule_lock_irqsave(prev, &flags);
> 
>         if (vcpu_runnable(prev) || !test_bit(_VPF_migrating,
>                                              &prev->pause_flags))
>             printk("CPU %u: d%uv%d isr=%u runnbl=%d proc=%d pf=%lu orq=%d csf=%u\n",
>                    smp_processor_id(), prev->domain->domain_id, prev->vcpu_id,
>                    prev->is_running, vcpu_runnable(prev),
>                    prev->processor, prev->pause_flags,
>                    SCHED_OP(vcpu_scheduler(prev), onrunq, prev),
>                    SCHED_OP(vcpu_scheduler(prev), csflags, prev));
> 
> [*]
> 
>         if ( prev->runstate.state == RUNSTATE_runnable )
>             vcpu_runstate_change(prev, RUNSTATE_offline, NOW());
>         BUG_ON(curr_on_cpu(prev->processor) == prev);
>         SCHED_OP(vcpu_scheduler(prev), sleep, prev);
> 
>         vcpu_schedule_unlock_irqrestore(lock, flags, prev);
> 
>         vcpu_migrate(prev);
>     }
> }
> 
> On the "other CPU", we might be around here [**]:
> 
> static void vcpu_migrate(struct vcpu *v)
> {
>     ...
>     if ( v->is_running ||
>          !test_and_clear_bit(_VPF_migrating, &v->pause_flags) )

I think the bottom line is that, for this test to be valid, test_bit(VPF_migrating) *must* imply !vcpu_on_runqueue(v) at this point, but here it doesn't: if someone else has come by, cleared the bit, done the migration and woken the vCPU up, and then someone *else* has set the bit again without taking the vCPU off the runqueue, it may still be on the runqueue.

My series, which calls vcpu_sleep_nosync_locked() after setting VPF_migrating, should help with this.

Alternatively, instead of baking all this implicit knowledge about credit into the generic scheduler code, we should just implement credit_vcpu_migrate(), and have it remove the vCPU from one runqueue and put it on another.

-George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
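The invariant and the interleaving described in the reply can be replayed in a toy C model. This is a sketch, not Xen code: struct toy_vcpu and every toy_* function are hypothetical and keep only the two facts the invariant talks about (the migrating flag and runqueue membership). The "sleep" step assumes that putting a non-runnable vCPU to sleep takes it off its runqueue, as credit's sleep hook is expected to do.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the state discussed in the thread. NOT Xen code:
 * the struct and all functions below are hypothetical.
 */
struct toy_vcpu {
    bool migrating; /* stands in for _VPF_migrating in pause_flags */
    bool on_runq;   /* stands in for "queued on a credit runqueue" */
};

/* Invariant the vcpu_migrate() check relies on:
 * migrating set implies "not on a runqueue". */
bool toy_invariant_holds(const struct toy_vcpu *v)
{
    return !v->migrating || !v->on_runq;
}

/* Problematic pattern: set the flag but leave the vcpu queued. */
void toy_set_migrating(struct toy_vcpu *v)
{
    v->migrating = true;
}

/* Remedy in the spirit of calling vcpu_sleep_nosync_locked() after
 * setting the flag: here, sleeping is assumed to dequeue the vcpu. */
void toy_set_migrating_and_sleep(struct toy_vcpu *v)
{
    v->migrating = true;
    v->on_runq = false;
}

/* "Someone else" clears the flag, performs the migration and wakes
 * the vcpu up, which puts it back on a runqueue. */
void toy_migrate_and_wake(struct toy_vcpu *v)
{
    v->migrating = false;
    v->on_runq = true;
}

/* Replays the interleaving and reports whether the invariant still
 * holds when vcpu_migrate() would perform its check. */
bool toy_replay(void (*second_setter)(struct toy_vcpu *))
{
    /* The vcpu has just been descheduled with the flag set. */
    struct toy_vcpu v = { .migrating = true, .on_runq = false };

    toy_migrate_and_wake(&v); /* flag cleared, migrated, woken up */
    second_setter(&v);        /* someone *else* sets the flag again */
    return toy_invariant_holds(&v);
}
```

Here toy_replay(toy_set_migrating) returns false (flag set while the vCPU is still queued), whereas toy_replay(toy_set_migrating_and_sleep) returns true. The credit_vcpu_migrate() alternative would avoid relying on the flag by moving runqueue membership in one scheduler-specific step; that option is not modelled here.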