
Re: [Xen-devel] crash in csched_load_balance after xl vcpu-pin



On Fri, 2018-04-13 at 08:23 +0200, Olaf Hering wrote:
> Am Thu, 12 Apr 2018 19:25:43 +0200
> schrieb Dario Faggioli <dfaggioli@xxxxxxxx>:
> 
> > Olaf, new patch! :-)
> 
>     BUG_ON(__vcpu_on_runq(CSCHED_VCPU(vc)));
> 
Thanks!

> (XEN) CPU 36: d10v1 isr=0 runnbl=1 proc=36 pf=0 orq=0 csf=4
>
So, FTR:
- CPU is smp_processor_id()
- dXvY is prev, in context_saved()
- isr is prev->is_running
- runnbl is vcpu_runnable(prev)
- proc is prev->processor
- pf is prev->pause_flags
- orq is __vcpu_on_runq(CSCHED_VCPU(prev)) (coming from sched_credit.c)
- csf is CSCHED_VCPU(prev)->flags
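
For reference, the printk behind those lines presumably looks roughly
like the sketch below. This is not the actual debug patch (that was
posted earlier in the thread); it just shows how the fields map, and it
assumes the print happens somewhere where __vcpu_on_runq() and
CSCHED_VCPU() are visible, i.e. in or via sched_credit.c:

printk("CPU %u: %pv isr=%d runnbl=%d proc=%d pf=%lx orq=%d csf=%x\n",
       smp_processor_id(), prev, prev->is_running, vcpu_runnable(prev),
       prev->processor, prev->pause_flags,
       __vcpu_on_runq(CSCHED_VCPU(prev)), CSCHED_VCPU(prev)->flags);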

csf = 4 is CSCHED_FLAG_VCPU_MIGRATING, which means someone is calling
vcpu_migrate() on prev, on some other processor (presumably via
vcpu_set_affinity()) and is around here:

static void vcpu_migrate(struct vcpu *v)
{
    ...
    if ( v->is_running ||
         !test_and_clear_bit(_VPF_migrating, &v->pause_flags) )
    {
        sched_spin_unlock_double(old_lock, new_lock, flags); 
        return; 
    } 
 
    vcpu_move_locked(v, new_cpu); 
 
    sched_spin_unlock_double(old_lock, new_lock, flags); 

    [**] 

    if ( old_cpu != new_cpu ) 
        sched_move_irqs(v); 
 
    /* Wake on new CPU. */ 
    vcpu_wake(v); 
}

I.e., SCHED_OP(pick_cpu) has been called already, but not vcpu_wake().

That vcpu_migrate(), on the other CPU, must be past the
sched_spin_unlock_double() because, on this processor (i.e., CPU 31 in
this crash), we are, while printing, _inside_ a critical section on
prev's scheduler lock.

> (XEN) CPU 33: d10v2 isr=0 runnbl=0 proc=33 pf=1 orq=0 csf=4
> (XEN) CPU 20: d10v2 isr=0 runnbl=1 proc=20 pf=0 orq=0 csf=4
> (XEN) CPU 32: d10v0 isr=0 runnbl=1 proc=32 pf=0 orq=0 csf=4
> (XEN) CPU 33: d10v0 isr=0 runnbl=1 proc=12 pf=0 orq=0 csf=4
> (XEN) CPU 36: d10v0 isr=0 runnbl=1 proc=36 pf=0 orq=0 csf=4
> (XEN) CPU 31: d10v0 isr=0 runnbl=1 proc=31 pf=0 orq=0 csf=4
> (XEN) Xen BUG at sched_credit.c:877
> (XEN) ----[ Xen-4.11.20180411T100655.82540b66ce-
> 180413055758  x86_64  debug=y   Not tainted ]----
> (XEN) CPU:    31
>
Right, so, in this case, the vcpu_migrate()->SCHED_OP(pick_cpu) did not
change prev->processor. That could very well have happened. This just
means that, if it weren't for the BUG_ON added in csched_vcpu_migrate()
by this patch, this iteration would not have crashed in
csched_load_balance().

However, in a previous report, we have seen a situation where
prev->processor was 31 on CPU 16.

Fact is, VPF_migrating is 0 right now for prev, which corroborates the
theory that we are at the [**] point, in vcpu_migrate() on the other
CPU. In fact, it was 1, but the test_and_clear_bit() above has cleared
it.
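
For context, here is (roughly, trimmed) the tail of context_saved() in
schedule.c, i.e. the path that gets us into vcpu_migrate() on this CPU
in the first place:

void context_saved(struct vcpu *prev)
{
    /* Clear running flag /after/ writing context to memory. */
    smp_wmb();

    prev->is_running = 0;

    /* Check for migration request /after/ clearing running flag. */
    smp_mb();

    SCHED_OP(vcpu_scheduler(prev), context_saved, prev);

    /* This is the if() that sends us into vcpu_migrate(). */
    if ( unlikely(prev->pause_flags & VPF_migrating) )
        vcpu_migrate(prev);
}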

However, in order for us, on this CPU, to actually execute
vcpu_move_locked(), like we do:

> (XEN) Xen call trace:
> (XEN)    [<ffff82d08022c84d>]
> sched_credit.c#csched_vcpu_migrate+0x52/0x54
> (XEN)    [<ffff82d080239419>] schedule.c#vcpu_move_locked+0x42/0xcc
>
It means that someone raised VPF_migrating again!

> (XEN)    [<ffff82d08023a8d8>] schedule.c#vcpu_migrate+0x210/0x23b
> (XEN)    [<ffff82d08023c7ad>] context_saved+0x236/0x479
> (XEN)    [<ffff82d08027a558>] context_switch+0xe9/0xf67
> (XEN)    [<ffff82d0802397a9>] schedule.c#schedule+0x306/0x6ab
> (XEN)    [<ffff82d08023d56a>] softirq.c#__do_softirq+0x71/0x9a
> (XEN)    [<ffff82d08023d5dd>] do_softirq+0x13/0x15
> (XEN)    [<ffff82d080328d8b>] vmx_asm_do_vmentry+0x2b/0x30
> (XEN) ****************************************
> (XEN) Panic on CPU 31:
> (XEN) Xen BUG at sched_credit.c:877
> (XEN) ****************************************
>
Now, VPF_migrating is raised in the following circumstances:

* in __runq_tickle(): I was actually about to pinpoint this as the
  problem, but then I realized that, when calling __runq_tickle(prev)
  in vcpu_wake() (called by vcpu_migrate()), we do not set the bit on
  prev itself, but on the currently running vcpu of prev->processor
  (see the trimmed snippet after this list). And a vcpu that is in
  per_cpu(schedule_data, <CPU>).curr can't also be prev in (any)
  context_saved(), I think.

* in csched_vcpu_acct(): we set the flag on CSCHED_VCPU(current). I
   may be wrong, but I don't immediately see why we use current here,
   instead of curr_on_cpu(cpu). Yet, I think that, similarly to the
   above, current can't be prev. Still, I may send a "Just in case"^TM
   patch... :-P

* in vcpu_force_reschedule(): it's used in shim code (well... :-) and
  in VCPUOP_set_periodic_timer(). But it only sets the flag if
  prev->is_running is 1, which it is not. Besides, don't most guests
  use the singleshot timer only these days?

* in cpu_disable_scheduler(): no. Just no.

* in vcpu_set_affinity(): well, it looks to me that either a) the bit
  being set here is what lets us enter the if() in context_saved() in
  the first place, which is a precondition for the race, and then we
  are already past that, or b) things just work. Will think more...

* in vcpu_pin_override(): again, no.... I think?
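
For the __runq_tickle() point above, this is roughly the spot I mean
(trimmed from sched_credit.c; double-check the exact conditions in the
source). Note which vcpu the flag is set on:

    /*
     * Inside __runq_tickle() (trimmed): no idler is suitable for the
     * waking vcpu (new), and new beats the currently running vcpu
     * (cur), so ask the scheduler to migrate cur away. The bit goes
     * on cur->vcpu, i.e. curr_on_cpu(cpu), not on the woken vcpu.
     */
    if ( new_idlers_empty && new->pri > cur->pri )
    {
        SCHED_STAT_CRANK(migrate_kicked_away);
        set_bit(_VPF_migrating, &cur->vcpu->pause_flags);
    }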

So, thoughts? :-)

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/
