[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] crash in csched_load_balance after xl vcpu-pin

[Adding Andrew, not because I expect anything, but just because we've 
 chatted about this issue on IRC :-) ]

On Tue, 2018-04-10 at 22:37 +0200, Olaf Hering wrote:
> On Tue, Apr 10, Dario Faggioli wrote:
>     BUG_ON(__vcpu_on_runq(CSCHED_VCPU(vc)));
> (XEN) Xen BUG at sched_credit.c:876
> (XEN) ----[ Xen-4.11.20180410T125709.50f8ba84a5-
> 3.bug1087289_411  x86_64  debug=y   Not tainted ]----
> (XEN) CPU:    118
> (XEN) RIP:    e008:[<ffff82d080229ab4>]
> sched_credit.c#csched_vcpu_migrate+0x27/0x51
> ...
> (XEN) Xen call trace:
> (XEN)    [<ffff82d080229ab4>]
> sched_credit.c#csched_vcpu_migrate+0x27/0x51
> (XEN)    [<ffff82d080236348>] schedule.c#vcpu_move_locked+0xbb/0xc2
> (XEN)    [<ffff82d08023764c>] schedule.c#vcpu_migrate+0x226/0x25b
> (XEN)    [<ffff82d08023935f>] context_saved+0x8d/0x94
> (XEN)    [<ffff82d08027797d>] context_switch+0xe66/0xeb0
> (XEN)    [<ffff82d080236943>] schedule.c#schedule+0x5f4/0x627
> (XEN)    [<ffff82d080239f15>] softirq.c#__do_softirq+0x85/0x90
> (XEN)    [<ffff82d080239f6a>] do_softirq+0x13/0x15
> (XEN)    [<ffff82d08031f5db>] vmx_asm_do_vmentry+0x2b/0x30
Hey... unless I've really put there a totally bogous BUG_ON(), this
looks interesting and potentially useful.

It says that the vcpu which is being context switched out, and on which
we are calling vcpu_migrate() on, because we found it to be
VPF_migrating, is actually in the runqueue already when we actually get
to execute vcpu_migrate()->vcpu_move_locked().

Mmm... let's see.

 CPU A                                  CPU B
 .                                      .
 schedule(current == v)                 vcpu_set_affinity(v)
  prev = current     // == v             .
  schedule_lock(CPU A)                   .
   csched_schedule()                     schedule_lock(CPU A)
   if (runnable(v))  //YES               x
    runq_insert(v)                       x
   return next != v                      x
  schedule_unlock(CPU A)                 x // takes the lock
  context_switch(prev,next)              set_bit(v, VPF_migrating)  [*]
   context_saved(prev) // still == v     .
    v->is_running = 0                    schedule_unlock(CPU A)
    SMP_MB                               .
    if (test_bit(v, VPF_migrating)) // YES!!
     vcpu_migrate(v)                     .
      for {                              .
       schedule_lock(CPU A)              .
       SCHED_OP(v, pick_cpu)             .
        set_bit(v, CSCHED_MIGRATING)     .
        return CPU C                     .
       pick_called = 1                   .
       schedule_unlock(CPU A)            .
       schedule_lock(CPU A + CPU C)      .
       if (pick_called && ...) // YES    .
        break                            .
      }                                  .
      // v->is_running is 0              .
      //!test_and_clear(v, VPF_migrating)) is false!!
      clear_bit(v, VPF_migrating)        .
      vcpu_move_locked(v, CPU C)         .
      BUG_ON(__vcpu_on_runq(v))          .

[*] after this point, and until someone manages to call vcpu_sleep(),  
      v sits in CPU A's runqueue with the VPF_migrating pause flag set

So, basically, the race is between context_saved() and
vcpu_set_affinity(). Basically, vcpu_set_affinity() sets the
VPF_migrating pause flags on a vcpu in a runqueue, with the intent of
letting either a vcpu_sleep_nosync() or a reschedule remove it from
there, but context_saved() manage to see the flag, before the removal
can happen.

And I think this explains also the original BUG at sched_credit.c:1694
(it's just a bit more involved).

As it can be seen above (and also in the code comment in ) there is a
barrier (which further testify that this is indeed a tricky passage),
but I guess it is not that effective! :-/

TBH, I have actually never fully understood what that comment really
meant, what the barrier was protecting, and how... e.g., isn't it
missing its paired one? In fact, there's another comment, clearly
related, right in vcpu_set_affinity(). But again I'm a bit at loss at
properly figuring out what the big idea is.

George, what do you think? Does this make sense?

Well, I'll think more about this, and to a possible fix, tomorrow

<<This happens because I choose it to happen!>> (Raistlin Majere)
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

Attachment: signature.asc
Description: This is a digitally signed message part

Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.