Re: [Xen-devel] crash in csched_load_balance after xl vcpu-pin



On Thu, 2018-04-12 at 09:38 +0000, George Dunlap wrote:
> > On Apr 11, 2018, at 10:31 PM, Dario Faggioli <raistlin@xxxxxxxx>
> > wrote:
> > (XEN) Xen BUG at sched_credit.c:876
> > (XEN) ----[ Xen-4.11.20180410T125709.50f8ba84a5-7.bug1087289_411  x86_64  debug=y   Not tainted ]----
> > (XEN) CPU:    108
> > (XEN) RIP:    e008:[<ffff82d080229ab4>] sched_credit.c#csched_vcpu_migrate+0x27/0x54
> > (XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor
> > ...
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82d080229ab4>] sched_credit.c#csched_vcpu_migrate+0x27/0x54
> > (XEN)    [<ffff82d080236348>] schedule.c#vcpu_move_locked+0xbb/0xc2
> > (XEN)    [<ffff82d08023764c>] schedule.c#vcpu_migrate+0x226/0x25b
> > (XEN)    [<ffff82d080239367>] context_saved+0x95/0x9c
> > (XEN)    [<ffff82d08027797d>] context_switch+0xe66/0xeb0
> > (XEN)    [<ffff82d080236943>] schedule.c#schedule+0x5f4/0x627
> > (XEN)    [<ffff82d080239f15>] softirq.c#__do_softirq+0x85/0x90
> > (XEN)    [<ffff82d080239f6a>] do_softirq+0x13/0x15
> > (XEN)    [<ffff82d08031f5db>] vmx_asm_do_vmentry+0x2b/0x30
> > 
> > So, really *exactly* the same. Ok, thanks.
> 
> But this doesn’t make any sense.  If you applied Dario’s ‘fix’ patch,
> then context_saved() should have *just* called vcpu_sleep_nosync()
> before calling vcpu_migrate().  The VPF_migrating flag should still
> be set, so it should have called csched_vcpu_sleep(); and sd->curr
> should have been changed to be != prev way back in schedule(), so
> csched_vcpu_sleep() should have called runq_remove().
> 
Well, you've just described me, banging my head on my desk, since
yesterday afternoon. :-P

> It’s probably worth asking the obvious question: Are you sure the
> “fix” patch is actually applied (in addition to the new “debug”
> patch)? :-)
> 
> If so, then maybe it’s time to open-code vcpu_sleep_nosync() there in
> context_saved(), to try to figure out where our understanding of what
> *should* happen is incorrect.
> 
Ehm... Can you please stop reading my mind? It's annoying. :-D
Well, I guess we can say: "great minds think alike". :-P

Olaf, new patch attached. Please remove _everything_ and apply _only_ this one.

As George says, the vcpu just can't be in the runqueue, unless:
 1) vcpu_sleep_nosync() did not remove it
 2) someone is putting it back there

Let's check 1 first.
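
To make the two cases concrete, here is a toy model in C. It is only a
sketch, not Xen code: every toy_* name is made up for illustration, and
it assumes that the check firing in csched_vcpu_migrate() is "the vcpu
must not be on a runqueue when it is migrated". The flag 'requeued' is a
hypothetical stand-in for case 2 (someone putting the vcpu back).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical stand-ins, not the real Xen structures. */
struct toy_vcpu {
    bool on_runq;    /* stands in for "vcpu is on a runqueue" */
    bool migrating;  /* stands in for the VPF_migrating flag */
};

struct toy_pcpu {
    struct toy_vcpu *curr;  /* vcpu currently running on this pcpu */
};

/* Sleep path: a vcpu that is not 'curr' gets removed from the runqueue. */
static void toy_sleep(struct toy_pcpu *cpu, struct toy_vcpu *v)
{
    if (cpu->curr != v && v->on_runq)
        v->on_runq = false;
}

/* Migrate path: returns false where the assumed check would fire. */
static bool toy_migrate(const struct toy_vcpu *v)
{
    return !v->on_runq;
}

/*
 * Expected order in context_saved(): sleep first, then migrate.
 * 'requeued' models case 2: someone re-inserts the vcpu in between.
 */
static bool toy_context_saved(struct toy_pcpu *cpu, struct toy_vcpu *prev,
                              bool requeued)
{
    if (!prev->migrating)
        return true;             /* nothing to migrate */

    toy_sleep(cpu, prev);        /* case 1 fails here if nothing is removed */
    if (requeued)
        prev->on_runq = true;    /* case 2: put back behind our back */

    return toy_migrate(prev);
}
```

In this model the migrate check only passes if the sleep step really
removed the vcpu and nobody re-inserted it afterwards, which is why the
sleep step is the first thing to look at.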

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

Attachment: context-save-race-debug.patch
Description: Text Data

Attachment: signature.asc
Description: This is a digitally signed message part
