
Re: [Xen-devel] crash in csched_load_balance after xl vcpu-pin

On Wed, 2018-04-11 at 17:27 +0200, Olaf Hering wrote:
> On Wed, Apr 11, Olaf Hering wrote:
> > That was with sched=credit2, sorry for that.
> > Now with just that second patch ...
> Still BUG in csched_load_balance.
> (XEN) Xen BUG at sched_credit.c:1694
> (XEN) ----[ Xen-4.11.20180410T125709.50f8ba84a5-6.bug1087289_411  x86_64  debug=y   Not tainted ]----
> (XEN) CPU:    135
> (XEN) RIP:    e008:[<ffff82d08022ae34>] sched_credit.c#csched_schedule+0x44a/0xd42
> ...
> (XEN) Xen call trace:
> (XEN)    [<ffff82d08022ae34>] sched_credit.c#csched_schedule+0x44a/0xd42
> (XEN)    [<ffff82d080236406>] schedule.c#schedule+0x107/0x627
> (XEN)    [<ffff82d080239ec5>] softirq.c#__do_softirq+0x85/0x90
> (XEN)    [<ffff82d080239f1a>] do_softirq+0x13/0x15
> (XEN)    [<ffff82d0802738f0>] domain.c#idle_loop+0xac/0xbe
Ok, back to square 1. :-/

A data point is that Credit2 works. In Credit2, vcpu_move_locked()
(called by vcpu_migrate()) calls a function called migrate() which
--for Credit2-specific reasons-- considers it legitimate to find the
vcpu in a runqueue... So that's what I think "saves" us, and that is
why this data point does not help much (sorry, Olaf, for not realizing
this earlier, and for asking you to try Credit2). :-(

On the other hand, in Credit1, there should be no good reason why
vcpu_migrate() would be called on a vcpu which is on a runqueue, and
the fact that we're still crashing proves that there is at least
one other race causing that to happen.
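
To make the difference between the two schedulers concrete, here is a
toy model of it. This is only a sketch: struct toy_vcpu and both
toy_migrate_*() functions are hypothetical names made up for
illustration. They are not the real Xen structures or functions, and
they do not reproduce the actual check at sched_credit.c:1694.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a vcpu; NOT the real Xen struct. */
struct toy_vcpu {
    int id;
    int cpu;        /* pCPU the vcpu is currently assigned to */
    bool on_runq;   /* true while the vcpu sits on that pCPU's runqueue */
};

/*
 * Credit1-like expectation (assumption for illustration): a vcpu must be
 * off every runqueue before it is moved. Returns false, and leaves the
 * vcpu untouched, when that expectation is violated.
 */
static bool toy_migrate_strict(struct toy_vcpu *v, int new_cpu)
{
    if (v->on_runq)
        return false;   /* something raced and queued the vcpu again */
    v->cpu = new_cpu;
    return true;
}

/*
 * Credit2-like behaviour (assumption for illustration): finding the vcpu
 * on a runqueue is considered legitimate, so it is dequeued, moved, and
 * queued again on the new pCPU.
 */
static bool toy_migrate_tolerant(struct toy_vcpu *v, int new_cpu)
{
    bool was_queued = v->on_runq;

    v->on_runq = false;       /* take it off the old runqueue, if any */
    v->cpu = new_cpu;
    v->on_runq = was_queued;  /* put it back, now on the new pCPU */
    return true;
}
```

In this model a racing enqueue goes unnoticed by the tolerant variant
and is flagged by the strict one, which would be consistent with
Credit2 "working" while Credit1 still crashes.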

So, the debug patch I posted previously in this thread was wrong. I'm
attaching a new one to this email. Olaf, if you're trying again, please
do it with both the "fix" (xen-sched-debug-vcpumigrate-race.patch)
and this one.

Debug hypervisor, as usual, if possible. :-)

It will crash again, possibly with the same stack trace, but I think
it's worth a try.

<<This happens because I choose it to happen!>> (Raistlin Majere)
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

Attachment: csched_migrate_debug.patch
Description: Text Data

Attachment: signature.asc
Description: This is a digitally signed message part
