
Re: BUG: credit=sched2 machine hang when using DRAKVUF



[Cc-ing George as it's often useful having him in the loop when doing 
 all this math on credits]

On Wed, 2020-10-28 at 03:04 +0100, Michał Leszczyński wrote:
> ----- On 23 Oct 2020, at 6:47, Jürgen Groß jgross@xxxxxxxx wrote:
> As I've said before, I'm using RELEASE-4.14.0; this is a Dell PowerEdge
> R640 with 14 pCPUs.
> 
> I have the following additional pieces of log (enclosed below). As
> you can see, the issue is that particular vCPUs of Dom0 are not being
> scheduled for a long time, which really decreases the stability of the
> host system.
> 
> ---
> 
> [  313.730969] rcu: INFO: rcu_sched self-detected stall on CPU
> [  313.731154] rcu:     5-....: (5249 ticks this GP) idle=c6e/1/0x4000000000000002 softirq=4625/4625 fqs=2624
> [  313.731474] rcu:      (t=5250 jiffies g=10309 q=220)
> [  338.968676] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [sshd:5991]
> [  346.963959] watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [xenconsoled:2747]
> (XEN) *** Serial input to Xen (type 'CTRL-a' three times to switch input)
> (XEN) sched_smt_power_savings: disabled
> (XEN) NOW=384307105230
> (XEN) Online Cpus: 0,2,4,6,8,10,12,14,16,18,20,22,24,26
> (XEN) Cpupool 0:
> (XEN) Cpus: 0,2,4,6,8,10,12,14,16,18,20,22,24,26
> (XEN) Scheduling granularity: cpu, 1 CPU per sched-resource
> (XEN) Scheduler: SMP Credit Scheduler rev2 (credit2)
> (XEN) Active queues: 2
> (XEN)   default-weight     = 256
> (XEN) Runqueue 0:
> (XEN)   ncpus              = 7
> (XEN)   cpus               = 0,2,4,6,8,10,12
> (XEN)   max_weight         = 256
> (XEN)   pick_bias          = 10
> (XEN)   instload           = 3
> (XEN)   aveload            = 805194 (~307%)
> [...]
> (XEN) Runqueue 1:
> (XEN)   ncpus              = 7
> (XEN)   cpus               = 14,16,18,20,22,24,26
> (XEN)   max_weight         = 256
> (XEN)   pick_bias          = 22
> (XEN)   instload           = 0
> (XEN)   aveload            = 51211 (~19%)
> (XEN)   idlers:
>
That's quite imbalanced... Is there any pinning involved, by any
chance?

> [...]
> (XEN) Domain info:
> (XEN)   Domain: 0 w 256 c 0 v 14
> (XEN)     1: [0.0] flags=20 cpu=0 credit=-10000000 [w=256] load=4594 (~1%)
> (XEN)     2: [0.1] flags=20 cpu=2 credit=9134904 [w=256] load=262144 (~100%)
> (XEN)     3: [0.2] flags=22 cpu=4 credit=-10000000 [w=256] load=262144 (~100%)
> (XEN)     4: [0.3] flags=20 cpu=6 credit=-10000000 [w=256] load=4299 (~1%)
> (XEN)     5: [0.4] flags=20 cpu=8 credit=-10000000 [w=256] load=4537 (~1%)
> (XEN)     6: [0.5] flags=22 cpu=10 credit=-10000000 [w=256] load=262144 (~100%)
> (XEN)     7: [0.6] flags=20 cpu=12 credit=-10000000 [w=256] load=5158
>
All these credit = -10000000 values are a bit weird. The scheduler only
allows a vCPU to run until its credits drop to -500000. As soon as a
vCPU reaches that point (actually, as soon as it reaches credit < 0),
it should be preempted. And if there is no other vCPU with a positive
amount of credits, the credits themselves are reset (by adding 10000000
to them).

So, in a super-ideal world, we'd see no vCPU with credit < 0 here. We
know that we're not in such a world, and that's why we allow credits to
go a bit lower, down to -500000.

And since we're not even in a merely ideal world, values smaller than
-500000 are actually possible, due to timer/IRQ latency, etc. But they
shouldn't be much smaller than that.

If we see -10000000, it means that a vCPU managed to run for more than
20ms, i.e., the 10ms it was given at the last reset event plus 10ms
more, which brought it down to -10ms. We can't know exactly how much
more than 20ms, because even if it ran for 40ms, -10000000 is the lower
limit for the credits, and we clip "more negative" values at that
level.
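
To put the arithmetic in code form, the burn-and-clip behaviour is
conceptually something like this (a minimal standalone sketch, NOT the
actual credit2 code; the constant names are mine, the values are the
ones discussed above):

typedef long long s_time_t;            /* nanoseconds, as in Xen       */

#define CREDIT_RESET    10000000LL     /* +10ms, added at each reset   */
#define CREDIT_MIN       -500000LL     /* the preemption deadline      */
#define CREDIT_FLOOR   -10000000LL     /* credits are clipped here     */

struct vcpu_credit { s_time_t credit; };

static void burn_credits_sketch(struct vcpu_credit *svc, s_time_t ran_for)
{
    svc->credit -= ran_for;            /* default weight: 1ns == 1 credit */

    if ( svc->credit < CREDIT_FLOOR )
        svc->credit = CREDIT_FLOOR;    /* -10000000 means "-10ms or worse" */
}

E.g., a vCPU holding +10000000 credits after a reset, that then runs
undisturbed for 25ms, ends up at 10000000 - 25000000 = -15000000, which
the clipping turns into exactly the -10000000 we see in the dump.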

The point is, though, that the scheduler does set timers to make sure
that a vCPU is interrupted when it reaches -500000. It looks like,
here, quite a few vCPUs are able to ignore such timer interrupts; or
the timer interrupts are not being delivered; or they are, but with
some giant latency/delay.
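
And the timer I'm talking about is, basically, armed for the time it
takes to burn down to that deadline, clamped (again a sketch, reusing
the definitions above; the real logic is csched2_runtime(), and the
actual bounds there may differ):

#define MIN_TIMER    500000LL   /* don't bother with slices < 0.5ms     */
#define MAX_TIMER  10000000LL   /* never let a vCPU run >10ms unchecked */

static s_time_t runtime_sketch(const struct vcpu_credit *svc)
{
    /* Time needed to burn from the current credits down to CREDIT_MIN
     * (at default weight, credits burn at 1 credit/ns, see above). */
    s_time_t slice = svc->credit - CREDIT_MIN;

    if ( slice < MIN_TIMER )
        slice = MIN_TIMER;
    if ( slice > MAX_TIMER )
        slice = MAX_TIMER;

    return slice;  /* the per-CPU scheduler timer fires at NOW() + slice */
}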

> (XEN) Runqueue 0:
> (XEN) CPU[00] runq=0, sibling={0}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26}
> (XEN) CPU[02] runq=0, sibling={2}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26}
> (XEN) CPU[04] runq=0, sibling={4}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26}
> (XEN)   run: [0.2] flags=22 cpu=4 credit=-10000000 [w=256] load=262144 (~100%)
>
And here we see this again. So, these vCPUs are reporting being stuck,
but it's not that they're stuck as in "not being run by the
hypervisor". On the contrary, they're running all the time!

And... well, I'd like to think a bit more about this, but I'd say that
the vCPU is running, and that the functions that do the credit
accounting (like burn_credits()-->t2c_update()) are being called;
otherwise, we would not see -10000000.
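
(For reference, the accounting step itself is just a weight-scaled
time-to-credit conversion; here's a simplified sketch of what
t2c_update() does, with the residual/precision handling left out:)

static void t2c_update_sketch(struct vcpu_credit *svc, s_time_t time,
                              unsigned int max_weight, unsigned int weight)
{
    /* With the default weight (256) and max_weight = 256, this is
     * exactly 1 credit per nanosecond of runtime. */
    svc->credit -= time * (s_time_t)max_weight / weight;
}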

Also...

> (XEN) CPU[06] runq=0, sibling={6}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26}
> (XEN) CPU[08] runq=0, sibling={8}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26}
> (XEN) CPU[10] runq=0, sibling={10}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26}
> (XEN)   run: [0.5] flags=22 cpu=10 credit=-10000000 [w=256] load=262144 (~100%)
> (XEN) CPU[12] runq=0, sibling={12}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26}
> (XEN) RUNQ:
> (XEN)     0: [0.1] flags=20 cpu=2 credit=9134904 [w=256] load=262144 (~100%)
>
... The actual scheduler (csched2_schedule()) is not being invoked. In
fact, there is vCPU d0v1, sitting here in the runqueue.

And if the scheduler ran on either CPU 4 or CPU 10, it would notice
that vCPUs 2 and 5 have -10000000 credits while vCPU 1 is waiting with
9134904 credits, and it would preempt them and pick d0v1 up.

Or it is running, but it is not doing what I just described, because of
some bug that we have in there, and that I am not currently seeing.

Also, none of the other, idle vCPUs are picking d0v1 up either, which
is another thing that should not happen.
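
To spell out the two checks that appear to be skipped (schematic, and a
deliberate oversimplification; the real code paths are runq_candidate()
and runq_tickle()):

/* (1) What csched2_schedule() on CPU 4 or CPU 10 should conclude,
 *     using the struct from the sketches above: */
static int should_preempt_sketch(const struct vcpu_credit *running,
                                 const struct vcpu_credit *queued)
{
    /* 9134904 (d0v1, queued) > -10000000 (d0v2/d0v5, running):
     * as clear-cut a preemption decision as it gets. */
    return queued->credit > running->credit;
}

/* (2) Independently of (1), putting d0v1 in the runqueue should have
 *     tickled one of Runqueue 0's idlers, i.e., raised SCHEDULE_SOFTIRQ
 *     on an idle CPU so that it comes and picks the vCPU up. */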

So, can I see:

# xl info -n
# xl vcpu-list

 ?

In the meantime, I'll continue looking at the code, and maybe come up
with a debug patch, e.g., for figuring out whether csched2_schedule()
is either running but not doing its job, or not running at all.
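
FWIW, what I have in mind is not much more than the following (an
untested fragment, just to show the idea; the local variable names are
made up, and the format specifiers would need checking against the
actual 4.14 code):

/* At the top of csched2_schedule(): a rate-limited printk, so that
 * "running but misbehaving" can be told apart from "not running at
 * all" on the stuck CPUs. */
static DEFINE_PER_CPU(s_time_t, last_sched_log);

    if ( NOW() - per_cpu(last_sched_log, cpu) > MILLISECS(100) )
    {
        per_cpu(last_sched_log, cpu) = NOW();
        printk("csched2_schedule: cpu=%u curr=d%dv%d credit=%"PRI_stime"\n",
               cpu, currd->domain_id, currv->vcpu_id, cur_svc->credit);
    }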

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
