
Re: BUG: credit=sched2 machine hang when using DRAKVUF



[Cc-ing George as it's often useful having him in the loop when doing 
 all this math on credits]

On Wed, 2020-10-28 at 03:04 +0100, Michał Leszczyński wrote:
> ----- On 23 Oct 2020, at 6:47, Jürgen Groß jgross@xxxxxxxx wrote:
> As I've said before, I'm using RELEASE-4.14.0; this is a Dell PowerEdge
> R640 with 14 pCPUs.
> 
> I have the following additional pieces of log (enclosed below). As
> you can see, the issue is that particular vCPUs of Dom0 are not being
> scheduled for a long time, which really decreases the stability of the
> host system.
> 
> ---
> 
> [  313.730969] rcu: INFO: rcu_sched self-detected stall on CPU
> [  313.731154] rcu:     5-....: (5249 ticks this GP) idle=c6e/1/0x4000000000000002 softirq=4625/4625 fqs=2624
> [  313.731474] rcu:      (t=5250 jiffies g=10309 q=220)
> [  338.968676] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [sshd:5991]
> [  346.963959] watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [xenconsoled:2747]
> (XEN) *** Serial input to Xen (type 'CTRL-a' three times to switch input)
> (XEN) sched_smt_power_savings: disabled
> (XEN) NOW=384307105230
> (XEN) Online Cpus: 0,2,4,6,8,10,12,14,16,18,20,22,24,26
> (XEN) Cpupool 0:
> (XEN) Cpus: 0,2,4,6,8,10,12,14,16,18,20,22,24,26
> (XEN) Scheduling granularity: cpu, 1 CPU per sched-resource
> (XEN) Scheduler: SMP Credit Scheduler rev2 (credit2)
> (XEN) Active queues: 2
> (XEN)   default-weight     = 256
> (XEN) Runqueue 0:
> (XEN)   ncpus              = 7
> (XEN)   cpus               = 0,2,4,6,8,10,12
> (XEN)   max_weight         = 256
> (XEN)   pick_bias          = 10
> (XEN)   instload           = 3
> (XEN)   aveload            = 805194 (~307%)
> [...]
> (XEN) Runqueue 1:
> (XEN)   ncpus              = 7
> (XEN)   cpus               = 14,16,18,20,22,24,26
> (XEN)   max_weight         = 256
> (XEN)   pick_bias          = 22
> (XEN)   instload           = 0
> (XEN)   aveload            = 51211 (~19%)
> (XEN)   idlers:
>
That's quite imbalanced... Is there any pinning involved, by any
chance?

> [...]
> (XEN) Domain info:
> (XEN)   Domain: 0 w 256 c 0 v 14
> (XEN)     1: [0.0] flags=20 cpu=0 credit=-10000000 [w=256] load=4594 (~1%)
> (XEN)     2: [0.1] flags=20 cpu=2 credit=9134904 [w=256] load=262144 (~100%)
> (XEN)     3: [0.2] flags=22 cpu=4 credit=-10000000 [w=256] load=262144 (~100%)
> (XEN)     4: [0.3] flags=20 cpu=6 credit=-10000000 [w=256] load=4299 (~1%)
> (XEN)     5: [0.4] flags=20 cpu=8 credit=-10000000 [w=256] load=4537 (~1%)
> (XEN)     6: [0.5] flags=22 cpu=10 credit=-10000000 [w=256] load=262144 (~100%)
> (XEN)     7: [0.6] flags=20 cpu=12 credit=-10000000 [w=256] load=5158
>
All these credit = -10000000 values are a bit weird. The scheduler only
allows a vCPU to run until its credits drop to -500000. As soon as a
vCPU reaches that point (actually, as soon as it reaches credit < 0),
it should be preempted. And if there is no other vCPU with a positive
amount of credits, the credits themselves are reset (by adding 10000000
to them).

So, in a super-ideal world, we'd see no vCPU with credit < 0 here. We
know that we're not in such a world, and that's why we allow credits to
go a bit lower, down to -500000.

And since we're not even in a merely ideal world, values smaller than
-500000 are actually possible, due to timer/IRQ latency, etc. But they
shouldn't be much smaller than that.

If we see -10000000, it means that a vCPU managed to run for more than
20ms, i.e., the 10ms it was given at the last reset event plus 10ms
more, which brought it down to -10ms. We can't know exactly how much
more than 20ms, because even if it ran for 40ms, -10000000 is the lower
limit for the credits, and we clip "more negative" values at that
level.
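
To put the arithmetic in code form, the burn-and-clip behaviour is
conceptually something like this (a minimal standalone sketch, NOT the
actual credit2 code; the constant names are mine, the values are the
ones discussed above):

typedef long long s_time_t;            /* nanoseconds, as in Xen       */

#define CREDIT_RESET    10000000LL     /* +10ms, added at each reset   */
#define CREDIT_MIN       -500000LL     /* the preemption deadline      */
#define CREDIT_FLOOR   -10000000LL     /* credits are clipped here     */

struct vcpu_credit { s_time_t credit; };

static void burn_credits_sketch(struct vcpu_credit *svc, s_time_t ran_for)
{
    svc->credit -= ran_for;            /* default weight: 1ns == 1 credit */

    if ( svc->credit < CREDIT_FLOOR )
        svc->credit = CREDIT_FLOOR;    /* -10000000 means "-10ms or worse" */
}

E.g., a vCPU holding +10000000 credits after a reset, that then runs
undisturbed for 25ms, ends up at 10000000 - 25000000 = -15000000, which
the clipping turns into exactly the -10000000 we see in the dump.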

The point is, though, that the scheduler does set timers to make sure
that a vCPU is interrupted when it reaches -500000. It looks like,
here, quite a few vCPUs are able to ignore such timer interrupts; or
the timer interrupts are not being delivered; or they are, but with
some giant latency/delay.
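
And the timer I'm talking about is, basically, armed for the time it
takes to burn down to that deadline, clamped (again a sketch, reusing
the definitions above; the real logic is csched2_runtime(), and the
actual bounds there may differ):

#define MIN_TIMER    500000LL   /* don't bother with slices < 0.5ms     */
#define MAX_TIMER  10000000LL   /* never let a vCPU run >10ms unchecked */

static s_time_t runtime_sketch(const struct vcpu_credit *svc)
{
    /* Time needed to burn from the current credits down to CREDIT_MIN
     * (at default weight, credits burn at 1 credit/ns, see above). */
    s_time_t slice = svc->credit - CREDIT_MIN;

    if ( slice < MIN_TIMER )
        slice = MIN_TIMER;
    if ( slice > MAX_TIMER )
        slice = MAX_TIMER;

    return slice;  /* the per-CPU scheduler timer fires at NOW() + slice */
}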

> (XEN) Runqueue 0:
> (XEN) CPU[00] runq=0, sibling={0}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26}
> (XEN) CPU[02] runq=0, sibling={2}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26}
> (XEN) CPU[04] runq=0, sibling={4}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26}
> (XEN)   run: [0.2] flags=22 cpu=4 credit=-10000000 [w=256] load=262144 (~100%)
>
And here we see this again. So, these vCPUs are reporting being stuck,
but it's not that they're stuck as in "not being run by the
hypervisor". On the contrary, they're running all the time!

And... well, I'd like to think a bit more about this, but I'd say that
the vCPU is running, and that the functions that do the credit
accounting (like burn_credits()-->t2c_update()) are being called;
otherwise, we would not see -10000000.
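
(For reference, the accounting step itself is just a weight-scaled
time-to-credit conversion; here's a simplified sketch of what
t2c_update() does, with the residual/precision handling left out:)

static void t2c_update_sketch(struct vcpu_credit *svc, s_time_t time,
                              unsigned int max_weight, unsigned int weight)
{
    /* With the default weight (256) and max_weight = 256, this is
     * exactly 1 credit per nanosecond of runtime. */
    svc->credit -= time * (s_time_t)max_weight / weight;
}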

Also...

> (XEN) CPU[06] runq=0, sibling={6}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26}
> (XEN) CPU[08] runq=0, sibling={8}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26}
> (XEN) CPU[10] runq=0, sibling={10}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26}
> (XEN)   run: [0.5] flags=22 cpu=10 credit=-10000000 [w=256] load=262144 (~100%)
> (XEN) CPU[12] runq=0, sibling={12}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26}
> (XEN) RUNQ:
> (XEN)     0: [0.1] flags=20 cpu=2 credit=9134904 [w=256] load=262144 (~100%)
>
... The actual scheduler (csched2_schedule()) is not being invoked. In
fact, there is vCPU d0v1, sitting here in the runqueue.

And if the scheduler ran on either CPU 4 or CPU 10, it would notice
that vCPUs 2 and 5 have -10000000 credits while vCPU 1 is waiting with
9134904 credits, and it would preempt them and pick d0v1 up.

Or it is running, but it is not doing what I just described, because of
some bug that we have in there, and that I am not currently seeing.

Also, none of the other, idle vCPUs are picking d0v1 up either, which
is another thing that should not happen.
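
To spell out the two checks that appear to be skipped (schematic, and a
deliberate oversimplification; the real code paths are runq_candidate()
and runq_tickle()):

/* (1) What csched2_schedule() on CPU 4 or CPU 10 should conclude,
 *     using the struct from the sketches above: */
static int should_preempt_sketch(const struct vcpu_credit *running,
                                 const struct vcpu_credit *queued)
{
    /* 9134904 (d0v1, queued) > -10000000 (d0v2/d0v5, running):
     * as clear-cut a preemption decision as it gets. */
    return queued->credit > running->credit;
}

/* (2) Independently of (1), putting d0v1 in the runqueue should have
 *     tickled one of Runqueue 0's idlers, i.e., raised SCHEDULE_SOFTIRQ
 *     on an idle CPU so that it comes and picks the vCPU up. */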

So, can I see:

# xl info -n
# xl vcpu-list

 ?

In the meantime, I'll continue looking at the code, and maybe come up
with a debug patch, e.g., for figuring out whether csched2_schedule()
is either running but not doing its job, or not running at all.
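
FWIW, what I have in mind is not much more than the following (an
untested fragment, just to show the idea; the local variable names are
made up, and the format specifiers would need checking against the
actual 4.14 code):

/* At the top of csched2_schedule(): a rate-limited printk, so that
 * "running but misbehaving" can be told apart from "not running at
 * all" on the stuck CPUs. */
static DEFINE_PER_CPU(s_time_t, last_sched_log);

    if ( NOW() - per_cpu(last_sched_log, cpu) > MILLISECS(100) )
    {
        per_cpu(last_sched_log, cpu) = NOW();
        printk("csched2_schedule: cpu=%u curr=d%dv%d credit=%"PRI_stime"\n",
               cpu, currd->domain_id, currv->vcpu_id, cur_svc->credit);
    }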

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
