[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: BUG: credit=sched2 machine hang when using DRAKVUF
----- 23 paź, 2020 o 6:47, Jürgen Groß jgross@xxxxxxxx napisał(a): > On 23.10.20 00:59, Michał Leszczyński wrote: >> Hello, >> >> when using DRAKVUF against a Windows 7 x64 DomU, the whole machine hangs >> after a >> few minutes. >> >> The chance for a hang seems to be correlated with number of PCPUs, in this >> case >> we have 14 PCPUs and hang is very easily reproducible, while on other >> machines >> with 2-4 PCPUs it's very rare (but still occurring sometimes). The issue is >> observed with the default sched=credit2 and is no longer reproducible once >> sched=credit is set. > > Interesting. Can you please share some more information? > > Which Xen version are you using? > > Is there any additional information in the dom0 log which could be > related to the hang (earlier WARN() splats, Oopses, Xen related > messages, hardware failure messages, ...? > > Can you please try to get backtraces of all cpus at the time of the > hang? > > It would help to know which cpu was the target of the call of > smp_call_function_single(), so a disassembly of that function would > be needed to find that information from the dumped registers. > > I'm asking because I've seen a similar problem recently and I was > rather suspecting a fifo event channel issue than the Xen scheduler, > but your data suggests it could be the scheduler after all (if it is > the same issue, of course). > > > Juergen As I've said before, I'm using RELEASE-4.14.0, this is DELL PowerEdge R640 with 14 PCPUs. I have the following additional pieces of log (enclosed below). As you could see, the issue is about particular vCPUs of Dom0 not being scheduled for a long time, which really decreases stability of the host system. Hope this helps somehow. Best regards, Michał Leszczyński CERT Polska --- [ 313.730969] rcu: INFO: rcu_sched self-detected stall on CPU [ 313.731154] rcu: 5-....: (5249 ticks this GP) idle=c6e/1/0x4000000000000002 softirq=4625/4625 fqs=2624 [ 313.731474] rcu: (t=5250 jiffies g=10309 q=220) [ 338.968676] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [sshd:5991] [ 346.963959] watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [xenconsoled:2747] (XEN) *** Serial input to Xen (type 'CTRL-a' three times to switch input) (XEN) sched_smt_power_savings: disabled (XEN) NOW=384307105230 (XEN) Online Cpus: 0,2,4,6,8,10,12,14,16,18,20,22,24,26 (XEN) Cpupool 0: (XEN) Cpus: 0,2,4,6,8,10,12,14,16,18,20,22,24,26 (XEN) Scheduling granularity: cpu, 1 CPU per sched-resource (XEN) Scheduler: SMP Credit Scheduler rev2 (credit2) (XEN) Active queues: 2 (XEN) default-weight = 256 (XEN) Runqueue 0: (XEN) ncpus = 7 (XEN) cpus = 0,2,4,6,8,10,12 (XEN) max_weight = 256 (XEN) pick_bias = 10 (XEN) instload = 3 (XEN) aveload = 805194 (~307%) (XEN) idlers: 00000000,00000000,00000000,00000000,00000000,00000000,00001145 (XEN) tickled: 00000000,00000000,00000000,00000000,00000000,00000000,00000000 (XEN) fully idle cores: 00000000,00000000,00000000,00000000,00000000,00000000,00001145 (XEN) Runqueue 1: (XEN) ncpus = 7 (XEN) cpus = 14,16,18,20,22,24,26 (XEN) max_weight = 256 (XEN) pick_bias = 22 (XEN) instload = 0 (XEN) aveload = 51211 (~19%) (XEN) idlers: 00000000,00000000,00000000,00000000,00000000,00000000,05454000 (XEN) tickled: 00000000,00000000,00000000,00000000,00000000,00000000,00000000 (XEN) fully idle cores: 00000000,00000000,00000000,00000000,00000000,00000000,05454000 (XEN) Domain info: (XEN) Domain: 0 w 256 c 0 v 14 (XEN) 1: [0.0] flags=20 cpu=0 credit=-10000000 [w=256] load=4594 (~1%) (XEN) 2: [0.1] flags=20 cpu=2 credit=9134904 [w=256] load=262144 (~100%) (XEN) 3: [0.2] flags=22 cpu=4 credit=-10000000 [w=256] load=262144 (~100%) (XEN) 4: [0.3] flags=20 cpu=6 credit=-10000000 [w=256] load=4299 (~1%) (XEN) 5: [0.4] flags=20 cpu=8 credit=-10000000 [w=256] load=4537 (~1%) (XEN) 6: [0.5] flags=22 cpu=10 credit=-10000000 [w=256] load=262144 (~100%) (XEN) 7: [0.6] flags=20 cpu=12 credit=-10000000 [w=256] load=5158 (~1%) (XEN) 8: [0.7] flags=20 cpu=14 credit=10053352 [w=256] load=5150 (~1%) (XEN) 9: [0.8] flags=20 cpu=16 credit=10200526 [w=256] load=5155 (~1%) (XEN) 10: [0.9] flags=20 cpu=18 credit=10207025 [w=256] load=4939 (~1%) (XEN) 11: [0.10] flags=20 cpu=20 credit=10131199 [w=256] load=5753 (~2%) (XEN) 12: [0.11] flags=20 cpu=22 credit=8103663 [w=256] load=22544 (~8%) (XEN) 13: [0.12] flags=20 cpu=24 credit=10213151 [w=256] load=4905 (~1%) (XEN) 14: [0.13] flags=20 cpu=26 credit=10235821 [w=256] load=4858 (~1%) (XEN) Domain: 29 w 256 c 0 v 4 (XEN) 15: [29.0] flags=0 cpu=16 credit=10500000 [w=256] load=0 (~0%) (XEN) 16: [29.1] flags=0 cpu=26 credit=10500000 [w=256] load=0 (~0%) (XEN) 17: [29.2] flags=0 cpu=10 credit=8294046 [w=256] load=0 (~0%) (XEN) 18: [29.3] flags=0 cpu=12 credit=9009727 [w=256] load=0 (~0%) (XEN) Domain: 30 w 256 c 0 v 4 (XEN) 19: [30.0] flags=0 cpu=26 credit=10500000 [w=256] load=0 (~0%) (XEN) 20: [30.1] flags=0 cpu=16 credit=10500000 [w=256] load=0 (~0%) (XEN) 21: [30.2] flags=0 cpu=18 credit=10500000 [w=256] load=0 (~0%) (XEN) 22: [30.3] flags=0 cpu=22 credit=10500000 [w=256] load=0 (~0%) (XEN) Runqueue 0: (XEN) CPU[00] runq=0, sibling={0}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26} (XEN) CPU[02] runq=0, sibling={2}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26} (XEN) CPU[04] runq=0, sibling={4}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26} (XEN) run: [0.2] flags=22 cpu=4 credit=-10000000 [w=256] load=262144 (~100%) (XEN) CPU[06] runq=0, sibling={6}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26} (XEN) CPU[08] runq=0, sibling={8}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26} (XEN) CPU[10] runq=0, sibling={10}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26} (XEN) run: [0.5] flags=22 cpu=10 credit=-10000000 [w=256] load=262144 (~100%) (XEN) CPU[12] runq=0, sibling={12}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26} (XEN) RUNQ: (XEN) 0: [0.1] flags=20 cpu=2 credit=9134904 [w=256] load=262144 (~100%) (XEN) Runqueue 1: (XEN) CPU[14] runq=1, sibling={14}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26} (XEN) CPU[16] runq=1, sibling={16}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26} (XEN) CPU[18] runq=1, sibling={18}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26} (XEN) CPU[20] runq=1, sibling={20}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26} (XEN) CPU[22] runq=1, sibling={22}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26} (XEN) CPU[24] runq=1, sibling={24}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26} (XEN) CPU[26] runq=1, sibling={26}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26} (XEN) RUNQ: (XEN) CPUs info: (XEN) CPU[00] current=d[IDLE]v0, curr=d[IDLE]v0, prev=NULL (XEN) CPU[02] current=d[IDLE]v2, curr=d[IDLE]v2, prev=NULL (XEN) CPU[04] current=d0v2, curr=d0v2, prev=NULL (XEN) CPU[06] current=d[IDLE]v6, curr=d[IDLE]v6, prev=NULL (XEN) CPU[08] current=d[IDLE]v8, curr=d[IDLE]v8, prev=NULL (XEN) CPU[10] current=d0v5, curr=d0v5, prev=NULL (XEN) CPU[12] current=d[IDLE]v12, curr=d[IDLE]v12, prev=NULL (XEN) CPU[14] current=d[IDLE]v14, curr=d[IDLE]v14, prev=NULL (XEN) CPU[16] current=d[IDLE]v16, curr=d[IDLE]v16, prev=NULL (XEN) CPU[18] current=d[IDLE]v18, curr=d[IDLE]v18, prev=NULL (XEN) CPU[20] current=d[IDLE]v20, curr=d[IDLE]v20, prev=NULL (XEN) CPU[22] current=d[IDLE]v22, curr=d[IDLE]v22, prev=NULL (XEN) CPU[24] current=d[IDLE]v24, curr=d[IDLE]v24, prev=NULL (XEN) CPU[26] current=d[IDLE]v26, curr=d[IDLE]v26, prev=NULL [ 366.966158] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [sshd:5991] [ 374.961437] watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [xenconsoled:2747] [ 376.737277] rcu: INFO: rcu_sched self-detected stall on CPU [ 376.737457] rcu: 5-....: (21002 ticks this GP) idle=c6e/1/0x4000000000000002 softirq=4625/4630 fqs=10491 [ 376.737773] rcu: (t=21003 jiffies g=10309 q=4631) [ 402.958905] watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [xenconsoled:2747] [ 402.962904] watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [sshd:5991] [ 413.434099] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 2-... 5-... } 5657 jiffies s: 57 root: 0x24/. [ 413.434470] rcu: blocking rcu_node structures: [ 430.956362] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [xenconsoled:2747] [ 430.960361] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [sshd:5991] [ 439.743562] rcu: INFO: rcu_sched self-detected stall on CPU [ 439.743741] rcu: 5-....: (36755 ticks this GP) idle=c6e/1/0x4000000000000002 softirq=4625/4630 fqs=18363 [ 439.744060] rcu: (t=36756 jiffies g=10309 q=8837) [ 458.953810] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [xenconsoled:2747] [ 466.957079] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [sshd:5991] [ 476.916310] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 2-... 5-... } 21529 jiffies s: 57 root: 0x24/. [ 476.916712] rcu: blocking rcu_node structures: [ 486.951250] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [xenconsoled:2747] [ 491.251030] INFO: task sshd:5993 blocked for more than 120 seconds. [ 491.251249] Tainted: P OEL 4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2 [ 491.251535] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 491.251891] INFO: task sshd:5995 blocked for more than 120 seconds. [ 491.252078] Tainted: P OEL 4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2 [ 491.252321] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 491.252657] INFO: task qemu-system-i38:6011 blocked for more than 120 seconds. [ 491.252924] Tainted: P OEL 4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2 [ 491.253167] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 491.253512] INFO: task sshd:6056 blocked for more than 120 seconds. [ 491.253703] Tainted: P OEL 4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2 [ 491.253947] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 494.954517] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [sshd:5991] [ 502.749802] rcu: INFO: rcu_sched self-detected stall on CPU [ 502.749981] rcu: 5-....: (52508 ticks this GP) idle=c6e/1/0x4000000000000002 softirq=4625/4630 fqs=26225 [ 502.750307] rcu: (t=52509 jiffies g=10309 q=12321) [ 514.948683] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [xenconsoled:2747] [ 526.951580] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [sshd:5991] [ 540.398469] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 2-... 5-... } 37401 jiffies s: 57 root: 0x24/. [ 540.398843] rcu: blocking rcu_node structures: [ 542.946109] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [xenconsoled:2747] [ 554.949003] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [sshd:5991] [ 565.756021] rcu: INFO: rcu_sched self-detected stall on CPU [ 565.756203] rcu: 5-....: (68261 ticks this GP) idle=c6e/1/0x4000000000000002 softirq=4625/4630 fqs=34096 [ 565.756521] rcu: (t=68262 jiffies g=10309 q=14838) [ 570.943529] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [xenconsoled:2747] [ 590.945683] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [sshd:5991] [ 598.940945] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [xenconsoled:2747] [ 603.880631] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 2-... 5-... } 53273 jiffies s: 57 root: 0x24/. [ 603.881006] rcu: blocking rcu_node structures: [ 612.071898] INFO: task ntpd:2726 blocked for more than 120 seconds. [ 612.072120] Tainted: P OEL 4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2 [ 612.072409] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 612.072806] INFO: task sshd:5993 blocked for more than 120 seconds. [ 612.073016] Tainted: P OEL 4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2 [ 612.073258] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 612.073605] INFO: task sshd:5995 blocked for more than 120 seconds. [ 612.085380] Tainted: P OEL 4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2 [ 612.097439] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 612.109691] INFO: task qemu-system-i38:6011 blocked for more than 120 seconds. [ 612.121877] Tainted: P OEL 4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2 [ 612.134123] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 612.146501] INFO: task sshd:6056 blocked for more than 120 seconds. [ 612.158836] Tainted: P OEL 4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2 [ 612.171134] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 612.183643] INFO: task qemu-system-i38:6105 blocked for more than 120 seconds. [ 612.196174] Tainted: P OEL 4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2 [ 612.208899] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 618.943102] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [sshd:5991] [ 626.938354] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [xenconsoled:2747] [ 628.762185] rcu: INFO: rcu_sched self-detected stall on CPU [ 628.774574] rcu: 5-....: (84011 ticks this GP) idle=c6e/1/0x4000000000000002 softirq=4625/4630 fqs=41964 [ 628.787208] rcu: (t=84015 jiffies g=10309 q=21278) [ 654.935761] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [xenconsoled:2747] [ 654.939869] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [sshd:5991]
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |