|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: BUG: credit=sched2 machine hang when using DRAKVUF
----- 23 paź, 2020 o 6:47, Jürgen Groß jgross@xxxxxxxx napisał(a):
> On 23.10.20 00:59, Michał Leszczyński wrote:
>> Hello,
>>
>> when using DRAKVUF against a Windows 7 x64 DomU, the whole machine hangs
>> after a
>> few minutes.
>>
>> The chance for a hang seems to be correlated with number of PCPUs, in this
>> case
>> we have 14 PCPUs and hang is very easily reproducible, while on other
>> machines
>> with 2-4 PCPUs it's very rare (but still occurring sometimes). The issue is
>> observed with the default sched=credit2 and is no longer reproducible once
>> sched=credit is set.
>
> Interesting. Can you please share some more information?
>
> Which Xen version are you using?
>
> Is there any additional information in the dom0 log which could be
> related to the hang (earlier WARN() splats, Oopses, Xen related
> messages, hardware failure messages, ...?
>
> Can you please try to get backtraces of all cpus at the time of the
> hang?
>
> It would help to know which cpu was the target of the call of
> smp_call_function_single(), so a disassembly of that function would
> be needed to find that information from the dumped registers.
>
> I'm asking because I've seen a similar problem recently and I was
> rather suspecting a fifo event channel issue than the Xen scheduler,
> but your data suggests it could be the scheduler after all (if it is
> the same issue, of course).
>
>
> Juergen
As I've said before, I'm using RELEASE-4.14.0, this is DELL PowerEdge R640 with
14 PCPUs.
I have the following additional pieces of log (enclosed below). As you could
see, the issue is about particular vCPUs of Dom0 not being scheduled for a long
time, which really decreases stability of the host system.
Hope this helps somehow.
Best regards,
Michał Leszczyński
CERT Polska
---
[ 313.730969] rcu: INFO: rcu_sched self-detected stall on CPU
[ 313.731154] rcu: 5-....: (5249 ticks this GP)
idle=c6e/1/0x4000000000000002 softirq=4625/4625 fqs=2624
[ 313.731474] rcu: (t=5250 jiffies g=10309 q=220)
[ 338.968676] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [sshd:5991]
[ 346.963959] watchdog: BUG: soft lockup - CPU#2 stuck for 23s!
[xenconsoled:2747]
(XEN) *** Serial input to Xen (type 'CTRL-a' three times to switch input)
(XEN) sched_smt_power_savings: disabled
(XEN) NOW=384307105230
(XEN) Online Cpus: 0,2,4,6,8,10,12,14,16,18,20,22,24,26
(XEN) Cpupool 0:
(XEN) Cpus: 0,2,4,6,8,10,12,14,16,18,20,22,24,26
(XEN) Scheduling granularity: cpu, 1 CPU per sched-resource
(XEN) Scheduler: SMP Credit Scheduler rev2 (credit2)
(XEN) Active queues: 2
(XEN) default-weight = 256
(XEN) Runqueue 0:
(XEN) ncpus = 7
(XEN) cpus = 0,2,4,6,8,10,12
(XEN) max_weight = 256
(XEN) pick_bias = 10
(XEN) instload = 3
(XEN) aveload = 805194 (~307%)
(XEN) idlers: 00000000,00000000,00000000,00000000,00000000,00000000,00001145
(XEN) tickled: 00000000,00000000,00000000,00000000,00000000,00000000,00000000
(XEN) fully idle cores:
00000000,00000000,00000000,00000000,00000000,00000000,00001145
(XEN) Runqueue 1:
(XEN) ncpus = 7
(XEN) cpus = 14,16,18,20,22,24,26
(XEN) max_weight = 256
(XEN) pick_bias = 22
(XEN) instload = 0
(XEN) aveload = 51211 (~19%)
(XEN) idlers: 00000000,00000000,00000000,00000000,00000000,00000000,05454000
(XEN) tickled: 00000000,00000000,00000000,00000000,00000000,00000000,00000000
(XEN) fully idle cores:
00000000,00000000,00000000,00000000,00000000,00000000,05454000
(XEN) Domain info:
(XEN) Domain: 0 w 256 c 0 v 14
(XEN) 1: [0.0] flags=20 cpu=0 credit=-10000000 [w=256] load=4594 (~1%)
(XEN) 2: [0.1] flags=20 cpu=2 credit=9134904 [w=256] load=262144 (~100%)
(XEN) 3: [0.2] flags=22 cpu=4 credit=-10000000 [w=256] load=262144 (~100%)
(XEN) 4: [0.3] flags=20 cpu=6 credit=-10000000 [w=256] load=4299 (~1%)
(XEN) 5: [0.4] flags=20 cpu=8 credit=-10000000 [w=256] load=4537 (~1%)
(XEN) 6: [0.5] flags=22 cpu=10 credit=-10000000 [w=256] load=262144 (~100%)
(XEN) 7: [0.6] flags=20 cpu=12 credit=-10000000 [w=256] load=5158 (~1%)
(XEN) 8: [0.7] flags=20 cpu=14 credit=10053352 [w=256] load=5150 (~1%)
(XEN) 9: [0.8] flags=20 cpu=16 credit=10200526 [w=256] load=5155 (~1%)
(XEN) 10: [0.9] flags=20 cpu=18 credit=10207025 [w=256] load=4939 (~1%)
(XEN) 11: [0.10] flags=20 cpu=20 credit=10131199 [w=256] load=5753 (~2%)
(XEN) 12: [0.11] flags=20 cpu=22 credit=8103663 [w=256] load=22544 (~8%)
(XEN) 13: [0.12] flags=20 cpu=24 credit=10213151 [w=256] load=4905 (~1%)
(XEN) 14: [0.13] flags=20 cpu=26 credit=10235821 [w=256] load=4858 (~1%)
(XEN) Domain: 29 w 256 c 0 v 4
(XEN) 15: [29.0] flags=0 cpu=16 credit=10500000 [w=256] load=0 (~0%)
(XEN) 16: [29.1] flags=0 cpu=26 credit=10500000 [w=256] load=0 (~0%)
(XEN) 17: [29.2] flags=0 cpu=10 credit=8294046 [w=256] load=0 (~0%)
(XEN) 18: [29.3] flags=0 cpu=12 credit=9009727 [w=256] load=0 (~0%)
(XEN) Domain: 30 w 256 c 0 v 4
(XEN) 19: [30.0] flags=0 cpu=26 credit=10500000 [w=256] load=0 (~0%)
(XEN) 20: [30.1] flags=0 cpu=16 credit=10500000 [w=256] load=0 (~0%)
(XEN) 21: [30.2] flags=0 cpu=18 credit=10500000 [w=256] load=0 (~0%)
(XEN) 22: [30.3] flags=0 cpu=22 credit=10500000 [w=256] load=0 (~0%)
(XEN) Runqueue 0:
(XEN) CPU[00] runq=0, sibling={0}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26}
(XEN) CPU[02] runq=0, sibling={2}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26}
(XEN) CPU[04] runq=0, sibling={4}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26}
(XEN) run: [0.2] flags=22 cpu=4 credit=-10000000 [w=256] load=262144 (~100%)
(XEN) CPU[06] runq=0, sibling={6}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26}
(XEN) CPU[08] runq=0, sibling={8}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26}
(XEN) CPU[10] runq=0, sibling={10}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26}
(XEN) run: [0.5] flags=22 cpu=10 credit=-10000000 [w=256] load=262144 (~100%)
(XEN) CPU[12] runq=0, sibling={12}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26}
(XEN) RUNQ:
(XEN) 0: [0.1] flags=20 cpu=2 credit=9134904 [w=256] load=262144 (~100%)
(XEN) Runqueue 1:
(XEN) CPU[14] runq=1, sibling={14}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26}
(XEN) CPU[16] runq=1, sibling={16}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26}
(XEN) CPU[18] runq=1, sibling={18}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26}
(XEN) CPU[20] runq=1, sibling={20}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26}
(XEN) CPU[22] runq=1, sibling={22}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26}
(XEN) CPU[24] runq=1, sibling={24}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26}
(XEN) CPU[26] runq=1, sibling={26}, core={0,2,4,6,8,10,12,14,16,18,20,22,24,26}
(XEN) RUNQ:
(XEN) CPUs info:
(XEN) CPU[00] current=d[IDLE]v0, curr=d[IDLE]v0, prev=NULL
(XEN) CPU[02] current=d[IDLE]v2, curr=d[IDLE]v2, prev=NULL
(XEN) CPU[04] current=d0v2, curr=d0v2, prev=NULL
(XEN) CPU[06] current=d[IDLE]v6, curr=d[IDLE]v6, prev=NULL
(XEN) CPU[08] current=d[IDLE]v8, curr=d[IDLE]v8, prev=NULL
(XEN) CPU[10] current=d0v5, curr=d0v5, prev=NULL
(XEN) CPU[12] current=d[IDLE]v12, curr=d[IDLE]v12, prev=NULL
(XEN) CPU[14] current=d[IDLE]v14, curr=d[IDLE]v14, prev=NULL
(XEN) CPU[16] current=d[IDLE]v16, curr=d[IDLE]v16, prev=NULL
(XEN) CPU[18] current=d[IDLE]v18, curr=d[IDLE]v18, prev=NULL
(XEN) CPU[20] current=d[IDLE]v20, curr=d[IDLE]v20, prev=NULL
(XEN) CPU[22] current=d[IDLE]v22, curr=d[IDLE]v22, prev=NULL
(XEN) CPU[24] current=d[IDLE]v24, curr=d[IDLE]v24, prev=NULL
(XEN) CPU[26] current=d[IDLE]v26, curr=d[IDLE]v26, prev=NULL
[ 366.966158] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [sshd:5991]
[ 374.961437] watchdog: BUG: soft lockup - CPU#2 stuck for 23s!
[xenconsoled:2747]
[ 376.737277] rcu: INFO: rcu_sched self-detected stall on CPU
[ 376.737457] rcu: 5-....: (21002 ticks this GP)
idle=c6e/1/0x4000000000000002 softirq=4625/4630 fqs=10491
[ 376.737773] rcu: (t=21003 jiffies g=10309 q=4631)
[ 402.958905] watchdog: BUG: soft lockup - CPU#2 stuck for 23s!
[xenconsoled:2747]
[ 402.962904] watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [sshd:5991]
[ 413.434099] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: {
2-... 5-... } 5657 jiffies s: 57 root: 0x24/.
[ 413.434470] rcu: blocking rcu_node structures:
[ 430.956362] watchdog: BUG: soft lockup - CPU#2 stuck for 22s!
[xenconsoled:2747]
[ 430.960361] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [sshd:5991]
[ 439.743562] rcu: INFO: rcu_sched self-detected stall on CPU
[ 439.743741] rcu: 5-....: (36755 ticks this GP)
idle=c6e/1/0x4000000000000002 softirq=4625/4630 fqs=18363
[ 439.744060] rcu: (t=36756 jiffies g=10309 q=8837)
[ 458.953810] watchdog: BUG: soft lockup - CPU#2 stuck for 22s!
[xenconsoled:2747]
[ 466.957079] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [sshd:5991]
[ 476.916310] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: {
2-... 5-... } 21529 jiffies s: 57 root: 0x24/.
[ 476.916712] rcu: blocking rcu_node structures:
[ 486.951250] watchdog: BUG: soft lockup - CPU#2 stuck for 22s!
[xenconsoled:2747]
[ 491.251030] INFO: task sshd:5993 blocked for more than 120 seconds.
[ 491.251249] Tainted: P OEL 4.19.0-6-amd64 #1 Debian
4.19.67-2+deb10u2
[ 491.251535] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 491.251891] INFO: task sshd:5995 blocked for more than 120 seconds.
[ 491.252078] Tainted: P OEL 4.19.0-6-amd64 #1 Debian
4.19.67-2+deb10u2
[ 491.252321] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 491.252657] INFO: task qemu-system-i38:6011 blocked for more than 120
seconds.
[ 491.252924] Tainted: P OEL 4.19.0-6-amd64 #1 Debian
4.19.67-2+deb10u2
[ 491.253167] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 491.253512] INFO: task sshd:6056 blocked for more than 120 seconds.
[ 491.253703] Tainted: P OEL 4.19.0-6-amd64 #1 Debian
4.19.67-2+deb10u2
[ 491.253947] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 494.954517] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [sshd:5991]
[ 502.749802] rcu: INFO: rcu_sched self-detected stall on CPU
[ 502.749981] rcu: 5-....: (52508 ticks this GP)
idle=c6e/1/0x4000000000000002 softirq=4625/4630 fqs=26225
[ 502.750307] rcu: (t=52509 jiffies g=10309 q=12321)
[ 514.948683] watchdog: BUG: soft lockup - CPU#2 stuck for 22s!
[xenconsoled:2747]
[ 526.951580] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [sshd:5991]
[ 540.398469] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: {
2-... 5-... } 37401 jiffies s: 57 root: 0x24/.
[ 540.398843] rcu: blocking rcu_node structures:
[ 542.946109] watchdog: BUG: soft lockup - CPU#2 stuck for 22s!
[xenconsoled:2747]
[ 554.949003] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [sshd:5991]
[ 565.756021] rcu: INFO: rcu_sched self-detected stall on CPU
[ 565.756203] rcu: 5-....: (68261 ticks this GP)
idle=c6e/1/0x4000000000000002 softirq=4625/4630 fqs=34096
[ 565.756521] rcu: (t=68262 jiffies g=10309 q=14838)
[ 570.943529] watchdog: BUG: soft lockup - CPU#2 stuck for 22s!
[xenconsoled:2747]
[ 590.945683] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [sshd:5991]
[ 598.940945] watchdog: BUG: soft lockup - CPU#2 stuck for 22s!
[xenconsoled:2747]
[ 603.880631] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: {
2-... 5-... } 53273 jiffies s: 57 root: 0x24/.
[ 603.881006] rcu: blocking rcu_node structures:
[ 612.071898] INFO: task ntpd:2726 blocked for more than 120 seconds.
[ 612.072120] Tainted: P OEL 4.19.0-6-amd64 #1 Debian
4.19.67-2+deb10u2
[ 612.072409] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 612.072806] INFO: task sshd:5993 blocked for more than 120 seconds.
[ 612.073016] Tainted: P OEL 4.19.0-6-amd64 #1 Debian
4.19.67-2+deb10u2
[ 612.073258] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 612.073605] INFO: task sshd:5995 blocked for more than 120 seconds.
[ 612.085380] Tainted: P OEL 4.19.0-6-amd64 #1 Debian
4.19.67-2+deb10u2
[ 612.097439] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 612.109691] INFO: task qemu-system-i38:6011 blocked for more than 120
seconds.
[ 612.121877] Tainted: P OEL 4.19.0-6-amd64 #1 Debian
4.19.67-2+deb10u2
[ 612.134123] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 612.146501] INFO: task sshd:6056 blocked for more than 120 seconds.
[ 612.158836] Tainted: P OEL 4.19.0-6-amd64 #1 Debian
4.19.67-2+deb10u2
[ 612.171134] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 612.183643] INFO: task qemu-system-i38:6105 blocked for more than 120
seconds.
[ 612.196174] Tainted: P OEL 4.19.0-6-amd64 #1 Debian
4.19.67-2+deb10u2
[ 612.208899] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 618.943102] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [sshd:5991]
[ 626.938354] watchdog: BUG: soft lockup - CPU#2 stuck for 22s!
[xenconsoled:2747]
[ 628.762185] rcu: INFO: rcu_sched self-detected stall on CPU
[ 628.774574] rcu: 5-....: (84011 ticks this GP)
idle=c6e/1/0x4000000000000002 softirq=4625/4630 fqs=41964
[ 628.787208] rcu: (t=84015 jiffies g=10309 q=21278)
[ 654.935761] watchdog: BUG: soft lockup - CPU#2 stuck for 22s!
[xenconsoled:2747]
[ 654.939869] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [sshd:5991]
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |