Re: Recent upgrade of 4.13 -> 4.14 issue
On Sat, Oct 31, 2020 at 02:34:32AM +0000, Dario Faggioli wrote:
> On Tue, 2020-10-27 at 17:06 +0100, Frédéric Pierret wrote:
> > Ok, the server froze just a few minutes after my mail, and here is
> > what I now have for 'r':
> > https://gist.github.com/fepitre/78541f555902275d906d627de2420571
>
> From the scheduler point of view, things seem fine:
>
> (XEN) sched_smt_power_savings: disabled
> (XEN) NOW=770188952085
> (XEN) Online Cpus: 0-15
> (XEN) Cpupool 0:
> (XEN) Cpus: 0-15
> (XEN) Scheduling granularity: cpu, 1 CPU per sched-resource
> (XEN) Scheduler: SMP Credit Scheduler rev2 (credit2)
> (XEN) Active queues: 2
> (XEN)   default-weight     = 256
> (XEN) Runqueue 0:
> (XEN)   ncpus              = 8
> (XEN)   cpus               = 0-7
> (XEN)   max_weight         = 256
> (XEN)   pick_bias          = 1
> (XEN)   instload           = 7
> (XEN)   aveload            = 2021119 (~770%)
> (XEN)   idlers: 00000000,00000000
> (XEN)   tickled: 00000000,00000000
> (XEN)   fully idle cores: 00000000,00000000
> (XEN) Runqueue 1:
> (XEN)   ncpus              = 8
> (XEN)   cpus               = 8-15
> (XEN)   max_weight         = 256
> (XEN)   pick_bias          = 9
> (XEN)   instload           = 8
> (XEN)   aveload            = 2097259 (~800%)
> (XEN)   idlers: 00000000,00000000
> (XEN)   tickled: 00000000,00000200
> (XEN)   fully idle cores: 00000000,00000000
>
> The system is pretty busy, but not overloaded.
>
> Below we see that CPU 3 is running the idle vCPU, but it's marked as
> neither idle nor tickled.
>
> It may be running a tasklet (the one that dumps the debug key output,
> I guess).
>
> Credits are fine; I don't see any strange values that would indicate
> anomalies.
>
> All the CPUs are executing a vCPU, and there should be nothing that
> prevents them from making progress.
>
> There is one vCPU that apparently wants to run at 100% in pretty much
> all guests, and more than one in dom0.
>
> And I think I saw some spin_lock() in the call stacks, in the partial
> report of the '*' debug-key?

Yes, I see:

(XEN) *** Dumping CPU1 host state: ***
(...)
(XEN) Xen call trace:
(XEN)    [<ffff82d040223625>] R _spin_lock+0x35/0x40
(XEN)    [<ffff82d0402233cd>] S on_selected_cpus+0x1d/0xc0
(XEN)    [<ffff82d040284aba>] S vmx_do_resume+0xba/0x1b0
(XEN)    [<ffff82d0402df160>] S context_switch+0x110/0xa60
(XEN)    [<ffff82d04024310a>] S core.c#schedule+0x1aa/0x250
(XEN)    [<ffff82d040222d4a>] S softirq.c#__do_softirq+0x5a/0xa0
(XEN)    [<ffff82d0402d64e6>] S x86_64/entry.S#process_softirqs+0x6/0x20
(...)
(XEN) *** Dumping CPU2 host state: ***
(XEN) Xen call trace:
(XEN)    [<ffff82d040223622>] R _spin_lock+0x32/0x40
(XEN)    [<ffff82d0402233cd>] S on_selected_cpus+0x1d/0xc0
(XEN)    [<ffff82d040284aba>] S vmx_do_resume+0xba/0x1b0
(XEN)    [<ffff82d0402df160>] S context_switch+0x110/0xa60
(XEN)    [<ffff82d04024310a>] S core.c#schedule+0x1aa/0x250
(XEN)    [<ffff82d040222d4a>] S softirq.c#__do_softirq+0x5a/0xa0
(XEN)    [<ffff82d040291b6b>] S vmx_asm_do_vmentry+0x2b/0x30

(XEN) *** Dumping CPU5 host state: ***
(XEN) Xen call trace:
(XEN)    [<ffff82d040223622>] R _spin_lock+0x32/0x40
(XEN)    [<ffff82d0402233cd>] S on_selected_cpus+0x1d/0xc0
(XEN)    [<ffff82d040284aba>] S vmx_do_resume+0xba/0x1b0
(XEN)    [<ffff82d0402df160>] S context_switch+0x110/0xa60
(XEN)    [<ffff82d04024310a>] S core.c#schedule+0x1aa/0x250
(XEN)    [<ffff82d040222d4a>] S softirq.c#__do_softirq+0x5a/0xa0
(XEN)    [<ffff82d040291b6b>] S vmx_asm_do_vmentry+0x2b/0x30

(XEN) *** Dumping CPU6 host state: ***
(XEN) Xen call trace:
(XEN)    [<ffff82d040223625>] R _spin_lock+0x35/0x40
(XEN)    [<ffff82d0402233cd>] S on_selected_cpus+0x1d/0xc0
(XEN)    [<ffff82d040284aba>] S vmx_do_resume+0xba/0x1b0
(XEN)    [<ffff82d0402df160>] S context_switch+0x110/0xa60
(XEN)    [<ffff82d04024310a>] S core.c#schedule+0x1aa/0x250
(XEN)    [<ffff82d040222d4a>] S softirq.c#__do_softirq+0x5a/0xa0
(XEN)    [<ffff82d0402d64e6>] S x86_64/entry.S#process_softirqs+0x6/0x20

(XEN) *** Dumping CPU7 host state: ***
(XEN) Xen call trace:
(XEN)    [<ffff82d040223625>] R _spin_lock+0x35/0x40
(XEN)    [<ffff82d0402233cd>] S on_selected_cpus+0x1d/0xc0
(XEN)    [<ffff82d040284aba>] S vmx_do_resume+0xba/0x1b0
(XEN)    [<ffff82d0402df160>] S context_switch+0x110/0xa60
(XEN)    [<ffff82d04024310a>] S core.c#schedule+0x1aa/0x250
(XEN)    [<ffff82d040222d4a>] S softirq.c#__do_softirq+0x5a/0xa0
(XEN)    [<ffff82d040291b6b>] S vmx_asm_do_vmentry+0x2b/0x30

And so on, for (almost?) all CPUs. Note that the '*' output is (I
think) from a different instance of the freeze, so it cannot be
correlated with the other outputs...

> Maybe they're stuck in the kernel, not in Xen? Thoughts?

Given the above spin locks, I don't think so. But also, even if they
are stuck in the kernel, it clearly happened after the 4.13 -> 4.14
upgrade...
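All the stacks point at the same place: every CPU is trying to take the
lock at the top of on_selected_cpus(), reached from vmx_do_resume()
(which sends an IPI to clear the VMCS on the pCPU where the vCPU last
ran). If I remember xen/common/smp.c correctly, that is a single global
lock, and the holder busy-waits for all selected CPUs to acknowledge the
IPI before dropping it. Roughly (a simplified sketch from memory, not
the exact 4.14 source):

/*
 * Simplified sketch of the on_selected_cpus() pattern, from memory;
 * see xen/common/smp.c for the real code.
 */
static DEFINE_SPINLOCK(call_lock);      /* one global lock for all callers */

static struct call_data_struct {
    void (*func)(void *info);
    void *info;
    int wait;
    cpumask_t selected;
} call_data;

void on_selected_cpus(const cpumask_t *selected,
                      void (*func)(void *info), void *info, int wait)
{
    spin_lock(&call_lock);      /* <- where every dumped CPU is spinning */

    cpumask_copy(&call_data.selected, selected);
    call_data.func = func;
    call_data.info = info;
    call_data.wait = wait;

    /* Kick the target CPUs; each clears itself from the mask when done. */
    smp_send_call_function_mask(&call_data.selected);

    /* The holder busy-waits for all targets to finish... */
    while ( !cpumask_empty(&call_data.selected) )
        cpu_relax();

    /*
     * ...so a single CPU that never services the IPI (e.g. spinning
     * with interrupts off) wedges the holder here, and every other CPU
     * piles up on call_lock above.
     */
    spin_unlock(&call_lock);
}

If that is what's happening, the interesting CPU is not any of the ones
spinning in _spin_lock(), but the one already holding call_lock: its
trace should show it in the cpumask_empty() wait loop instead, and
whatever CPU it is waiting on is the one to chase.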
> (XEN) Domain info:
> (XEN)   Domain: 0 w 256 c 0 v 16
> (XEN)     1: [0.0] flags=0 cpu=5 credit=10553147 [w=256] load=17122 (~6%)
> (XEN)     2: [0.1] flags=0 cpu=4 credit=10570606 [w=256] load=13569 (~5%)
> (XEN)     3: [0.2] flags=0 cpu=7 credit=10605188 [w=256] load=13465 (~5%)
> (XEN)     4: [0.3] flags=2 cpu=11 credit=9998469 [w=256] load=262144 (~100%)
> (XEN)     5: [0.4] flags=0 cpu=0 credit=10533686 [w=256] load=13619 (~5%)
> (XEN)     6: [0.5] flags=a cpu=9 credit=1101 [w=256] load=0 (~0%)
> (XEN)     7: [0.6] flags=2 cpu=2 credit=10621802 [w=256] load=13526 (~5%)
> (XEN)     8: [0.7] flags=2 cpu=1 credit=10670607 [w=256] load=13453 (~5%)
> (XEN)     9: [0.8] flags=2 cpu=7 credit=10649858 [w=256] load=13502 (~5%)
> (XEN)    10: [0.9] flags=0 cpu=3 credit=10550566 [w=256] load=13477 (~5%)
> (XEN)    11: [0.10] flags=2 cpu=4 credit=10644321 [w=256] load=13539 (~5%)
> (XEN)    12: [0.11] flags=2 cpu=1 credit=10602374 [w=256] load=13471 (~5%)
> (XEN)    13: [0.12] flags=0 cpu=6 credit=10617262 [w=256] load=13801 (~5%)
> (XEN)    14: [0.13] flags=2 cpu=8 credit=9998664 [w=256] load=262144 (~100%)
> (XEN)    15: [0.14] flags=0 cpu=3 credit=10603305 [w=256] load=17020 (~6%)
> (XEN)    16: [0.15] flags=0 cpu=5 credit=10591312 [w=256] load=13523 (~5%)
> (XEN)   Domain: 1 w 256 c 0 v 2
> (XEN)    17: [1.0] flags=2 cpu=6 credit=4916769 [w=256] load=262144 (~100%)
> (XEN)    18: [1.1] flags=0 cpu=13 credit=10500000 [w=256] load=0 (~0%)
> (XEN)   Domain: 2 w 256 c 0 v 2
> (XEN)    19: [2.0] flags=2 cpu=5 credit=4982064 [w=256] load=262144 (~100%)
> (XEN)    20: [2.1] flags=0 cpu=14 credit=10500000 [w=256] load=0 (~0%)
> (XEN)   Domain: 3 w 256 c 0 v 2
> (XEN)    21: [3.0] flags=2 cpu=1 credit=5200781 [w=256] load=262144 (~100%)
> (XEN)    22: [3.1] flags=0 cpu=5 credit=10500000 [w=256] load=0 (~0%)
> (XEN)   Domain: 4 w 256 c 0 v 2
> (XEN)    23: [4.0] flags=12 cpu=0 credit=5395149 [w=256] load=262144 (~100%)
> (XEN)    24: [4.1] flags=0 cpu=8 credit=10500000 [w=256] load=0 (~0%)
> (XEN)   Domain: 5 w 256 c 0 v 2
> (XEN)    25: [5.0] flags=2 cpu=2 credit=5306461 [w=256] load=262144 (~100%)
> (XEN)    26: [5.1] flags=0 cpu=15 credit=10500000 [w=256] load=0 (~0%)
> (XEN)   Domain: 6 w 256 c 0 v 8
> (XEN)    27: [6.0] flags=12 cpu=10 credit=7915602 [w=256] load=262144 (~100%)
> (XEN)    28: [6.1] flags=0 cpu=10 credit=10500000 [w=256] load=0 (~0%)
> (XEN)    29: [6.2] flags=0 cpu=15 credit=10500000 [w=256] load=0 (~0%)
> (XEN)    30: [6.3] flags=0 cpu=8 credit=10500000 [w=256] load=0 (~0%)
> (XEN)    31: [6.4] flags=0 cpu=0 credit=10500000 [w=256] load=0 (~0%)
> (XEN)    32: [6.5] flags=0 cpu=6 credit=10500000 [w=256] load=0 (~0%)
> (XEN)    33: [6.6] flags=0 cpu=11 credit=10500000 [w=256] load=0 (~0%)
> (XEN)    34: [6.7] flags=0 cpu=9 credit=10500000 [w=256] load=0 (~0%)
> (XEN)   Domain: 7 w 256 c 0 v 2
> (XEN)    35: [7.0] flags=2 cpu=4 credit=5297013 [w=256] load=262144 (~100%)
> (XEN)    36: [7.1] flags=0 cpu=11 credit=10500000 [w=256] load=0 (~0%)
> (XEN)   Domain: 8 w 256 c 0 v 4
> (XEN)    37: [8.0] flags=0 cpu=14 credit=10500000 [w=256] load=0 (~0%)
> (XEN)    38: [8.1] flags=2 cpu=7 credit=5240630 [w=256] load=262144 (~100%)
> (XEN)    39: [8.2] flags=0 cpu=13 credit=10500000 [w=256] load=0 (~0%)
> (XEN)    40: [8.3] flags=0 cpu=8 credit=10500000 [w=256] load=0 (~0%)
> (XEN)   Domain: 9 w 256 c 0 v 2
> (XEN)    41: [9.0] flags=0 cpu=0 credit=10500000 [w=256] load=0 (~0%)
> (XEN)    42: [9.1] flags=12 cpu=13 credit=7910266 [w=256] load=262144 (~100%)
> (XEN)   Domain: 10 w 256 c 0 v 2
> (XEN)    43: [10.0] flags=12 cpu=12 credit=8045458 [w=256] load=262144 (~100%)
> (XEN)    44: [10.1] flags=0 cpu=8 credit=10500000 [w=256] load=0 (~0%)
> (XEN)   Domain: 11 w 256 c 0 v 2
> (XEN)    45: [11.0] flags=12 cpu=14 credit=7575284 [w=256] load=262144 (~100%)
> (XEN)    46: [11.1] flags=0 cpu=12 credit=10500000 [w=256] load=0 (~0%)
> (XEN)   Domain: 12 w 256 c 0 v 2
> (XEN)    47: [12.0] flags=2 cpu=15 credit=8014099 [w=256] load=262144 (~100%)
> (XEN)    48: [12.1] flags=0 cpu=6 credit=10500000 [w=256] load=0 (~0%)
> (XEN)   Domain: 13 w 256 c 0 v 2
> (XEN)    49: [13.0] flags=0 cpu=7 credit=10500000 [w=256] load=0 (~0%)
> (XEN)    50: [13.1] flags=0 cpu=15 credit=10500000 [w=256] load=0 (~0%)
> (XEN) Runqueue 0:
> (XEN)   CPU[00] runq=0, sibling={0}, core={0-7}
> (XEN)     run: [4.0] flags=2 cpu=0 credit=5255200 [w=256] load=262144 (~100%)
> (XEN)   CPU[01] runq=0, sibling={1}, core={0-7}
> (XEN)     run: [3.0] flags=2 cpu=1 credit=5057668 [w=256] load=262144 (~100%)
> (XEN)   CPU[02] runq=0, sibling={2}, core={0-7}
> (XEN)     run: [5.0] flags=12 cpu=2 credit=5180785 [w=256] load=262144 (~100%)
> (XEN)   CPU[03] runq=0, sibling={3}, core={0-7}
> (XEN)   CPU[04] runq=0, sibling={4}, core={0-7}
> (XEN)     run: [7.0] flags=2 cpu=4 credit=5215323 [w=256] load=262144 (~100%)
> (XEN)   CPU[05] runq=0, sibling={5}, core={0-7}
> (XEN)     run: [2.0] flags=2 cpu=5 credit=4816142 [w=256] load=262144 (~100%)
> (XEN)   CPU[06] runq=0, sibling={6}, core={0-7}
> (XEN)     run: [1.0] flags=2 cpu=6 credit=4755772 [w=256] load=262144 (~100%)
> (XEN)   CPU[07] runq=0, sibling={7}, core={0-7}
> (XEN)     run: [8.1] flags=12 cpu=7 credit=5175342 [w=256] load=262144 (~100%)
> (XEN) RUNQ:
> (XEN) Runqueue 1:
> (XEN)   CPU[08] runq=1, sibling={8}, core={8-15}
> (XEN)     run: [0.13] flags=2 cpu=8 credit=9998664 [w=256] load=262144 (~100%)
> (XEN)   CPU[09] runq=1, sibling={9}, core={8-15}
> (XEN)     run: [0.5] flags=a cpu=9 credit=1101 [w=256] load=0 (~0%)
> (XEN)   CPU[10] runq=1, sibling={10}, core={8-15}
> (XEN)     run: [6.0] flags=2 cpu=10 credit=7764532 [w=256] load=262144 (~100%)
> (XEN)   CPU[11] runq=1, sibling={11}, core={8-15}
> (XEN)     run: [0.3] flags=2 cpu=11 credit=9998469 [w=256] load=262144 (~100%)
> (XEN)   CPU[12] runq=1, sibling={12}, core={8-15}
> (XEN)     run: [10.0] flags=2 cpu=12 credit=7967846 [w=256] load=262144 (~100%)
> (XEN)   CPU[13] runq=1, sibling={13}, core={8-15}
> (XEN)     run: [9.1] flags=12 cpu=13 credit=7832232 [w=256] load=262144 (~100%)
> (XEN)   CPU[14] runq=1, sibling={14}, core={8-15}
> (XEN)     run: [11.0] flags=2 cpu=14 credit=7509378 [w=256] load=262144 (~100%)
> (XEN)   CPU[15] runq=1, sibling={15}, core={8-15}
> (XEN)     run: [12.0] flags=2 cpu=15 credit=7971164 [w=256] load=262144 (~100%)
> (XEN) RUNQ:
> (XEN) CPUs info:
> (XEN)   CPU[00] current=d4v0, curr=d4v0, prev=NULL
> (XEN)   CPU[01] current=d3v0, curr=d3v0, prev=NULL
> (XEN)   CPU[02] current=d5v0, curr=d5v0, prev=NULL
> (XEN)   CPU[03] current=d[IDLE]v3, curr=d[IDLE]v3, prev=NULL
> (XEN)   CPU[04] current=d7v0, curr=d7v0, prev=NULL
> (XEN)   CPU[05] current=d2v0, curr=d2v0, prev=NULL
> (XEN)   CPU[06] current=d1v0, curr=d1v0, prev=NULL
> (XEN)   CPU[07] current=d8v1, curr=d8v1, prev=NULL
> (XEN)   CPU[08] current=d0v13, curr=d0v13, prev=NULL
> (XEN)   CPU[09] current=d0v5, curr=d0v5, prev=NULL
> (XEN)   CPU[10] current=d6v0, curr=d6v0, prev=NULL
> (XEN)   CPU[11] current=d0v3, curr=d0v3, prev=NULL
> (XEN)   CPU[12] current=d10v0, curr=d10v0, prev=NULL
> (XEN)   CPU[13] current=d9v1, curr=d9v1, prev=NULL
> (XEN)   CPU[14] current=d11v0, curr=d11v0, prev=NULL
> (XEN)   CPU[15] current=d12v0, curr=d12v0, prev=NULL
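As an aside, for anyone decoding these dumps: the per-vCPU lines suggest
the load fields are fixed-point fractions where 262144 (= 2^18) stands
for 100%, which matches the (~100%) annotations. The runqueue-level
aveload values from the 'r' output above decode the same way:

  Runqueue 0: aveload = 2021119 / 262144 ~= 7.7  ->  ~770%
  Runqueue 1: aveload = 2097259 / 262144 ~= 8.0  ->  ~800%

That is roughly eight vCPUs' worth of runnable load on each 8-CPU
runqueue, consistent with "pretty busy, but not overloaded".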
-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?