Re: [Xen-devel] [PATCH v2 0/6] xen: simplify suspend/resume handling



Hello Juergen,

On Thu, 28 Mar 2019 at 14:09, Juergen Gross <jgross@xxxxxxxx> wrote:
>
> Especially in the scheduler area (schedule.c, cpupool.c) there is a
> rather complex handling involved when doing suspend and resume.
>
> This can be simplified a lot by not performing a complete cpu down and
> up cycle for the non-boot cpus, but keeping the pure software related
> state and freeing it only in case a cpu didn't come up again during
> resume.
>
> In summary not only the complexity can be reduced, but the failure
> tolerance will be even better with this series: With a dedicated hook
> for failing cpus when resuming it is now possible to survive e.g. a
> cpupool being left without any cpu after resume by moving its domains
> to cpupool0.
>
> Juergen Gross (6):
>   xen/sched: call cpu_disable_scheduler() via cpu notifier
>   xen: add helper for calling notifier_call_chain() to common/cpu.c
>   xen: add new cpu notifier action CPU_RESUME_FAILED
>   xen: don't free percpu areas during suspend
>   xen/cpupool: simplify suspend/resume handling
>   xen/sched: don't disable scheduler on cpus during suspend
>
>  xen/arch/arm/smpboot.c     |   4 -
>  xen/arch/x86/percpu.c      |   3 +-
>  xen/arch/x86/smpboot.c     |   3 -
>  xen/common/cpu.c           |  61 +++++++-------
>  xen/common/cpupool.c       | 131 ++++++++++++-----------------
>  xen/common/schedule.c      | 203 +++++++++++++++++++--------------------------
>  xen/include/xen/cpu.h      |  29 ++++---
>  xen/include/xen/sched-if.h |   1 -
>  8 files changed, 190 insertions(+), 245 deletions(-)
>

I tested your patch series on an ARM64 platform. We had an issue with hard
affinity: there was an assertion failure in the sched_credit2 code during
suspend if one of the vCPUs was pinned to a non-0 pCPU.

Your patch series seems to fix the issue during suspend, but now I'm
seeing a crash during resume:


(XEN) suspend.c:198: Resume
(XEN) Enabling non-boot CPUs  ...
(XEN) Bringing up CPU1
(XEN) CPU1 will call ARM_SMCCC_ARCH_WORKAROUND_1 on exception entry
(XEN) CPU 1 booted.
(XEN) Bringing up CPU2
(XEN) Data Abort Trap. Syndrome=0x6
(XEN) Walking Hypervisor VA 0x0 on CPU1 via TTBR 0x00000000781a8000
(XEN) 0TH[0x0] = 0x00000000781b0f7f
(XEN) 1ST[0x0] = 0x00000000781aaf7f
(XEN) 2ND[0x0] = 0x0000000000000000
(XEN) CPU1: Unexpected Trap: Data Abort
(XEN) ----[ Xen-4.13-unstable  arm64  debug=y   Not tainted ]----
(XEN) CPU:    1
(XEN) PC:     0000000000233660 _spin_lock+0x1c/0x88
(XEN) LR:     000000000023365c
(XEN) SP:     000080037ffc7d50
(XEN) CPSR:   600002c9 MODE:64-bit EL2h (Hypervisor, handler)
(XEN)      X0: 0000000000000006  X1: 0000000000000000  X2: 0000000000000000
(XEN)      X3: 0000000000000002  X4: 000080037fc94480  X5: 0000000000000000
(XEN)      X6: 0000000000000080  X7: 000080037ffb0000  X8: 00000000002a1000
(XEN)      X9: 000000000000000a X10: 000080037ffc7bf8 X11: 0000000000000031
(XEN)     X12: 0000000000000001 X13: 000000000027fff0 X14: 0000000000000020
(XEN)     X15: 0000000000000000 X16: 0000000000000000 X17: 0000000000000000
(XEN)     X18: 0000000000000000 X19: 0000000000000000 X20: 0000000000000000
(XEN)     X21: 000080037ffd0108 X22: 0000000000000001 X23: 000000000033bc88
(XEN)     X24: 0000000000336020 X25: 0000000000000000 X26: 0000000000000001
(XEN)     X27: 0000000000336000 X28: 0000000000000000  FP: 000080037ffc7d50
(XEN)
(XEN)   VTCR_EL2: 80023558
(XEN)  VTTBR_EL2: 0000000000000000
(XEN)
(XEN)  SCTLR_EL2: 30cd183d
(XEN)    HCR_EL2: 0000000000000038
(XEN)  TTBR0_EL2: 00000000781a8000
(XEN)
(XEN)    ESR_EL2: 96000006
(XEN)  HPFAR_EL2: 0000000000000000
(XEN)    FAR_EL2: 0000000000000000
(XEN)
(XEN) Xen stack trace from sp=000080037ffc7d50:
(XEN)    000080037ffc7d70 00000000002336e8 000080037ffd2000 000000000023e00c
(XEN)    000080037ffc7d80 000000000022e90c 000080037ffc7e10 0000000000232af8
(XEN)    0000000000000001 00000000002fbb00 ffffffffffffffff 000000000033cf20
(XEN)    00000000002a0680 0000000000000001 0000000000000001 0000000000000001
(XEN)    0000000000000000 000080037ffc7e90 000080037ffc7e50 00000000ffffffc8
(XEN)    000000000029f008 00000000002ffc41 000080037ffc7e90 0000000000263c68
(XEN)    000080037ffc7e50 0000000000232b6c 0000000000000001 0000000000000002
(XEN)    0000000000000001 00000000002fbb80 0000000000336448 00000000002fbb00
(XEN)    000080037ffc7e60 0000000000257230 000080037ffc7e90 0000000000263c6c
(XEN)    0000000000000001 0000000077e80000 0000000000000000 0000000000000001
(XEN)    0000000000000000 0000000000000001 0000000000000001 0000ffff0000ffff
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<0000000000233660>] _spin_lock+0x1c/0x88 (PC)
(XEN)    [<000000000023365c>] _spin_lock+0x18/0x88 (LR)
(XEN)    [<00000000002336e8>] _spin_lock_irq+0x1c/0x24
(XEN)    [<000000000022e90c>] schedule.c#schedule+0xe8/0x74c
(XEN)    [<0000000000232af8>] softirq.c#__do_softirq+0xcc/0xe4
(XEN)    [<0000000000232b6c>] do_softirq+0x14/0x1c
(XEN)    [<0000000000257230>] idle_loop+0x174/0x188
(XEN)    [<0000000000263c6c>] start_secondary+0x1f4/0x200
(XEN)    [<0000000000000001>] 0000000000000001
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) CPU1: Unexpected Trap: Data Abort
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) PSCI cpu off failed for CPU0 err=-3
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) CPU2 will call ARM_SMCCC_ARCH_WORKAROUND_1 on exception entry
(XEN) CPU 2 booted.
(XEN) Data Abort Trap. Syndrome=0x6
(XEN) Walking Hypervisor VA 0x0 on CPU2 via TTBR 0x00000000781a8000
(XEN) 0TH[0x0] = 0x00000000781b0f7f
(XEN) 1ST[0x0] = 0x00000000781aaf7f
(XEN) 2ND[0x0] = 0x0000000000000000
(XEN) CPU2: Unexpected Trap: Data Abort
(XEN) ----[ Xen-4.13-unstable  arm64  debug=y   Not tainted ]----
(XEN) CPU:    2
(XEN) PC:     0000000000233660 _spin_lock+0x1c/0x88
(XEN) LR:     000000000023365c
(XEN) SP:     000080037ff77d50
(XEN) CPSR:   a00002c9 MODE:64-bit EL2h (Hypervisor, handler)
(XEN)      X0: 0000000000000006  X1: 00000000fffffffe  X2: 0000000000000000
(XEN)      X3: 0000000000000002  X4: 000080037fc42480  X5: 0000000000000000
(XEN)      X6: 0000000000000080  X7: 000080037ffb0000  X8: 00000000002a1000
(XEN)      X9: 000000000000000a X10: 000080037ff77bf8 X11: 0000000000000032
(XEN)     X12: 0000000000000001 X13: 000000000027fff0 X14: 0000000000000020
(XEN)     X15: 0000000000000000 X16: 0000000000000000 X17: 0000000000000000
(XEN)     X18: 0000000000000000 X19: 0000000000000000 X20: 0000000000000000
(XEN)     X21: 000080037ff7e108 X22: 0000000000000002 X23: 000000000033bc88
(XEN)     X24: 0000000000336020 X25: 0000000000000000 X26: 0000000000000002
(XEN)     X27: 0000000000336000 X28: 0000000000000000  FP: 000080037ff77d50
(XEN)
(XEN)   VTCR_EL2: 80023558
(XEN)  VTTBR_EL2: 0000000000000000
(XEN)
(XEN)  SCTLR_EL2: 30cd183d
(XEN)    HCR_EL2: 0000000000000038
(XEN)  TTBR0_EL2: 00000000781a8000
(XEN)
(XEN)    ESR_EL2: 96000006
(XEN)  HPFAR_EL2: 0000000000000000
(XEN)    FAR_EL2: 0000000000000000
(XEN)
(XEN) Xen stack trace from sp=000080037ff77d50:
(XEN)    000080037ff77d70 00000000002336e8 000080037ff7d000 000000000023e00c
(XEN)    000080037ff77d80 000000000022e90c 000080037ff77e10 0000000000232af8
(XEN)    0000000000000002 00000000002fbb00 ffffffffffffffff 000000000033cf20
(XEN)    00000000002a0680 0000000000000001 0000000000000001 0000000000000001
(XEN)    0000000000000000 000080037ff77e90 000080037ff77e50 00000000ffffffc8
(XEN)    000000000029f008 00000000002ffc41 000080037ff77e90 0000000000263c68
(XEN)    000080037ff77e50 0000000000232b6c 0000000000000002 0000000000000004
(XEN)    0000000000000002 00000000002fbc00 0000000000336448 00000000002fbb00
(XEN)    000080037ff77e60 0000000000257230 000080037ff77e90 0000000000263c6c
(XEN)    0000000000000002 0000000077e80000 0000000000000000 0000000000000001
(XEN)    0000000000000000 0000000000000002 0000000000000001 effffffffffaffff
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<0000000000233660>] _spin_lock+0x1c/0x88 (PC)
(XEN)    [<000000000023365c>] _spin_lock+0x18/0x88 (LR)
(XEN)    [<00000000002336e8>] _spin_lock_irq+0x1c/0x24
(XEN)    [<000000000022e90c>] schedule.c#schedule+0xe8/0x74c
(XEN)    [<0000000000232af8>] softirq.c#__do_softirq+0xcc/0xe4
(XEN)    [<0000000000232b6c>] do_softirq+0x14/0x1c
(XEN)    [<0000000000257230>] idle_loop+0x174/0x188
(XEN)    [<0000000000263c6c>] start_secondary+0x1f4/0x200
(XEN)    [<0000000000000002>] 0000000000000002
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 2:
(XEN) CPU2: Unexpected Trap: Data Abort
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
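
For reference, the ESR_EL2 value in both traces (0x96000006) can be decoded
by hand. Below is a minimal sketch in Python, assuming the ARMv8-A ESR_ELx
layout (EC in bits [31:26], IL in bit 25, DFSC in bits [5:0]). The function
and table names are made up for illustration and are not part of Xen.

```python
# Decode an AArch64 ESR_ELx value for a Data Abort.
# Field layout follows the ARMv8-A architecture: EC = bits [31:26],
# IL = bit 25, DFSC = bits [5:0] of the ISS.
# All names below are illustrative; none of this is Xen code.

EC_NAMES = {
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort taken without a change in Exception level",
}

# For the encodings handled here, the upper four bits of DFSC select
# the fault type and the lower two bits give the translation level.
FAULT_TYPES = {
    0b0000: "Address size fault",
    0b0001: "Translation fault",
    0b0010: "Access flag fault",
    0b0011: "Permission fault",
}


def decode_esr(esr):
    """Split an ESR value into EC, IL and DFSC and describe the fault."""
    ec = (esr >> 26) & 0x3F
    il = (esr >> 25) & 0x1
    dfsc = esr & 0x3F  # only meaningful when EC is a Data Abort

    fault_type = FAULT_TYPES.get(dfsc >> 2)
    if fault_type is None:
        fault = "other fault (DFSC=0x%02x)" % dfsc
    else:
        fault = "%s, level %d" % (fault_type, dfsc & 0x3)

    return {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "EC=0x%02x (not decoded here)" % ec),
        "il": il,
        "dfsc": dfsc,
        "fault": fault,
    }


if __name__ == "__main__":
    info = decode_esr(0x96000006)  # ESR_EL2 from the traces above
    print(info["ec_name"])  # Data Abort taken without a change in Exception level
    print(info["fault"])    # Translation fault, level 2
```

A level 2 translation fault is consistent with the page table walk in the
log, where 2ND[0x0] is zero, and with "Syndrome=0x6". Together with
FAR_EL2 = 0, this suggests that _spin_lock(), called from schedule(), was
handed a NULL lock pointer on the CPU being brought up.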





-- 
WBR Volodymyr Babchuk aka lorc [+380976646013]
mailto: vlad.babchuk@xxxxxxxxx


 

