Re: [Xen-devel] [PATCH v2 0/6] xen: simplify suspend/resume handling
Hi,

On 3/28/19 1:01 PM, Volodymyr Babchuk wrote:
> Hello Juergen,
>
> On Thu, 28 Mar 2019 at 14:09, Juergen Gross <jgross@xxxxxxxx> wrote:
>> Especially in the scheduler area (schedule.c, cpupool.c) there is a
>> rather complex handling involved when doing suspend and resume.
>>
>> This can be simplified a lot by not performing a complete cpu down and
>> up cycle for the non-boot cpus, but keeping the pure software related
>> state and freeing it only in case a cpu didn't come up again during
>> resume.
>>
>> In summary not only the complexity can be reduced, but the failure
>> tolerance will be even better with this series: With a dedicated hook
>> for failing cpus when resuming it is now possible to survive e.g. a
>> cpupool being left without any cpu after resume by moving its domains
>> to cpupool0.
>>
>> Juergen Gross (6):
>>   xen/sched: call cpu_disable_scheduler() via cpu notifier
>>   xen: add helper for calling notifier_call_chain() to common/cpu.c
>>   xen: add new cpu notifier action CPU_RESUME_FAILED
>>   xen: don't free percpu areas during suspend
>>   xen/cpupool: simplify suspend/resume handling
>>   xen/sched: don't disable scheduler on cpus during suspend
>>
>>  xen/arch/arm/smpboot.c     |   4 -
>>  xen/arch/x86/percpu.c      |   3 +-
>>  xen/arch/x86/smpboot.c     |   3 -
>>  xen/common/cpu.c           |  61 +++++++-------
>>  xen/common/cpupool.c       | 131 ++++++++++++-----------------
>>  xen/common/schedule.c      | 203 +++++++++++++++++++--------------------------
>>  xen/include/xen/cpu.h      |  29 ++++---
>>  xen/include/xen/sched-if.h |   1 -
>>  8 files changed, 190 insertions(+), 245 deletions(-)
>
> I tested your patch series on ARM64 platform. We had issue with hard
> affinity - there was assertion failure in sched_credit2 code during
> suspension if one of the vCPUs is pinned to non-0 pCPU.

When you report an error, please make clear what commit you are using and whether you have patches applied on top.

In this case, we have no support for suspend/resume on Arm today. So a bug report about suspend/resume is a bit confusing.
It is also more difficult to help when you don't have the full picture, as a bug may be in your code or in upstream Xen.

I saw Juergen suggested a fix, please carry it in whatever series you have.

> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) PSCI cpu off failed for CPU0 err=-3
> (XEN) ****************************************

PSCI CPU off failing is never good news. Here, the command has been denied by the PSCI monitor. But... why is CPU off actually called on CPU0? Shouldn't we have turned off the platform instead?

> (XEN)
> (XEN) Reboot in five seconds...

Are the logs below actually a mistaken paste?

> (XEN) CPU2 will call ARM_SMCCC_ARCH_WORKAROUND_1 on exception entry
> (XEN) CPU 2 booted.
> (XEN) Data Abort Trap. Syndrome=0x6
> (XEN) Walking Hypervisor VA 0x0 on CPU2 via TTBR 0x00000000781a8000
> (XEN) 0TH[0x0] = 0x00000000781b0f7f
> (XEN) 1ST[0x0] = 0x00000000781aaf7f
> (XEN) 2ND[0x0] = 0x0000000000000000
> (XEN) CPU2: Unexpected Trap: Data Abort
> (XEN) ----[ Xen-4.13-unstable arm64 debug=y Not tainted ]----
> (XEN) CPU: 2
> (XEN) PC: 0000000000233660 _spin_lock+0x1c/0x88
> (XEN) LR: 000000000023365c
> (XEN) SP: 000080037ff77d50
> (XEN) CPSR: a00002c9 MODE:64-bit EL2h (Hypervisor, handler)
> (XEN) X0: 0000000000000006 X1: 00000000fffffffe X2: 0000000000000000
> (XEN) X3: 0000000000000002 X4: 000080037fc42480 X5: 0000000000000000
> (XEN) X6: 0000000000000080 X7: 000080037ffb0000 X8: 00000000002a1000
> (XEN) X9: 000000000000000a X10: 000080037ff77bf8 X11: 0000000000000032
> (XEN) X12: 0000000000000001 X13: 000000000027fff0 X14: 0000000000000020
> (XEN) X15: 0000000000000000 X16: 0000000000000000 X17: 0000000000000000
> (XEN) X18: 0000000000000000 X19: 0000000000000000 X20: 0000000000000000
> (XEN) X21: 000080037ff7e108 X22: 0000000000000002 X23: 000000000033bc88
> (XEN) X24: 0000000000336020 X25: 0000000000000000 X26: 0000000000000002
> (XEN) X27: 0000000000336000 X28: 0000000000000000 FP: 000080037ff77d50
> (XEN)
> (XEN) VTCR_EL2: 80023558
> (XEN) VTTBR_EL2: 0000000000000000
> (XEN)
> (XEN) SCTLR_EL2: 30cd183d
> (XEN) HCR_EL2: 0000000000000038
> (XEN) TTBR0_EL2: 00000000781a8000
> (XEN)
> (XEN) ESR_EL2: 96000006
> (XEN) HPFAR_EL2: 0000000000000000
> (XEN) FAR_EL2: 0000000000000000
> (XEN)
> (XEN) Xen stack trace from sp=000080037ff77d50:
> (XEN) 000080037ff77d70 00000000002336e8 000080037ff7d000 000000000023e00c
> (XEN) 000080037ff77d80 000000000022e90c 000080037ff77e10 0000000000232af8
> (XEN) 0000000000000002 00000000002fbb00 ffffffffffffffff 000000000033cf20
> (XEN) 00000000002a0680 0000000000000001 0000000000000001 0000000000000001
> (XEN) 0000000000000000 000080037ff77e90 000080037ff77e50 00000000ffffffc8
> (XEN) 000000000029f008 00000000002ffc41 000080037ff77e90 0000000000263c68
> (XEN) 000080037ff77e50 0000000000232b6c 0000000000000002 0000000000000004
> (XEN) 0000000000000002 00000000002fbc00 0000000000336448 00000000002fbb00
> (XEN) 000080037ff77e60 0000000000257230 000080037ff77e90 0000000000263c6c
> (XEN) 0000000000000002 0000000077e80000 0000000000000000 0000000000000001
> (XEN) 0000000000000000 0000000000000002 0000000000000001 effffffffffaffff
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000
> (XEN) Xen call trace:
> (XEN) [<0000000000233660>] _spin_lock+0x1c/0x88 (PC)
> (XEN) [<000000000023365c>] _spin_lock+0x18/0x88 (LR)
> (XEN) [<00000000002336e8>] _spin_lock_irq+0x1c/0x24
> (XEN) [<000000000022e90c>] schedule.c#schedule+0xe8/0x74c
> (XEN) [<0000000000232af8>] softirq.c#__do_softirq+0xcc/0xe4
> (XEN) [<0000000000232b6c>] do_softirq+0x14/0x1c
> (XEN) [<0000000000257230>] idle_loop+0x174/0x188
> (XEN) [<0000000000263c6c>] start_secondary+0x1f4/0x200
> (XEN) [<0000000000000002>] 0000000000000002
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 2:
> (XEN) CPU2: Unexpected Trap: Data Abort
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel