Re: [Xen-devel] [PATCH v2 0/6] xen: simplify suspend/resume handling
Hello Julien,

On Thu, 28 Mar 2019 at 15:33, Julien Grall <julien.grall@xxxxxxx> wrote:
>
> Hi,
>
> On 3/28/19 1:01 PM, Volodymyr Babchuk wrote:
> > Hello Juergen,
> >
> > On Thu, 28 Mar 2019 at 14:09, Juergen Gross <jgross@xxxxxxxx> wrote:
> >>
> >> Especially in the scheduler area (schedule.c, cpupool.c) there is a
> >> rather complex handling involved when doing suspend and resume.
> >>
> >> This can be simplified a lot by not performing a complete cpu down and
> >> up cycle for the non-boot cpus, but keeping the pure software related
> >> state and freeing it only in case a cpu didn't come up again during
> >> resume.
> >>
> >> In summary not only the complexity can be reduced, but the failure
> >> tolerance will be even better with this series: With a dedicated hook
> >> for failing cpus when resuming it is now possible to survive e.g. a
> >> cpupool being left without any cpu after resume by moving its domains
> >> to cpupool0.
> >>
> >> Juergen Gross (6):
> >>   xen/sched: call cpu_disable_scheduler() via cpu notifier
> >>   xen: add helper for calling notifier_call_chain() to common/cpu.c
> >>   xen: add new cpu notifier action CPU_RESUME_FAILED
> >>   xen: don't free percpu areas during suspend
> >>   xen/cpupool: simplify suspend/resume handling
> >>   xen/sched: don't disable scheduler on cpus during suspend
> >>
> >>  xen/arch/arm/smpboot.c     |   4 -
> >>  xen/arch/x86/percpu.c      |   3 +-
> >>  xen/arch/x86/smpboot.c     |   3 -
> >>  xen/common/cpu.c           |  61 +++++++-------
> >>  xen/common/cpupool.c       | 131 ++++++++++++-----------------
> >>  xen/common/schedule.c      | 203
> >> +++++++++++++++++++--------------------------
> >>  xen/include/xen/cpu.h      |  29 ++++---
> >>  xen/include/xen/sched-if.h |   1 -
> >>  8 files changed, 190 insertions(+), 245 deletions(-)
> >>
> >
> > I tested your patch series on ARM64 platform. We had issue with hard
> > affinity - there was assertion failure in sched_credit2 code during
> > suspension if one of the vCPUs is pinned to non-0 pCPU.
> When you report an error, please make clear what commit you are using
> and whether you have patches applied on top.

Sure.

> In this case, we have no support of suspend/resume on Arm today. So bug
> report around suspend/resume is a bit confusing to have. It is also more
> difficult to help when you don't have the full picture as a bug may be
> in your code and upstream Xen.

Agree. But in this case, changes were done to the common code mostly. I
assumed that it would be good to check and report it for Arm, even if
Arm suspend/resume is not upstreamed yet. Besides, this patch series
fixed another issue in the common suspend/resume code.

> I saw Juergen suggested a fix, please carry it in whatever series you have.

Yes, this patch fixes the issue. We are using patch series by Mirela,
that you mentioned earlier, by the way.

> > (XEN) ****************************************
> > (XEN) Panic on CPU 0:
> > (XEN) PSCI cpu off failed for CPU0 err=-3
> > (XEN) ****************************************
>
> PSCI CPU off failing is never a good news. Here, the command has been
> denied by PSCI monitor. But... why does CPU off is actually called on
> CPU0? Shouldn't we have turned off the platform instead?

I think, this is because CPU1 is performing machine_restart(), so it
asked CPU0 to halt itself.

>
> > (XEN)
> > (XEN) Reboot in five seconds...
>
> Are the logs below actually a mistaken paste?

No, this is what I'm seeing in my serial console.

> > (XEN) CPU2 will call ARM_SMCCC_ARCH_WORKAROUND_1 on exception entry
> > (XEN) CPU 2 booted.
> > (XEN) Data Abort Trap.
Syndrome=0x6
> > (XEN) Walking Hypervisor VA 0x0 on CPU2 via TTBR 0x00000000781a8000
> > (XEN) 0TH[0x0] = 0x00000000781b0f7f
> > (XEN) 1ST[0x0] = 0x00000000781aaf7f
> > (XEN) 2ND[0x0] = 0x0000000000000000
> > (XEN) CPU2: Unexpected Trap: Data Abort
> > (XEN) ----[ Xen-4.13-unstable arm64 debug=y Not tainted ]----
> > (XEN) CPU: 2
> > (XEN) PC: 0000000000233660 _spin_lock+0x1c/0x88
> > (XEN) LR: 000000000023365c
> > (XEN) SP: 000080037ff77d50
> > (XEN) CPSR: a00002c9 MODE:64-bit EL2h (Hypervisor, handler)
> > (XEN) X0: 0000000000000006 X1: 00000000fffffffe X2: 0000000000000000
> > (XEN) X3: 0000000000000002 X4: 000080037fc42480 X5: 0000000000000000
> > (XEN) X6: 0000000000000080 X7: 000080037ffb0000 X8: 00000000002a1000
> > (XEN) X9: 000000000000000a X10: 000080037ff77bf8 X11: 0000000000000032
> > (XEN) X12: 0000000000000001 X13: 000000000027fff0 X14: 0000000000000020
> > (XEN) X15: 0000000000000000 X16: 0000000000000000 X17: 0000000000000000
> > (XEN) X18: 0000000000000000 X19: 0000000000000000 X20: 0000000000000000
> > (XEN) X21: 000080037ff7e108 X22: 0000000000000002 X23: 000000000033bc88
> > (XEN) X24: 0000000000336020 X25: 0000000000000000 X26: 0000000000000002
> > (XEN) X27: 0000000000336000 X28: 0000000000000000 FP: 000080037ff77d50
> > (XEN)
> > (XEN) VTCR_EL2: 80023558
> > (XEN) VTTBR_EL2: 0000000000000000
> > (XEN)
> > (XEN) SCTLR_EL2: 30cd183d
> > (XEN) HCR_EL2: 0000000000000038
> > (XEN) TTBR0_EL2: 00000000781a8000
> > (XEN)
> > (XEN) ESR_EL2: 96000006
> > (XEN) HPFAR_EL2: 0000000000000000
> > (XEN) FAR_EL2: 0000000000000000
> > (XEN)
> > (XEN) Xen stack trace from sp=000080037ff77d50:
> > (XEN) 000080037ff77d70 00000000002336e8 000080037ff7d000 000000000023e00c
> > (XEN) 000080037ff77d80 000000000022e90c 000080037ff77e10 0000000000232af8
> > (XEN) 0000000000000002 00000000002fbb00 ffffffffffffffff 000000000033cf20
> > (XEN) 00000000002a0680 0000000000000001 0000000000000001 0000000000000001
> > (XEN) 0000000000000000 000080037ff77e90 000080037ff77e50
00000000ffffffc8 > > (XEN) 000000000029f008 00000000002ffc41 000080037ff77e90 0000000000263c68 > > (XEN) 000080037ff77e50 0000000000232b6c 0000000000000002 0000000000000004 > > (XEN) 0000000000000002 00000000002fbc00 0000000000336448 00000000002fbb00 > > (XEN) 000080037ff77e60 0000000000257230 000080037ff77e90 0000000000263c6c > > (XEN) 0000000000000002 0000000077e80000 0000000000000000 0000000000000001 > > (XEN) 0000000000000000 0000000000000002 0000000000000001 effffffffffaffff > > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > (XEN) 0000000000000000 0000000000000000 > > (XEN) Xen call trace: > > (XEN) [<0000000000233660>] _spin_lock+0x1c/0x88 (PC) > > (XEN) [<000000000023365c>] _spin_lock+0x18/0x88 (LR) > > (XEN) [<00000000002336e8>] _spin_lock_irq+0x1c/0x24 > > (XEN) [<000000000022e90c>] schedule.c#schedule+0xe8/0x74c > > (XEN) [<0000000000232af8>] softirq.c#__do_softirq+0xcc/0xe4 > > (XEN) [<0000000000232b6c>] do_softirq+0x14/0x1c > > (XEN) [<0000000000257230>] idle_loop+0x174/0x188 > > (XEN) [<0000000000263c6c>] start_secondary+0x1f4/0x200 > > (XEN) [<0000000000000002>] 0000000000000002 > > (XEN) > > (XEN) > > (XEN) **************************************** > > (XEN) Panic on CPU 2: > > (XEN) CPU2: Unexpected Trap: Data Abort > > (XEN) 
****************************************
> > (XEN)
> > (XEN) Reboot in five seconds...
>
> Cheers,
>
> --
> Julien Grall

--
WBR Volodymyr Babchuk aka lorc [+380976646013]
mailto: vlad.babchuk@xxxxxxxxx

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel