
Re: [Xen-devel] [PATCH v2 0/6] xen: simplify suspend/resume handling



Hello Julien,

On Thu, 28 Mar 2019 at 15:33, Julien Grall <julien.grall@xxxxxxx> wrote:
>
> Hi,
>
> On 3/28/19 1:01 PM, Volodymyr Babchuk wrote:
> > Hello Juergen,
> >
> > On Thu, 28 Mar 2019 at 14:09, Juergen Gross <jgross@xxxxxxxx> wrote:
> >>
> >> Especially in the scheduler area (schedule.c, cpupool.c), rather
> >> complex handling is involved when doing suspend and resume.
> >>
> >> This can be simplified a lot by not performing a complete cpu down and
> >> up cycle for the non-boot cpus, but instead keeping the purely
> >> software-related state and freeing it only if a cpu doesn't come up
> >> again during resume.
> >>
> >> In summary, this series not only reduces complexity but also improves
> >> failure tolerance: with a dedicated hook for cpus failing to resume,
> >> it is now possible to survive, e.g., a cpupool being left without any
> >> cpu after resume by moving its domains to cpupool0.
> >>
> >> Juergen Gross (6):
> >>    xen/sched: call cpu_disable_scheduler() via cpu notifier
> >>    xen: add helper for calling notifier_call_chain() to common/cpu.c
> >>    xen: add new cpu notifier action CPU_RESUME_FAILED
> >>    xen: don't free percpu areas during suspend
> >>    xen/cpupool: simplify suspend/resume handling
> >>    xen/sched: don't disable scheduler on cpus during suspend
> >>
> >>   xen/arch/arm/smpboot.c     |   4 -
> >>   xen/arch/x86/percpu.c      |   3 +-
> >>   xen/arch/x86/smpboot.c     |   3 -
> >>   xen/common/cpu.c           |  61 +++++++-------
> >>   xen/common/cpupool.c       | 131 ++++++++++++-----------------
> >>   xen/common/schedule.c      | 203 +++++++++++++++++++--------------------------
> >>   xen/include/xen/cpu.h      |  29 ++++---
> >>   xen/include/xen/sched-if.h |   1 -
> >>   8 files changed, 190 insertions(+), 245 deletions(-)
> >>
> >
> > I tested your patch series on an ARM64 platform. We had an issue with
> > hard affinity: there was an assertion failure in the sched_credit2 code
> > during suspend if one of the vCPUs was pinned to a pCPU other than 0.
> When you report an error, please make clear what commit you are using
> and whether you have patches applied on top.

Sure.

> In this case, we have no support for suspend/resume on Arm today, so a
> bug report about suspend/resume is a bit confusing. It is also harder
> to help without the full picture, as the bug may be in your code or in
> upstream Xen.

Agreed. But in this case, the changes were mostly to common code.
I assumed it would be useful to test and report results for Arm, even
though Arm suspend/resume is not upstream yet. Besides, this patch
series fixed another issue in the common suspend/resume code.
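To make the resume-failure idea from the cover letter concrete, here is a
minimal standalone model in C. It is a hypothetical sketch, not Xen code:
every name in it (model_cpu, cpu_notify, ACT_RESUME_FAILED, suspend_resume,
...) is an illustrative assumption that only loosely mirrors the
CPU_RESUME_FAILED notifier action added by the series. Per-cpu software
state stays allocated across suspend, is freed only for a cpu that fails to
come back, and any domain whose cpupool ends up without cpus is moved to
cpupool0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model only: names and layout are illustrative, not Xen's. */
#define NR_CPUS 4
#define NR_DOMS 3

enum cpu_action { ACT_SUSPEND, ACT_RESUME_OK, ACT_RESUME_FAILED };

struct model_cpu {
    void *percpu_area;  /* software state, kept allocated across suspend */
    int pool;           /* owning cpupool id, -1 once the cpu is gone */
};

struct model_cpu cpus[NR_CPUS];
int dom_pool[NR_DOMS];  /* cpupool id of each domain */

/* Notifier-style callback: only a failed resume frees per-cpu state. */
void cpu_notify(unsigned int cpu, enum cpu_action action)
{
    switch (action) {
    case ACT_SUSPEND:
        /* No full cpu down: keep percpu_area and pool membership. */
        break;
    case ACT_RESUME_OK:
        /* State survived suspend, nothing needs to be re-allocated. */
        assert(cpus[cpu].percpu_area != NULL);
        break;
    case ACT_RESUME_FAILED:
        free(cpus[cpu].percpu_area);
        cpus[cpu].percpu_area = NULL;
        cpus[cpu].pool = -1;
        break;
    }
}

static bool pool_has_cpu(int pool)
{
    for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
        if (cpus[cpu].pool == pool)
            return true;
    return false;
}

/* cpus 0-1 form pool 0 (cpu0 is the boot cpu), cpus 2-3 form pool 1. */
void init_model(void)
{
    for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++) {
        free(cpus[cpu].percpu_area);
        cpus[cpu].percpu_area = malloc(64);
        cpus[cpu].pool = (cpu < NR_CPUS / 2) ? 0 : 1;
    }
    dom_pool[0] = 0;
    dom_pool[1] = 1;
    dom_pool[2] = 1;
}

/* came_up[cpu] says whether each non-boot cpu came back after resume. */
void suspend_resume(const bool came_up[NR_CPUS])
{
    for (unsigned int cpu = 1; cpu < NR_CPUS; cpu++)
        cpu_notify(cpu, ACT_SUSPEND);
    for (unsigned int cpu = 1; cpu < NR_CPUS; cpu++)
        cpu_notify(cpu, came_up[cpu] ? ACT_RESUME_OK : ACT_RESUME_FAILED);
    /* A pool left without cpus hands its domains over to cpupool0. */
    for (unsigned int d = 0; d < NR_DOMS; d++)
        if (!pool_has_cpu(dom_pool[d]))
            dom_pool[d] = 0;
}
```

In this model, calling suspend_resume() with cpus 2 and 3 reported as not
coming back frees only their per-cpu areas and moves both pool-1 domains to
pool 0, while cpu1 keeps its state untouched. The boot cpu is never
notified, which is what guarantees pool 0 always has a cpu to fall back on.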

> I saw Juergen suggested a fix; please carry it in whatever series you have.

Yes, that patch fixes the issue.

By the way, we are using the patch series by Mirela that you mentioned earlier.

> > (XEN) ****************************************
> > (XEN) Panic on CPU 0:
> > (XEN) PSCI cpu off failed for CPU0 err=-3
> > (XEN) ****************************************
>
> A failing PSCI CPU off is never good news. Here, the command was
> denied by the PSCI monitor. But... why is CPU off actually called on
> CPU0? Shouldn't we have turned off the platform instead?

I think this is because CPU1 is performing machine_restart(), so it
asked CPU0 to halt itself.

>
> > (XEN)
> > (XEN) Reboot in five seconds...
>
> Are the logs below actually a mistaken paste?

No, this is what I'm seeing on my serial console.

> > (XEN) CPU2 will call ARM_SMCCC_ARCH_WORKAROUND_1 on exception entry
> > (XEN) CPU 2 booted.
> > (XEN) Data Abort Trap. Syndrome=0x6
> > (XEN) Walking Hypervisor VA 0x0 on CPU2 via TTBR 0x00000000781a8000
> > (XEN) 0TH[0x0] = 0x00000000781b0f7f
> > (XEN) 1ST[0x0] = 0x00000000781aaf7f
> > (XEN) 2ND[0x0] = 0x0000000000000000
> > (XEN) CPU2: Unexpected Trap: Data Abort
> > (XEN) ----[ Xen-4.13-unstable  arm64  debug=y   Not tainted ]----
> > (XEN) CPU:    2
> > (XEN) PC:     0000000000233660 _spin_lock+0x1c/0x88
> > (XEN) LR:     000000000023365c
> > (XEN) SP:     000080037ff77d50
> > (XEN) CPSR:   a00002c9 MODE:64-bit EL2h (Hypervisor, handler)
> > (XEN)      X0: 0000000000000006  X1: 00000000fffffffe  X2: 0000000000000000
> > (XEN)      X3: 0000000000000002  X4: 000080037fc42480  X5: 0000000000000000
> > (XEN)      X6: 0000000000000080  X7: 000080037ffb0000  X8: 00000000002a1000
> > (XEN)      X9: 000000000000000a X10: 000080037ff77bf8 X11: 0000000000000032
> > (XEN)     X12: 0000000000000001 X13: 000000000027fff0 X14: 0000000000000020
> > (XEN)     X15: 0000000000000000 X16: 0000000000000000 X17: 0000000000000000
> > (XEN)     X18: 0000000000000000 X19: 0000000000000000 X20: 0000000000000000
> > (XEN)     X21: 000080037ff7e108 X22: 0000000000000002 X23: 000000000033bc88
> > (XEN)     X24: 0000000000336020 X25: 0000000000000000 X26: 0000000000000002
> > (XEN)     X27: 0000000000336000 X28: 0000000000000000  FP: 000080037ff77d50
> > (XEN)
> > (XEN)   VTCR_EL2: 80023558
> > (XEN)  VTTBR_EL2: 0000000000000000
> > (XEN)
> > (XEN)  SCTLR_EL2: 30cd183d
> > (XEN)    HCR_EL2: 0000000000000038
> > (XEN)  TTBR0_EL2: 00000000781a8000
> > (XEN)
> > (XEN)    ESR_EL2: 96000006
> > (XEN)  HPFAR_EL2: 0000000000000000
> > (XEN)    FAR_EL2: 0000000000000000
> > (XEN)
> > (XEN) Xen stack trace from sp=000080037ff77d50:
> > (XEN)    000080037ff77d70 00000000002336e8 000080037ff7d000 000000000023e00c
> > (XEN)    000080037ff77d80 000000000022e90c 000080037ff77e10 0000000000232af8
> > (XEN)    0000000000000002 00000000002fbb00 ffffffffffffffff 000000000033cf20
> > (XEN)    00000000002a0680 0000000000000001 0000000000000001 0000000000000001
> > (XEN)    0000000000000000 000080037ff77e90 000080037ff77e50 00000000ffffffc8
> > (XEN)    000000000029f008 00000000002ffc41 000080037ff77e90 0000000000263c68
> > (XEN)    000080037ff77e50 0000000000232b6c 0000000000000002 0000000000000004
> > (XEN)    0000000000000002 00000000002fbc00 0000000000336448 00000000002fbb00
> > (XEN)    000080037ff77e60 0000000000257230 000080037ff77e90 0000000000263c6c
> > (XEN)    0000000000000002 0000000077e80000 0000000000000000 0000000000000001
> > (XEN)    0000000000000000 0000000000000002 0000000000000001 effffffffffaffff
> > (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN)    0000000000000000 0000000000000000
> > (XEN) Xen call trace:
> > (XEN)    [<0000000000233660>] _spin_lock+0x1c/0x88 (PC)
> > (XEN)    [<000000000023365c>] _spin_lock+0x18/0x88 (LR)
> > (XEN)    [<00000000002336e8>] _spin_lock_irq+0x1c/0x24
> > (XEN)    [<000000000022e90c>] schedule.c#schedule+0xe8/0x74c
> > (XEN)    [<0000000000232af8>] softirq.c#__do_softirq+0xcc/0xe4
> > (XEN)    [<0000000000232b6c>] do_softirq+0x14/0x1c
> > (XEN)    [<0000000000257230>] idle_loop+0x174/0x188
> > (XEN)    [<0000000000263c6c>] start_secondary+0x1f4/0x200
> > (XEN)    [<0000000000000002>] 0000000000000002
> > (XEN)
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 2:
> > (XEN) CPU2: Unexpected Trap: Data Abort
> > (XEN) ****************************************
> > (XEN)
> > (XEN) Reboot in five seconds...
>
> Cheers,
>
> --
> Julien Grall



-- 
WBR Volodymyr Babchuk aka lorc [+380976646013]
mailto: vlad.babchuk@xxxxxxxxx

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

