[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v2 0/6] xen: simplify suspend/resume handling
On 28/03/2019 14:33, Julien Grall wrote: > Hi, > > On 3/28/19 1:01 PM, Volodymyr Babchuk wrote: >> Hello Juergen, >> >> On Thu, 28 Mar 2019 at 14:09, Juergen Gross <jgross@xxxxxxxx> wrote: >>> >>> Especially in the scheduler area (schedule.c, cpupool.c) there is a >>> rather complex handling involved when doing suspend and resume. >>> >>> This can be simplified a lot by not performing a complete cpu down and >>> up cycle for the non-boot cpus, but keeping the pure software related >>> state and freeing it only in case a cpu didn't come up again during >>> resume. >>> >>> In summary not only the complexity can be reduced, but the failure >>> tolerance will be even better with this series: With a dedicated hook >>> for failing cpus when resuming it is now possible to survive e.g. a >>> cpupool being left without any cpu after resume by moving its domains >>> to cpupool0. >>> >>> Juergen Gross (6): >>> xen/sched: call cpu_disable_scheduler() via cpu notifier >>> xen: add helper for calling notifier_call_chain() to common/cpu.c >>> xen: add new cpu notifier action CPU_RESUME_FAILED >>> xen: don't free percpu areas during suspend >>> xen/cpupool: simplify suspend/resume handling >>> xen/sched: don't disable scheduler on cpus during suspend >>> >>> xen/arch/arm/smpboot.c | 4 - >>> xen/arch/x86/percpu.c | 3 +- >>> xen/arch/x86/smpboot.c | 3 - >>> xen/common/cpu.c | 61 +++++++------- >>> xen/common/cpupool.c | 131 ++++++++++++----------------- >>> xen/common/schedule.c | 203 >>> +++++++++++++++++++-------------------------- >>> xen/include/xen/cpu.h | 29 ++++--- >>> xen/include/xen/sched-if.h | 1 - >>> 8 files changed, 190 insertions(+), 245 deletions(-) >>> >> >> I tested your patch series on ARM64 platform. We had issue with hard >> affinity - there was assertion failure in sched_credit2 code during >> suspension if one of the vCPUs is pinned to non-0 pCPU. > When you report an error, please make clear what commit you are using > and whether you have patches applied on top. > > In this case, we have no support of suspend/resume on Arm today. So bug > report around suspend/resume is a bit confusing to have. It is also more > difficult to help when you don't have the full picture as a bug may be > in your code and upstream Xen. > > I saw Juergen suggested a fix, please carry it in whatever series you have. > >> (XEN) **************************************** >> (XEN) Panic on CPU 0: >> (XEN) PSCI cpu off failed for CPU0 err=-3 >> (XEN) **************************************** > > PSCI CPU off failing is never a good news. Here, the command has been > denied by PSCI monitor. But... why does CPU off is actually called on > CPU0? Shouldn't we have turned off the platform instead? Could it be that a scheduler lock is no longer reachable as the percpu memory of another cpu has been released and allocated again? That would be one of the possible results of my series. Juergen _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |