Xen project Mailing List

Re: [Xen-devel] [RFC v2] xen/arm: Suspend to RAM Support in Xen for ARM

To: Julien Grall <julien.grall@xxxxxxxxxx>

From: Mirela Simonovic <mirela.simonovic@xxxxxxxxxx>

Date: Tue, 17 Apr 2018 14:13:28 +0200

Cc: "Edgar E. Iglesias" <edgar.iglesias@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Xen Devel <xen-devel@xxxxxxxxxxxxx>

Delivery-date: Tue, 17 Apr 2018 12:13:59 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi Julien, On Fri, Jan 26, 2018 at 5:08 PM, Julien Grall <julien.grall@xxxxxxxxxx> wrote: > > > On 24/01/18 17:55, Mirela Simonovic wrote: >> >> Hi Julien, Stefano, > > > Hi Mirela, > >> >> Thank you very much for the feedback! >> >> >> On 01/11/2018 03:00 PM, Julien Grall wrote: >>> >>> Hi Mirela, >>> >>> Thank you for the sending the design document. The general design looks >>> good to me. I have some comments below, but they are more related to the >>> implementation of CPU on/off in Xen. >>> >>> On 22/12/17 17:41, Mirela Simonovic wrote: >>> >>> [...] >>> >>>> +--------------- >>>> +Resuming Guests >>>> +--------------- >>>> + >>>> +Resume of the privileged guest (Dom0) is always following the Xen >>>> resume. >>>> + >>>> +An unprivileged guest shall resume once a device it owns triggers a >>>> wake-up >>>> +interrupt, regardless of whether Xen was suspended when the wake-up >>>> interrupt >>>> +was triggered. If Xen was suspended, it is assumed that Dom0 will be >>>> running >>>> +before the DomU guest starts to resume. The synchronization mechanism >>>> to >>>> +enforce the assumed condition is TBD. >>> >>> >>> Given that all but the non-boot CPU will be offlined. Does the wake-up >>> interrupt always need to target the non-boot CPU? >> >> >> Wake-up interrupt needs to be targeted to the boot pCPU, and the resume >> sequence has to start from the boot pCPU. > > > I assume that wake-up interrupts could belong to a guest. > In that case, the wake-up interrupts will need to be moved to the boot pCPU > on suspend. > > [...] > >>> >>> For instance, you likely need to migrate interrupts that was assigned to >>> the physical CPU (either guest one or Xen one). Though Xen ones might be >>> less a concern because I think they are always assigned to CPU0 at the >>> moment. >> >> >> I would very appreciate more information on this. These kind of scenarios >> can be easily overlooked and I'm not that much experienced with pinning and >> its side effects. >> Lets assume a vCPU is pinned to the non-boot CPU#1. When the guest enables >> an interrupt (interrupt is targeted to the vCPU), would Xen target physical >> interrupt to the GIC CPU interface of pCPU#1 or pCPU#0 or all pCPUs? > > > In your example, the interrupts will target pCPU#1 only. > >> >>> >>> Furthermore, PPI handlers are not removed. Same for any memory allocated >>> (you may loose reference to it because percpu area for that CPU will get >>> freed). I believe get into trouble when the CPU is back online? >> >> >> Yes, I needed to add few fixes into existing code to enable pCPU to come >> back online. I'll submit RFC soon. > > > Thank you! > > [...] > >>> >>> I may have miss other bits, so I would highly recommend to go through the >>> boot code and see what could go wrong. >>> >>> [..] >>> >>>> +Resume Flow >>>> +------------ >>>> +The resume entry point shall be implemented in >>>> +* hyp_resume() in arch/arm/arm64/entry.S >>>> +The very beginning of the resume procedure has to be implemented in >>>> assembly. >>>> +It shall contain the following: >>>> +* Enable the MMU so that the structure containing CPU context which was >>>> saved on >>>> +suspend can be accessed >>>> +* Restore CPU context (to match the values saved on suspend) and return >>>> into C >>>> +* Set the system_state variable to SYS_STATE_resume >>>> +* Restore GIC context >>>> +* Resume timer >>>> +* Enable interrupts >>>> +* Enable non-boot CPUs by calling enable_nonboot_cpus() >>> >>> >>> You would have to be careful on re-enabling the non-CPU. start_secondary >>> is implemented based on the assumption that it will only be called during >>> Xen boot. Some of the code may be part of __init (see cpu_up_send_sgi) or >>> should not be called as it is after boot (e.g check_local_cpu_errata). >>> >>> Another I have in mind is the way VTCR_EL2 is set today (see >>> setup_virt_paging). It is done at boot time, so if you online a CPU >>> afterwards, VTCR_EL2 will not be set correctly. >> >> >> Was there any reason to configure VTCR_EL2 after all CPUs become online? >> >> I fixed this as follows: in start_xen(), the boot CPU calls >> setup_virt_paging() prior to enabling non-boot CPUs. setup_virt_paging() >> configures VTCR_EL2 only for the boot CPU. >> Non-boot CPUs call setup_virt_paging_one() later, from start_secondary(). >> Also, only the boot CPU performs the calculation for how to configure >> VTCR_EL2, non-boot CPUs rely on the calculated value. > > This would not be correct. Imagine a platform with heterogeneous processors > (such as big.LITTLE), each processors may have different set of "features" > (e.g max IPA size supported). You want Xen to use a common set of "features" > that would work on all CPUs. > > To give an example, your boot CPU may support maximum 48-bit IPA while all > the other CPUs would support maximum 40-bit IPA. If Xen decides to use > maximum 48-bit IPA, then page-tables would not work on other CPUs. > > In order to take the decision, you need to wait all CPUs to come up and look > at their ID registers. Once the decision is made, then you can configure > correctly VTCR_EL2. > > This is why setup_virt_paging() is called after all the CPUs have booted. > > Obviously, this does not work for CPUs brought up afterwards (e.g resume > case). For those CPUs we should call setup_virt_paging_one directly and > check that the value chosen by Xen can be handled by the processor. If not, > this CPU should be parked. > Could you please clarify "this does not work for CPUs brought up afterwards (e.g resume case)"? Each CPU that is brought up on resume use to be brought up for the first time somewhere in the past before the suspend was triggered. That moment is the place where the CPU could be parked if the config value chosen by Xen cannot be handled by that CPU. I don't see what could possibly change in suspend/resume path that would require this check to be performed again. Each CPU that has suspended is compliant with the VTCR_EL2 chosen by Xen. Please let me know if I misunderstood something. Thanks, Mirela > I think you can use (system_state > SYS_STATE_active) to differentiate > between CPUs brought during Xen boot from the one afterwards. > > Note that this is probably the only place where platform with heterogeneous > processors are properly supported on Xen. In the future, we should do that > for all "features" and park CPU brought after boot if they does not support > the ones used by Xen. Maybe by having a framework very similar to Linux (see > arch/arm64/kernel/cpufeature.c). > > Cheers, > > -- > Julien Grall _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.