[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [RFC v2] xen/arm: Suspend to RAM Support in Xen for ARM



Hi Mirela,

Thank you for the sending the design document. The general design looks good to me. I have some comments below, but they are more related to the implementation of CPU on/off in Xen.

On 22/12/17 17:41, Mirela Simonovic wrote:

[...]

+---------------
+Resuming Guests
+---------------
+
+Resume of the privileged guest (Dom0) is always following the Xen resume.
+
+An unprivileged guest shall resume once a device it owns triggers a wake-up
+interrupt, regardless of whether Xen was suspended when the wake-up interrupt
+was triggered. If Xen was suspended, it is assumed that Dom0 will be running
+before the DomU guest starts to resume. The synchronization mechanism to
+enforce the assumed condition is TBD.

Given that all but the non-boot CPU will be offlined. Does the wake-up interrupt always need to target the non-boot CPU?

+
+If the ARM's GIC was powered down after the ARM subsystem suspended, it is
+assumed that Xen needs to restore the GIC interface for a VM prior to handing
+over control to the guest. However, the guest should restore its own context
+upon entering the resume point, just like it would when running without Xen.
+
+===============
+Implementation
+===============

[...]

+CPU_OFF (physical CPUs)
+-----------------------
+The CPU_OFF function shall be implemented in
+* call_psci_cpu_off() in arch/arm/psci.c
+
+The implementation shall consist just of making the SMC call to EL3.
+
+This function needs to be called when Xen generic code disables a non-boot CPU.
+When a CPU is disabled it will loop forever in while loop (stop_cpu() function
+which is already implemented in xen/arch/arm/smpboot.c). Call to
+call_psci_cpu_off() shall be made before the CPU enters infinite loop.

While the code is present, we never offline physical CPU at the moment except when shutting down the place. So I am not fully convinced that stop_cpu() is properly implemented.

For instance, you likely need to migrate interrupts that was assigned to the physical CPU (either guest one or Xen one). Though Xen ones might be less a concern because I think they are always assigned to CPU0 at the moment.

Furthermore, PPI handlers are not removed. Same for any memory allocated (you may loose reference to it because percpu area for that CPU will get freed). I believe get into trouble when the CPU is back online?

I may have miss other bits, so I would highly recommend to go through the boot code and see what could go wrong.

[..]

+Resume Flow
+------------
+The resume entry point shall be implemented in
+* hyp_resume() in arch/arm/arm64/entry.S
+The very beginning of the resume procedure has to be implemented in assembly.
+It shall contain the following:
+* Enable the MMU so that the structure containing CPU context which was saved 
on
+suspend can be accessed
+* Restore CPU context (to match the values saved on suspend) and return into C
+* Set the system_state variable to SYS_STATE_resume
+* Restore GIC context
+* Resume timer
+* Enable interrupts
+* Enable non-boot CPUs by calling enable_nonboot_cpus()

You would have to be careful on re-enabling the non-CPU. start_secondary is implemented based on the assumption that it will only be called during Xen boot. Some of the code may be part of __init (see cpu_up_send_sgi) or should not be called as it is after boot (e.g check_local_cpu_errata).

Another I have in mind is the way VTCR_EL2 is set today (see setup_virt_paging). It is done at boot time, so if you online a CPU afterwards, VTCR_EL2 will not be set correctly.

I probably have missed other bits. I am happy to provide more insights here.

Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.