
Re: [Xen-devel] [PATCH 02/18] xen/arm: Implement PSCI system suspend call (virtual interface)

On 11/12/18 4:35 PM, Mirela Simonovic wrote:
Hi Julien,

Thanks for your feedback, I'll need to answer in iterations.

On Mon, Nov 12, 2018 at 4:27 PM Julien Grall <julien.grall@xxxxxxx> wrote:

Hi Mirela,

On 11/12/18 11:30 AM, Mirela Simonovic wrote:
The implementation consists of:
-Adding PSCI system suspend call as new PSCI function
-Trapping PSCI system_suspend HVC
-Implementing PSCI system suspend call (virtual interface that allows
   guests to suspend themselves)

The PSCI system suspend should be called by a guest from its boot
VCPU. Non-boot VCPUs of the guest should be hot-unplugged using PSCI
CPU_OFF call prior to issuing PSCI system suspend. Interrupts that
are left enabled by the guest are assumed to be its wake-up interrupts.
Therefore, a wake-up interrupt triggers the resume of the guest. Guest
should resume regardless of the state of Xen (suspended or not).

When a guest calls PSCI system suspend the respective domain will be
suspended if the following conditions are met:
1) Given resume entry point is not invalid
2) Other (if any) VCPUs of the calling guest are hot-unplugged

If the conditions above are met the calling domain is labeled as
suspended and the calling VCPU is blocked. If nothing else were done,
the suspended domain would resume from the place where it
called PSCI system suspend. This is expected if processing of the PSCI
system suspend call fails. However, in the case of success the calling
guest should resume (continue execution after the wake-up) from the entry
point which is given as the first argument of the PSCI system suspend
call. In addition to the entry point, the guest expects to start within
the environment whose state matches the state after reset. This means
that the guest should find reset register values, MMU disabled, etc.
Thereby, the context of VCPU should be 'reset' (as if the system is
coming out of reset), the program counter should contain entry point,
which is 1st argument, and r0/x0 should contain context ID which is 2nd
argument of PSCI system suspend call. The context of VCPU is set
accordingly when the PSCI system suspend is processed, so that nothing
needs to be done on resume/wake-up path. However, in order to ensure that
this context doesn't get overwritten by the scheduler when scheduling out
this VCPU (would normally happen after the calling CPU is blocked), we need
to check whether to return early from ctxt_switch_from().

There are variables in domain structure to keep track of domain shutdown.
One of the existing shutdown reasons is 'suspend', which this patch uses to
track the suspend state of a domain. Those variables are used to determine
whether to return early from ctxt_switch_from() or not.

A suspended domain will resume after Xen receives an interrupt which is
targeted to the domain, unblocks the domain's VCPU, and schedules it in.
When the VCPU is scheduled in, the VCPU context is already reset, and
contains the right resume entry point in program counter that will be
restored in ctxt_switch_to(). The only thing that needs to be done at this
point is to clear the variables that marked the domain state as suspended.
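
The checks and the context set-up described above can be sketched as a small standalone model. This is not the patch's code: `model_vcpu`, `model_domain` and `model_system_suspend` are hypothetical stand-ins for Xen's structures and for domain_suspend(), and only the PSCI return code values are taken from the PSCI specification.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return codes as defined by the PSCI specification. */
#define PSCI_SUCCESS           0
#define PSCI_DENIED           (-3)
#define PSCI_INVALID_ADDRESS  (-9)

#define MODEL_MAX_VCPUS 4

/* Hypothetical, simplified stand-ins for struct vcpu and struct domain. */
struct model_vcpu {
    bool online;
    bool thumb;     /* models the CPSR T bit */
    uint64_t pc;
    uint64_t x0;    /* r0 for a 32-bit guest */
};

struct model_domain {
    bool is_64bit;
    bool suspended;
    size_t nr_vcpus;
    struct model_vcpu vcpu[MODEL_MAX_VCPUS];
};

int32_t model_system_suspend(struct model_domain *d, size_t caller,
                             uint64_t epoint, uint64_t cid)
{
    bool is_thumb = epoint & 1;
    struct model_vcpu *v = &d->vcpu[caller];
    size_t i;

    /* Condition 1: a Thumb entry point is invalid for a 64-bit guest. */
    if ( d->is_64bit && is_thumb )
        return PSCI_INVALID_ADDRESS;

    /* Condition 2: every other VCPU must already be offline. */
    for ( i = 0; i < d->nr_vcpus; i++ )
        if ( i != caller && d->vcpu[i].online )
            return PSCI_DENIED;

    /* Reset the context, then fill in the two PSCI arguments. */
    memset(v, 0, sizeof(*v));
    v->online = true;
    v->thumb = is_thumb;
    v->pc = epoint & ~(uint64_t)1;  /* bit 0 only selects Thumb mode */
    v->x0 = cid;

    d->suspended = true;
    return PSCI_SUCCESS;
}
```

In this model the context is final as soon as the call returns, which is why nothing has to be done on the wake-up path other than clearing `suspended`.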

Signed-off-by: Mirela Simonovic <mirela.simonovic@xxxxxxxxxx>
Signed-off-by: Saeed Nowshadi <saeed.nowshadi@xxxxxxxxxx>

Changes in v2:

-Fix print to compile for arm32 and to align with Xen coding style
   xen/arch/arm/Makefile            |   1 +
   xen/arch/arm/domain.c            |  13 +++
   xen/arch/arm/suspend.c           | 166 
   xen/arch/arm/vpsci.c             |  19 +++++
   xen/include/asm-arm/perfc_defn.h |   1 +
   xen/include/asm-arm/psci.h       |   2 +
   xen/include/asm-arm/suspend.h    |  16 ++++
   xen/include/xen/sched.h          |   1 +
   8 files changed, 219 insertions(+)
   create mode 100644 xen/arch/arm/suspend.c
   create mode 100644 xen/include/asm-arm/suspend.h

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 23c5d9adbc..744b1a4dc8 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -43,6 +43,7 @@ obj-y += setup.o
   obj-y += shutdown.o
   obj-y += smp.o
   obj-y += smpboot.o
+obj-y += suspend.o
   obj-y += sysctl.o
   obj-y += time.o
   obj-y += traps.o
diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index e594b48d81..7f8105465c 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -97,6 +97,11 @@ static void ctxt_switch_from(struct vcpu *p)
       if ( is_idle_vcpu(p) )

+    /* VCPU's context should not be saved if its domain is suspended */
+    if ( p->domain->is_shut_down &&
+        (p->domain->shutdown_code == SHUTDOWN_suspend) )
+        return;
SHUTDOWN_suspend is used in Xen for another purpose (see
SCHEDOP_shutdown). The other user of that code relies on all the state
to be saved on suspend.

We just need a flag to mark a domain as suspended, and I do believe
SHUTDOWN_suspend is not used anywhere else.
Let's come back on this.

See Andrew's comment here.

However, what is the issue with saving all the registers here?

We need to save arguments that are provided by a guest with system
suspend PSCI call. These arguments are the entry point that needs to
be saved in program counter and context ID that needs to be saved in
x0/r0. We don't have these arguments here. Context switch happens
after processing the system suspend PSCI call, so it's too late.

It does not feel right to modify ctxt_switch{from,to} for suspend/resume. If you want to reset the vCPU state before blocking the vCPU, then you should instead

Another way would be to reset the vCPU once you receive the interrupt.


       /* CP 15 */
@@ -181,6 +186,14 @@ static void ctxt_switch_to(struct vcpu *n)
       if ( is_idle_vcpu(n) )

+    /* If the domain was suspended, it is resuming now */
+    if ( n->domain->is_shut_down &&
+        (n->domain->shutdown_code == SHUTDOWN_suspend) )
+    {
+        n->domain->is_shut_down = 0;
+        n->domain->shutdown_code = SHUTDOWN_CODE_INVALID;
+    }

This looks like a hack. Why not call domain_resume when receiving the

Good point, I need to double check and come back on this.

It looks like domain_resume may not be the right solution (see Andrew's comment). Another solution would be to use domain_pause/domain_unpause.


       vpidr = READ_SYSREG32(MIDR_EL1);
diff --git a/xen/arch/arm/suspend.c b/xen/arch/arm/suspend.c
new file mode 100644
index 0000000000..9eea9214e1
--- /dev/null
+++ b/xen/arch/arm/suspend.c

I would prefer if we don't mix guest and host suspend in the same file.

Sure, we can move guest suspend code into another file, e.g.

Do we really need a separate file? The code does not look too big.

@@ -0,0 +1,166 @@

Missing copyright headers here.


+#include <xen/sched.h>
+#include <asm/cpufeature.h>
+#include <asm/event.h>
+#include <asm/psci.h>
+/* Reset values of VCPU architecture specific registers */

Technically this is not required as most of the registers are unknown. I
understand this helps for debugging an OS.

I would introduce it in a separate patch and directly in
arch_set_info_guest as I would like the behavior to be the same
everywhere we need to reset a vCPU.

I agree. Please just consider that resetting a vCPU context is done in
2 scenarios: one where a vCPU is just created, and another one when
the vCPU already exists but the context has to be cleared. Could you
please provide some guidance on how to do this, because we struggled
for a while and didn't really find a nice way?

Why do you think the 2 scenarios require different paths? In both cases the CPU is off and should be reset. It is just that the current PSCI CPU on took a shortcut that likely needs to be fixed.

+static void vcpu_arch_reset(struct vcpu *v)
+    v->arch.ttbr0 = 0;
+    v->arch.ttbr1 = 0;
+    v->arch.ttbcr = 0;
+    v->arch.csselr = 0;
+    v->arch.cpacr = 0;
+    v->arch.contextidr = 0;
+    v->arch.tpidr_el0 = 0;
+    v->arch.tpidrro_el0 = 0;
+    v->arch.tpidr_el1 = 0;
+    v->arch.vbar = 0;
+    if ( is_32bit_domain(v->domain) )

This is not necessary

+        v->arch.dacr = 0;
+    v->arch.par = 0;
+#if defined(CONFIG_ARM_32)
+    v->arch.mair0 = 0;
+    v->arch.mair1 = 0;
+    v->arch.amair0 = 0;
+    v->arch.amair1 = 0;
+    v->arch.mair = 0;
+    v->arch.amair = 0;
+    /* Fault Status */
+#if defined(CONFIG_ARM_32)
+    v->arch.dfar = 0;
+    v->arch.ifar = 0;
+    v->arch.dfsr = 0;
+#elif defined(CONFIG_ARM_64)
+    v->arch.far = 0;
+    v->arch.esr = 0;
+    if ( is_32bit_domain(v->domain) )

Same here.

+        v->arch.ifsr  = 0;
+    v->arch.afsr0 = 0;
+    v->arch.afsr1 = 0;
+#ifdef CONFIG_ARM_32
+    v->arch.joscr = 0;
+    v->arch.jmcr = 0;
+    if ( is_32bit_domain(v->domain) && cpu_has_thumbee )

Same here.

+    {
+        v->arch.teecr = 0;
+        v->arch.teehbr = 0;
+    }
+ * This function sets the context of current VCPU to the state which is expected
+ * by the guest on resume. The expected VCPU state is:
+ * 1) pc to contain resume entry point (1st argument of PSCI SYSTEM_SUSPEND)
+ * 2) r0/x0 to contain context ID (2nd argument of PSCI SYSTEM_SUSPEND)
+ * 3) All other general purpose and system registers should have reset values
+ *
+ * Note: this function has to return void because it has to always succeed. In
+ * other words, this function is called from virtual PSCI SYSTEM_SUSPEND
+ * implementation, which can return only a limited number of possible errors,
+ * none of which could represent the fact that an error occurred when preparing
+ * the domain for suspend.
+ * Consequently, dynamic memory allocation cannot be done within this function,
+ * because if malloc fails the error has nowhere to propagate.

You could crash the domain if you are not able to resume it. In the
current context...

+ */
+static void vcpu_suspend(register_t epoint, register_t cid)
+    /* Static allocation because dynamic would need a non-void return */
+    static struct vcpu_guest_context ctxt;

... this is not right. This function can be called concurrently, so a
lot of funny things can happen (i.e. corruption).

The vCPU context does not look too big. So I would just allocate it on
the stack directly.

Agreed, 'static' should be removed to address all these issues.
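
A minimal sketch of the stack-allocated variant follows. It assumes a much smaller hypothetical context structure (`model_guest_context`) and a hypothetical consumer (`model_set_info_guest`) in place of struct vcpu_guest_context and _arch_set_info_guest(). The point is only that automatic storage gives every caller its own copy, so concurrent callers cannot overwrite each other's values.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, much smaller stand-in for struct vcpu_guest_context. */
struct model_guest_context {
    uint64_t pc;
    uint64_t x0;
    uint32_t flags;
};

#define MODEL_VGCF_ONLINE 1u  /* hypothetical flag value */

/* Hypothetical consumer playing the role of _arch_set_info_guest(). */
static void model_set_info_guest(struct model_guest_context *vcpu_state,
                                 const struct model_guest_context *ctxt)
{
    *vcpu_state = *ctxt;
}

void model_vcpu_suspend(struct model_guest_context *vcpu_state,
                        uint64_t epoint, uint64_t cid)
{
    /* Automatic storage: no state is shared between callers. */
    struct model_guest_context ctxt;

    /* Start from an all-zero context, then set the non-zero fields. */
    memset(&ctxt, 0, sizeof(ctxt));
    ctxt.pc = epoint;
    ctxt.x0 = cid;
    ctxt.flags = MODEL_VGCF_ONLINE;

    model_set_info_guest(vcpu_state, &ctxt);
}
```

Whether the real structure is small enough for the hypervisor stack is a separate question that this model does not answer.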

+    struct vcpu *v = current;
+    /* Make sure that VCPU guest regs are zeroed */



+    memset(&ctxt, 0, sizeof(ctxt));
+    /* Set non-zero values to the registers prior to copying */
+    ctxt.user_regs.pc64 = (u64)epoint;
+    if ( is_32bit_domain(current->domain) )
+    {
+        ctxt.user_regs.r0_usr = cid;
+        ctxt.user_regs.cpsr = PSR_GUEST32_INIT;
+        /* Thumb set is allowed only for 32-bit domain */
+        if ( epoint & 1 )
+        {
+            ctxt.user_regs.cpsr |= PSR_THUMB;
+            ctxt.user_regs.pc64 &= ~(u64)1;
+        }
+    }
+#ifdef CONFIG_ARM_64
+    else
+    {
+        ctxt.user_regs.x0 = cid;
+        ctxt.user_regs.cpsr = PSR_GUEST64_INIT;
+    }
+    ctxt.sctlr = SCTLR_GUEST_INIT;
+    ctxt.flags = VGCF_online;
+    /* Reset architecture specific registers */
+    vcpu_arch_reset(v);
+    /* Initialize VCPU registers */
+    _arch_set_info_guest(v, &ctxt);

AFAICT, this is expected to be called with the domain lock taken as this
can be called by various paths.

Also, most of the function is the same as CPU_on. So I would like to see
the code factored into a separate function and used in both places.

I agree, but the 2 scenarios (VCPU allocation versus clearing VCPU
context) made it a bit difficult to share. Please let me know if you
have some additional hint on how to exactly structure the code.

See above. I would be interested to know why you think they are different.
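
One possible shape for the shared path, as a sketch only: a single reset helper that both the CPU_ON handler and the suspend path call. All names here (`shared_vcpu`, `shared_vcpu_reset`, `shared_cpu_on`, `shared_suspend_prepare`) are hypothetical and the register set is reduced to three fields. It assumes the only difference between the two callers is which vCPU is reset and whether it may already be online.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Return codes as defined by the PSCI specification. */
#define PSCI_SUCCESS      0
#define PSCI_ALREADY_ON  (-4)

#define SHARED_PSR_THUMB (1u << 5)  /* T bit position in the AArch32 CPSR */

/* Hypothetical reduced register set and vCPU. */
struct shared_regs {
    uint64_t pc;
    uint64_t x0;    /* r0 for a 32-bit guest */
    uint32_t cpsr;
};

struct shared_vcpu {
    bool is_32bit;
    bool online;
    struct shared_regs regs;
};

/* Common helper: put a vCPU into its reset state with the PSCI arguments. */
void shared_vcpu_reset(struct shared_vcpu *v, uint64_t entry, uint64_t cid)
{
    memset(&v->regs, 0, sizeof(v->regs));
    v->regs.pc = entry;
    v->regs.x0 = cid;

    /* Thumb is only meaningful for a 32-bit guest. */
    if ( v->is_32bit && (entry & 1) )
    {
        v->regs.cpsr |= SHARED_PSR_THUMB;
        v->regs.pc &= ~(uint64_t)1;
    }

    v->online = true;
}

/* CPU_ON: reset another vCPU that is currently off. */
int32_t shared_cpu_on(struct shared_vcpu *target, uint64_t entry, uint64_t cid)
{
    if ( target->online )
        return PSCI_ALREADY_ON;

    shared_vcpu_reset(target, entry, cid);
    return PSCI_SUCCESS;
}

/* SYSTEM_SUSPEND: reset the calling vCPU before it is blocked. */
void shared_suspend_prepare(struct shared_vcpu *self, uint64_t entry,
                            uint64_t cid)
{
    shared_vcpu_reset(self, entry, cid);
}
```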

+int32_t domain_suspend(register_t epoint, register_t cid)
+    struct vcpu *v;
+    struct domain *d = current->domain;
+    bool is_thumb = epoint & 1;
+    dprintk(XENLOG_DEBUG,
+            "Dom%d suspend: epoint=0x%"PRIregister", cid=0x%"PRIregister"\n",
+            d->domain_id, epoint, cid);
+    /* THUMB set is not allowed with 64-bit domain */
+    if ( is_64bit_domain(d) && is_thumb )
+        return PSCI_INVALID_ADDRESS;
+    /* Ensure that all CPUs other than the calling one are offline */
+    for_each_vcpu ( d, v )
+    {
+        if ( v != current && is_vcpu_online(v) )
+            return PSCI_DENIED;
+    }

What prevents a vCPU from coming online while doing the loop?

As you suggested probably nothing if there is a bug in the guest,
which we want to check for. Is the domain_lock the right thing to use

The domain_lock is probably the best solution here. You don't want CPU_ON and SYSTEM_OFF to race. So CPU_ON may need some modification as well.
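
To illustrate the locking idea, here is a userspace sketch in which a pthread mutex stands in for Xen's per-domain lock. The names (`locked_domain`, `locked_cpu_on`, `locked_system_suspend`) are hypothetical, and returning PSCI_DENIED from CPU_ON once the domain is marked suspended is an assumption of this sketch, not something the thread decided. Both paths take the same lock, so the "all other vCPUs offline" check and the state change happen atomically with respect to CPU_ON.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return codes as defined by the PSCI specification. */
#define PSCI_SUCCESS              0
#define PSCI_INVALID_PARAMETERS  (-2)
#define PSCI_DENIED              (-3)
#define PSCI_ALREADY_ON          (-4)

#define LOCKED_MAX_VCPUS 4

/* Hypothetical model of a domain; the mutex stands in for the domain lock. */
struct locked_domain {
    pthread_mutex_t lock;
    bool suspended;
    size_t nr_vcpus;
    bool online[LOCKED_MAX_VCPUS];
};

int32_t locked_cpu_on(struct locked_domain *d, size_t target)
{
    int32_t rc = PSCI_SUCCESS;

    if ( target >= d->nr_vcpus )
        return PSCI_INVALID_PARAMETERS;

    pthread_mutex_lock(&d->lock);
    if ( d->suspended )
        rc = PSCI_DENIED;        /* assumption: refuse once suspend began */
    else if ( d->online[target] )
        rc = PSCI_ALREADY_ON;
    else
        d->online[target] = true;
    pthread_mutex_unlock(&d->lock);

    return rc;
}

int32_t locked_system_suspend(struct locked_domain *d, size_t caller)
{
    int32_t rc = PSCI_SUCCESS;
    size_t i;

    pthread_mutex_lock(&d->lock);
    /* The check and the state change are done under the same lock. */
    for ( i = 0; i < d->nr_vcpus; i++ )
        if ( i != caller && d->online[i] )
            rc = PSCI_DENIED;
    if ( rc == PSCI_SUCCESS )
        d->suspended = true;
    pthread_mutex_unlock(&d->lock);

    return rc;
}
```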

+    /*
+     * Prepare the calling VCPU for suspend (reset its context, save entry point
+     * into pc and context ID into r0/x0 as specified by PSCI SYSTEM_SUSPEND)
+     */
+    vcpu_suspend(epoint, cid);
+    /*
+     * Set the domain state to suspended (will be cleared when the domain
+     * resumes, i.e. VCPU of this domain gets scheduled in).
+     */
+    d->is_shut_down = 1;
+    d->shutdown_code = SHUTDOWN_suspend;

If you look at the other usage, you will notice that they are protected
with a lock. Why is it not necessary here?

I think it is necessary here too

I am also not entirely sure why we could not re-use code that already
exists in common code. Surely suspend/resume should work in a similar way?

Could you please be more specific (which common code)

You can ignore this for now as shutdown_code may not be the right solution here.


Julien Grall
