
Re: [PATCH v6 05/12] arm/sve: save/restore SVE context switch



Hi,

On 19/05/2023 18:35, Luca Fancellu wrote:


On 18 May 2023, at 19:27, Julien Grall <julien@xxxxxxx> wrote:

Hi Luca,

On 24/04/2023 07:02, Luca Fancellu wrote:
Save/restore context switch for SVE: allocate memory to contain
the Z0-31 registers, whose length is at most 2048 bits each, and
FFR, which can be at most 256 bits; the amount of memory allocated
depends on the vector length for the domain and on how many bits
are supported by the platform.
Save P0-15, whose length is at most 256 bits each; in this case the
memory used is the fpregs field in struct vfp_state,
because V0-31 are part of Z0-31 and this space would otherwise be
unused for an SVE domain.
Create zcr_el{1,2} fields in arch_vcpu, initialise zcr_el2 on vcpu
creation given the requested vector length and restore it on
context switch; save/restore the ZCR_EL1 value as well.
Signed-off-by: Luca Fancellu <luca.fancellu@xxxxxxx>
---
Changes from v5:
  - use XFREE instead of xfree, keep the headers (Julien)
  - Avoid math computation for every save/restore, store the computation
    in struct vfp_state once (Bertrand)
  - protect access to v->domain->arch.sve_vl inside arch_vcpu_create now
    that sve_vl is available only on arm64
Changes from v4:
  - No changes
Changes from v3:
  - don't use fixed len types when not needed (Jan)
  - now VL is an encoded value, decode it before using.
Changes from v2:
  - No changes
Changes from v1:
  - No changes
Changes from RFC:
  - Moved zcr_el2 field introduction in this patch, restore its
    content inside sve_restore_state function. (Julien)
---
  xen/arch/arm/arm64/sve-asm.S             | 141 +++++++++++++++++++++++
  xen/arch/arm/arm64/sve.c                 |  63 ++++++++++
  xen/arch/arm/arm64/vfp.c                 |  79 +++++++------
  xen/arch/arm/domain.c                    |   9 ++
  xen/arch/arm/include/asm/arm64/sve.h     |  13 +++
  xen/arch/arm/include/asm/arm64/sysregs.h |   3 +
  xen/arch/arm/include/asm/arm64/vfp.h     |  12 ++
  xen/arch/arm/include/asm/domain.h        |   2 +
  8 files changed, 288 insertions(+), 34 deletions(-)
diff --git a/xen/arch/arm/arm64/sve-asm.S b/xen/arch/arm/arm64/sve-asm.S
index 4d1549344733..8c37d7bc95d5 100644
--- a/xen/arch/arm/arm64/sve-asm.S
+++ b/xen/arch/arm/arm64/sve-asm.S

Are all the new helpers added in this patch taken from Linux? If so, it would 
be good to clarify this (again) in the commit message as it helps for the 
review (I can diff with Linux rather than properly reviewing them).

diff --git a/xen/arch/arm/arm64/sve.c b/xen/arch/arm/arm64/sve.c
index 86a5e617bfca..064832b450ff 100644
--- a/xen/arch/arm/arm64/sve.c
+++ b/xen/arch/arm/arm64/sve.c
@@ -5,6 +5,8 @@
   * Copyright (C) 2022 ARM Ltd.
   */
  +#include <xen/sched.h>
+#include <xen/sizes.h>
  #include <xen/types.h>
  #include <asm/arm64/sve.h>
  #include <asm/arm64/sysregs.h>
@@ -13,6 +15,24 @@
  #include <asm/system.h>
    extern unsigned int sve_get_hw_vl(void);
+extern void sve_save_ctx(uint64_t *sve_ctx, uint64_t *pregs, int save_ffr);
+extern void sve_load_ctx(uint64_t const *sve_ctx, uint64_t const *pregs,
+                         int restore_ffr);

 From the use, it is not entirely clear what restore_ffr/save_ffr is meant to 
be. Are they bool? If so, maybe use bool? At minimum, they probably want to be 
unsigned int.

I have to say that I trusted the Linux implementation here, in 
arch/arm64/include/asm/fpsimd.h, which uses int:

Ah, so this is a verbatim copy of the Linux code? If so...


extern void sve_save_state(void *state, u32 *pfpsr, int save_ffr);
extern void sve_load_state(void const *state, u32 const *pfpsr,
int restore_ffr);

But if you prefer I can put unsigned int instead.

... keep it as-is (Linux seems to like using 'int' for bool) but I would suggest to document the expected values.



+
+static inline unsigned int sve_zreg_ctx_size(unsigned int vl)
+{
+    /*
+     * Z0-31 registers size in bytes is computed from VL, which is in bits:
+     * each of the 32 registers is VL/8 bytes.
+     */
+    return (vl / 8U) * 32U;
+}
+
+static inline unsigned int sve_ffrreg_ctx_size(unsigned int vl)
+{
+    /* FFR register size is VL/8 bits, which in bytes is (VL/8)/8 = VL/64 */
+    return (vl / 64U);
+}
    register_t compute_max_zcr(void)
  {
@@ -60,3 +80,46 @@ unsigned int get_sys_vl_len(void)
      return ((system_cpuinfo.zcr64.bits[0] & ZCR_ELx_LEN_MASK) + 1U) *
              SVE_VL_MULTIPLE_VAL;
  }
+
+int sve_context_init(struct vcpu *v)
+{
+    unsigned int sve_vl_bits = sve_decode_vl(v->domain->arch.sve_vl);
+    uint64_t *ctx = _xzalloc(sve_zreg_ctx_size(sve_vl_bits) +
+                             sve_ffrreg_ctx_size(sve_vl_bits),
+                             L1_CACHE_BYTES);
+
+    if ( !ctx )
+        return -ENOMEM;
+
+    /* Point to the end of Z0-Z31 memory, just before FFR memory */

NIT: I would add that the logic should be kept in sync with sve_context_free(). 
Same...

+    v->arch.vfp.sve_zreg_ctx_end = ctx +
+        (sve_zreg_ctx_size(sve_vl_bits) / sizeof(uint64_t));
+
+    return 0;
+}
+
+void sve_context_free(struct vcpu *v)
+{
+    unsigned int sve_vl_bits = sve_decode_vl(v->domain->arch.sve_vl);
+
+    /* Point back to the beginning of Z0-Z31 + FFR memory */

... here (but pointing at sve_context_init()). So it is clearer that if the 
logic changes in one place then it needs to be changed in the other.

Sure I will


+    v->arch.vfp.sve_zreg_ctx_end -=
+        (sve_zreg_ctx_size(sve_vl_bits) / sizeof(uint64_t));

 From my understanding, sve_context_free() could be called with 
sve_zreg_ctx_end equal to NULL (i.e. because sve_context_init() failed). So 
wouldn't we end up subtracting the value from NULL and therefore...

+
+    XFREE(v->arch.vfp.sve_zreg_ctx_end);

... free a random pointer?

Thank you for spotting this, I will surround the operations in sve_context_free 
by:

if ( v->arch.vfp.sve_zreg_ctx_end )

Rather than surrounding, how about adding:

if ( !v->arch.vfp...)
  return;

This would avoid an extra indentation.


I’m assuming the memory should be zero initialised for the vfp structure, please
correct me if I’m wrong.

This is part of the struct vcpu. So yes (see alloc_vcpu_struct()).

[...]

index 143359d0f313..24c722a4a11e 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -552,7 +552,14 @@ int arch_vcpu_create(struct vcpu *v)
        v->arch.cptr_el2 = get_default_cptr_flags();
      if ( is_sve_domain(v->domain) )
+    {
+        if ( (rc = sve_context_init(v)) != 0 )
+            goto fail;
          v->arch.cptr_el2 &= ~HCPTR_CP(8);
+#ifdef CONFIG_ARM64_SVE

This #ifdef reads a bit odd to me because you are protecting v->arch.zcr_el2 
but not the rest. This is one of the cases where I would surround the full if 
with the #ifdef, because it makes clearer that there is no way the rest of the 
code can be reached if !CONFIG_ARM64_SVE.

That said, I would actually prefer if...

+        v->arch.zcr_el2 = vl_to_zcr(sve_decode_vl(v->domain->arch.sve_vl));

... this line is moved in sve_context_init() because this is related to the SVE 
context.

Sure I will do that, so if I’ve understood correctly, you want me to keep this:


v->arch.cptr_el2 = get_default_cptr_flags();
if ( is_sve_domain(v->domain) )
{
     if ( (rc = sve_context_init(v)) != 0 )
         goto fail;
     v->arch.cptr_el2 &= ~HCPTR_CP(8);
}

Without #ifdef CONFIG_ARM64_SVE

Yes please.

--
Julien Grall



 

