Re: [PATCH v6 05/12] arm/sve: save/restore SVE context switch
> On 18 May 2023, at 19:27, Julien Grall <julien@xxxxxxx> wrote:
>
> Hi Luca,
>
> On 24/04/2023 07:02, Luca Fancellu wrote:
>> Save/restore the SVE context on context switch: allocate memory to
>> contain the Z0-31 registers, whose length is at most 2048 bits each,
>> and FFR, which can be at most 256 bits; the amount of memory allocated
>> depends on the vector length of the domain and on how many bits are
>> supported by the platform.
>> Save P0-15, whose length is at most 256 bits each; in this case the
>> memory used is the fpregs field in struct vfp_state, because V0-31
>> are part of Z0-31 and this space would otherwise be unused for an
>> SVE domain.
>> Create zcr_el{1,2} fields in arch_vcpu, initialise zcr_el2 on vcpu
>> creation given the requested vector length and restore it on context
>> switch; save/restore the ZCR_EL1 value as well.
>> Signed-off-by: Luca Fancellu <luca.fancellu@xxxxxxx>
>> ---
>> Changes from v5:
>>  - use XFREE instead of xfree, keep the headers (Julien)
>>  - Avoid math computation for every save/restore, store the
>>    computation in struct vfp_state once (Bertrand)
>>  - protect access to v->domain->arch.sve_vl inside arch_vcpu_create
>>    now that sve_vl is available only on arm64
>> Changes from v4:
>>  - No changes
>> Changes from v3:
>>  - don't use fixed length types when not needed (Jan)
>>  - now VL is an encoded value, decode it before using it.
>> Changes from v2:
>>  - No changes
>> Changes from v1:
>>  - No changes
>> Changes from RFC:
>>  - Moved the zcr_el2 field introduction into this patch, restore its
>>    content inside the sve_restore_state function. (Julien)
>> ---
>>  xen/arch/arm/arm64/sve-asm.S             | 141 +++++++++++++++++++++++
>>  xen/arch/arm/arm64/sve.c                 |  63 ++++++++++
>>  xen/arch/arm/arm64/vfp.c                 |  79 +++++++------
>>  xen/arch/arm/domain.c                    |   9 ++
>>  xen/arch/arm/include/asm/arm64/sve.h     |  13 +++
>>  xen/arch/arm/include/asm/arm64/sysregs.h |   3 +
>>  xen/arch/arm/include/asm/arm64/vfp.h     |  12 ++
>>  xen/arch/arm/include/asm/domain.h        |   2 +
>>  8 files changed, 288 insertions(+), 34 deletions(-)
>> diff --git a/xen/arch/arm/arm64/sve-asm.S b/xen/arch/arm/arm64/sve-asm.S
>> index 4d1549344733..8c37d7bc95d5 100644
>> --- a/xen/arch/arm/arm64/sve-asm.S
>> +++ b/xen/arch/arm/arm64/sve-asm.S
>
> Are all the new helpers added in this patch taken from Linux? If so, it
> would be good to clarify this (again) in the commit message, as it helps
> with the review (I can diff against Linux rather than properly reviewing
> them).
>
>> diff --git a/xen/arch/arm/arm64/sve.c b/xen/arch/arm/arm64/sve.c
>> index 86a5e617bfca..064832b450ff 100644
>> --- a/xen/arch/arm/arm64/sve.c
>> +++ b/xen/arch/arm/arm64/sve.c
>> @@ -5,6 +5,8 @@
>>   * Copyright (C) 2022 ARM Ltd.
>>   */
>> +#include <xen/sched.h>
>> +#include <xen/sizes.h>
>>  #include <xen/types.h>
>>  #include <asm/arm64/sve.h>
>>  #include <asm/arm64/sysregs.h>
>> @@ -13,6 +15,24 @@
>>  #include <asm/system.h>
>>  extern unsigned int sve_get_hw_vl(void);
>> +extern void sve_save_ctx(uint64_t *sve_ctx, uint64_t *pregs, int save_ffr);
>> +extern void sve_load_ctx(uint64_t const *sve_ctx, uint64_t const *pregs,
>> +                         int restore_ffr);
>
> From the use, it is not entirely clear what restore_ffr/save_ffr are meant
> to be. Are they bool? If so, maybe use bool? At minimum, they probably want
> to be unsigned int.

I have to say that I trusted the Linux implementation here, in
arch/arm64/include/asm/fpsimd.h, which uses int:

extern void sve_save_state(void *state, u32 *pfpsr, int save_ffr);
extern void sve_load_state(void const *state, u32 const *pfpsr,
                           int restore_ffr);

But if you prefer, I can use unsigned int instead.

>
>> +
>> +static inline unsigned int sve_zreg_ctx_size(unsigned int vl)
>> +{
>> +    /*
>> +     * Z0-31 registers size in bytes is computed from VL, which is in
>> +     * bits, so VL in bytes is VL/8.
>> +     */
>> +    return (vl / 8U) * 32U;
>> +}
>> +
>> +static inline unsigned int sve_ffrreg_ctx_size(unsigned int vl)
>> +{
>> +    /* FFR register size is VL/8 bits, which is (VL/8)/8 in bytes */
>> +    return (vl / 64U);
>> +}
>>  register_t compute_max_zcr(void)
>>  {
>> @@ -60,3 +80,46 @@ unsigned int get_sys_vl_len(void)
>>      return ((system_cpuinfo.zcr64.bits[0] & ZCR_ELx_LEN_MASK) + 1U) *
>>             SVE_VL_MULTIPLE_VAL;
>>  }
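As a quick sanity check of the sizing and encoding above (my numbers, not
part of the patch): ZCR_ELx.LEN encodes the vector length in multiples of
the architectural 128-bit granule, so get_sys_vl_len() returns
(LEN + 1) * 128 bits, with SVE_VL_MULTIPLE_VAL being that granule. At the
maximum VL of 2048 bits, sve_zreg_ctx_size() gives (2048/8) * 32 = 8192
bytes for Z0-Z31 and sve_ffrreg_ctx_size() gives 2048/64 = 32 bytes (i.e.
256 bits) for FFR, so the per-vcpu allocation below is at most 8224 bytes.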
>> +
>> +int sve_context_init(struct vcpu *v)
>> +{
>> +    unsigned int sve_vl_bits = sve_decode_vl(v->domain->arch.sve_vl);
>> +    uint64_t *ctx = _xzalloc(sve_zreg_ctx_size(sve_vl_bits) +
>> +                             sve_ffrreg_ctx_size(sve_vl_bits),
>> +                             L1_CACHE_BYTES);
>> +
>> +    if ( !ctx )
>> +        return -ENOMEM;
>> +
>> +    /* Point to the end of Z0-Z31 memory, just before FFR memory */
>
> NIT: I would add that the logic should be kept in sync with
> sve_context_free(). Same...
>
>> +    v->arch.vfp.sve_zreg_ctx_end = ctx +
>> +        (sve_zreg_ctx_size(sve_vl_bits) / sizeof(uint64_t));
>> +
>> +    return 0;
>> +}
>> +
>> +void sve_context_free(struct vcpu *v)
>> +{
>> +    unsigned int sve_vl_bits = sve_decode_vl(v->domain->arch.sve_vl);
>> +
>> +    /* Point back to the beginning of Z0-Z31 + FFR memory */
>
> ... here (but referring to sve_context_init()). So it is clearer that if
> the logic changes in one place then it needs to be changed in the other.

Sure, I will.

>
>> +    v->arch.vfp.sve_zreg_ctx_end -=
>> +        (sve_zreg_ctx_size(sve_vl_bits) / sizeof(uint64_t));
>
> From my understanding, sve_context_free() could be called with
> sve_zreg_ctx_end equal to NULL (i.e. because sve_context_init() failed).
> So wouldn't we end up subtracting the value from NULL and therefore...
>
>> +
>> +    XFREE(v->arch.vfp.sve_zreg_ctx_end);
>
> ... free a random pointer?

Thank you for spotting this, I will guard the operations in
sve_context_free() with:

if ( v->arch.vfp.sve_zreg_ctx_end )

I'm assuming the memory for the vfp structure is zero-initialised; please
correct me if I'm wrong.
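Concretely, something like this (just a sketch of what I have in mind for
v7; it relies on the vcpu structure being zero-initialised, so
sve_zreg_ctx_end stays NULL whenever sve_context_init() never ran or
failed):

void sve_context_free(struct vcpu *v)
{
    if ( v->arch.vfp.sve_zreg_ctx_end )
    {
        unsigned int sve_vl_bits = sve_decode_vl(v->domain->arch.sve_vl);

        /* Point back to the beginning of Z0-Z31 + FFR memory */
        v->arch.vfp.sve_zreg_ctx_end -=
            (sve_zreg_ctx_size(sve_vl_bits) / sizeof(uint64_t));

        XFREE(v->arch.vfp.sve_zreg_ctx_end);
    }
}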
>
>> +}
>> +
>> +void sve_save_state(struct vcpu *v)
>> +{
>> +    v->arch.zcr_el1 = READ_SYSREG(ZCR_EL1);
>> +
>> +    sve_save_ctx(v->arch.vfp.sve_zreg_ctx_end, v->arch.vfp.fpregs, 1);
>> +}
>> +
>> +void sve_restore_state(struct vcpu *v)
>> +{
>> +    WRITE_SYSREG(v->arch.zcr_el1, ZCR_EL1);
>> +    WRITE_SYSREG(v->arch.zcr_el2, ZCR_EL2);
>
> AFAIU, this value will be used for the restore below. So don't we need
> an isb()?

We reached the agreement on this in patch 1.

>
>> +
>> +    sve_load_ctx(v->arch.vfp.sve_zreg_ctx_end, v->arch.vfp.fpregs, 1);
>> +}
>> diff --git a/xen/arch/arm/arm64/vfp.c b/xen/arch/arm/arm64/vfp.c
>> index 47885e76baae..2d0d7c2e6ddb 100644
>> --- a/xen/arch/arm/arm64/vfp.c
>> +++ b/xen/arch/arm/arm64/vfp.c
>> @@ -2,29 +2,35 @@
>>  #include <asm/processor.h>
>>  #include <asm/cpufeature.h>
>>  #include <asm/vfp.h>
>> +#include <asm/arm64/sve.h>
>>  void vfp_save_state(struct vcpu *v)
>>  {
>>      if ( !cpu_has_fp )
>>          return;
>> -    asm volatile("stp q0, q1, [%1, #16 * 0]\n\t"
>> -                 "stp q2, q3, [%1, #16 * 2]\n\t"
>> -                 "stp q4, q5, [%1, #16 * 4]\n\t"
>> -                 "stp q6, q7, [%1, #16 * 6]\n\t"
>> -                 "stp q8, q9, [%1, #16 * 8]\n\t"
>> -                 "stp q10, q11, [%1, #16 * 10]\n\t"
>> -                 "stp q12, q13, [%1, #16 * 12]\n\t"
>> -                 "stp q14, q15, [%1, #16 * 14]\n\t"
>> -                 "stp q16, q17, [%1, #16 * 16]\n\t"
>> -                 "stp q18, q19, [%1, #16 * 18]\n\t"
>> -                 "stp q20, q21, [%1, #16 * 20]\n\t"
>> -                 "stp q22, q23, [%1, #16 * 22]\n\t"
>> -                 "stp q24, q25, [%1, #16 * 24]\n\t"
>> -                 "stp q26, q27, [%1, #16 * 26]\n\t"
>> -                 "stp q28, q29, [%1, #16 * 28]\n\t"
>> -                 "stp q30, q31, [%1, #16 * 30]\n\t"
>> -                 : "=Q" (*v->arch.vfp.fpregs) : "r" (v->arch.vfp.fpregs));
>> +    if ( is_sve_domain(v->domain) )
>> +        sve_save_state(v);
>> +    else
>> +    {
>> +        asm volatile("stp q0, q1, [%1, #16 * 0]\n\t"
>> +                     "stp q2, q3, [%1, #16 * 2]\n\t"
>> +                     "stp q4, q5, [%1, #16 * 4]\n\t"
>> +                     "stp q6, q7, [%1, #16 * 6]\n\t"
>> +                     "stp q8, q9, [%1, #16 * 8]\n\t"
>> +                     "stp q10, q11, [%1, #16 * 10]\n\t"
>> +                     "stp q12, q13, [%1, #16 * 12]\n\t"
>> +                     "stp q14, q15, [%1, #16 * 14]\n\t"
>> +                     "stp q16, q17, [%1, #16 * 16]\n\t"
>> +                     "stp q18, q19, [%1, #16 * 18]\n\t"
>> +                     "stp q20, q21, [%1, #16 * 20]\n\t"
>> +                     "stp q22, q23, [%1, #16 * 22]\n\t"
>> +                     "stp q24, q25, [%1, #16 * 24]\n\t"
>> +                     "stp q26, q27, [%1, #16 * 26]\n\t"
>> +                     "stp q28, q29, [%1, #16 * 28]\n\t"
>> +                     "stp q30, q31, [%1, #16 * 30]\n\t"
>> +                     : "=Q" (*v->arch.vfp.fpregs)
>> +                     : "r" (v->arch.vfp.fpregs));
>> +    }
>>      v->arch.vfp.fpsr = READ_SYSREG(FPSR);
>>      v->arch.vfp.fpcr = READ_SYSREG(FPCR);
>> @@ -37,23 +43,28 @@ void vfp_restore_state(struct vcpu *v)
>>      if ( !cpu_has_fp )
>>          return;
>> -    asm volatile("ldp q0, q1, [%1, #16 * 0]\n\t"
>> -                 "ldp q2, q3, [%1, #16 * 2]\n\t"
>> -                 "ldp q4, q5, [%1, #16 * 4]\n\t"
>> -                 "ldp q6, q7, [%1, #16 * 6]\n\t"
>> -                 "ldp q8, q9, [%1, #16 * 8]\n\t"
>> -                 "ldp q10, q11, [%1, #16 * 10]\n\t"
>> -                 "ldp q12, q13, [%1, #16 * 12]\n\t"
>> -                 "ldp q14, q15, [%1, #16 * 14]\n\t"
>> -                 "ldp q16, q17, [%1, #16 * 16]\n\t"
>> -                 "ldp q18, q19, [%1, #16 * 18]\n\t"
>> -                 "ldp q20, q21, [%1, #16 * 20]\n\t"
>> -                 "ldp q22, q23, [%1, #16 * 22]\n\t"
>> -                 "ldp q24, q25, [%1, #16 * 24]\n\t"
>> -                 "ldp q26, q27, [%1, #16 * 26]\n\t"
>> -                 "ldp q28, q29, [%1, #16 * 28]\n\t"
>> -                 "ldp q30, q31, [%1, #16 * 30]\n\t"
>> -                 : : "Q" (*v->arch.vfp.fpregs), "r" (v->arch.vfp.fpregs));
>> +    if ( is_sve_domain(v->domain) )
>> +        sve_restore_state(v);
>> +    else
>> +    {
>> +        asm volatile("ldp q0, q1, [%1, #16 * 0]\n\t"
>> +                     "ldp q2, q3, [%1, #16 * 2]\n\t"
>> +                     "ldp q4, q5, [%1, #16 * 4]\n\t"
>> +                     "ldp q6, q7, [%1, #16 * 6]\n\t"
>> +                     "ldp q8, q9, [%1, #16 * 8]\n\t"
>> +                     "ldp q10, q11, [%1, #16 * 10]\n\t"
>> +                     "ldp q12, q13, [%1, #16 * 12]\n\t"
>> +                     "ldp q14, q15, [%1, #16 * 14]\n\t"
>> +                     "ldp q16, q17, [%1, #16 * 16]\n\t"
>> +                     "ldp q18, q19, [%1, #16 * 18]\n\t"
>> +                     "ldp q20, q21, [%1, #16 * 20]\n\t"
>> +                     "ldp q22, q23, [%1, #16 * 22]\n\t"
>> +                     "ldp q24, q25, [%1, #16 * 24]\n\t"
>> +                     "ldp q26, q27, [%1, #16 * 26]\n\t"
>> +                     "ldp q28, q29, [%1, #16 * 28]\n\t"
>> +                     "ldp q30, q31, [%1, #16 * 30]\n\t"
>> +                     : : "Q" (*v->arch.vfp.fpregs), "r" (v->arch.vfp.fpregs));
>> +    }
>>      WRITE_SYSREG(v->arch.vfp.fpsr, FPSR);
>>      WRITE_SYSREG(v->arch.vfp.fpcr, FPCR);
>> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
>> index 143359d0f313..24c722a4a11e 100644
>> --- a/xen/arch/arm/domain.c
>> +++ b/xen/arch/arm/domain.c
>> @@ -552,7 +552,14 @@ int arch_vcpu_create(struct vcpu *v)
>>      v->arch.cptr_el2 = get_default_cptr_flags();
>>      if ( is_sve_domain(v->domain) )
>> +    {
>> +        if ( (rc = sve_context_init(v)) != 0 )
>> +            goto fail;
>>          v->arch.cptr_el2 &= ~HCPTR_CP(8);
>> +#ifdef CONFIG_ARM64_SVE
>
> This #ifdef reads a bit odd to me because you are protecting
> v->arch.zcr_el2 but not the rest. This is one of the cases where I would
> surround the full if with the #ifdef, because it makes it clearer that
> there is no way the rest of the code can be reached if !CONFIG_ARM64_SVE.
>
> That said, I would actually prefer if...
>
>> +        v->arch.zcr_el2 = vl_to_zcr(sve_decode_vl(v->domain->arch.sve_vl));
>
> ... this line were moved into sve_context_init(), because it is related
> to the SVE context.

Sure, I will do that. So, if I've understood correctly, you want me to
keep this:

v->arch.cptr_el2 = get_default_cptr_flags();

if ( is_sve_domain(v->domain) )
{
    if ( (rc = sve_context_init(v)) != 0 )
        goto fail;
    v->arch.cptr_el2 &= ~HCPTR_CP(8);
}

without the #ifdef CONFIG_ARM64_SVE.
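And, for reference, moving the ZCR_EL2 initialisation would make
sve_context_init() look something like this (a sketch of what I have in
mind for v7, reusing the already-decoded sve_vl_bits):

int sve_context_init(struct vcpu *v)
{
    unsigned int sve_vl_bits = sve_decode_vl(v->domain->arch.sve_vl);
    uint64_t *ctx = _xzalloc(sve_zreg_ctx_size(sve_vl_bits) +
                             sve_ffrreg_ctx_size(sve_vl_bits),
                             L1_CACHE_BYTES);

    if ( !ctx )
        return -ENOMEM;

    /* Point to the end of Z0-Z31 memory, just before FFR memory */
    v->arch.vfp.sve_zreg_ctx_end = ctx +
        (sve_zreg_ctx_size(sve_vl_bits) / sizeof(uint64_t));

    /* The vcpu ZCR_EL2 setup moves here, next to the rest of the SVE state */
    v->arch.zcr_el2 = vl_to_zcr(sve_vl_bits);

    return 0;
}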
>
>> +#endif
>> +    }
>>      v->arch.hcr_el2 = get_default_hcr_flags();
>> @@ -582,6 +589,8 @@ fail:
>>  void arch_vcpu_destroy(struct vcpu *v)
>>  {
>> +    if ( is_sve_domain(v->domain) )
>> +        sve_context_free(v);
>>      vcpu_timer_destroy(v);
>>      vcpu_vgic_free(v);
>>      free_xenheap_pages(v->arch.stack, STACK_ORDER);
>> diff --git a/xen/arch/arm/include/asm/arm64/sve.h b/xen/arch/arm/include/asm/arm64/sve.h
>> index 730c3fb5a9c8..582405dfdf6a 100644
>> --- a/xen/arch/arm/include/asm/arm64/sve.h
>> +++ b/xen/arch/arm/include/asm/arm64/sve.h
>> @@ -26,6 +26,10 @@ static inline unsigned int sve_decode_vl(unsigned int sve_vl)
>>  register_t compute_max_zcr(void);
>>  register_t vl_to_zcr(unsigned int vl);
>>  unsigned int get_sys_vl_len(void);
>> +int sve_context_init(struct vcpu *v);
>> +void sve_context_free(struct vcpu *v);
>> +void sve_save_state(struct vcpu *v);
>> +void sve_restore_state(struct vcpu *v);
>>  #else /* !CONFIG_ARM64_SVE */
>> @@ -46,6 +50,15 @@ static inline unsigned int get_sys_vl_len(void)
>>      return 0;
>>  }
>> +
>> +static inline int sve_context_init(struct vcpu *v)
>> +{
>> +    return 0;
>> +}
>> +
>> +static inline void sve_context_free(struct vcpu *v) {}
>> +static inline void sve_save_state(struct vcpu *v) {}
>> +static inline void sve_restore_state(struct vcpu *v) {}
>> +
>>  #endif /* CONFIG_ARM64_SVE */
>>  #endif /* _ARM_ARM64_SVE_H */
>> diff --git a/xen/arch/arm/include/asm/arm64/sysregs.h b/xen/arch/arm/include/asm/arm64/sysregs.h
>> index 4cabb9eb4d5e..3fdeb9d8cdef 100644
>> --- a/xen/arch/arm/include/asm/arm64/sysregs.h
>> +++ b/xen/arch/arm/include/asm/arm64/sysregs.h
>> @@ -88,6 +88,9 @@
>>  #ifndef ID_AA64ISAR2_EL1
>>  #define ID_AA64ISAR2_EL1 S3_0_C0_C6_2
>>  #endif
>> +#ifndef ZCR_EL1
>> +#define ZCR_EL1 S3_0_C1_C2_0
>> +#endif
>>  /* ID registers (imported from arm64/include/asm/sysreg.h in Linux) */
>> diff --git a/xen/arch/arm/include/asm/arm64/vfp.h b/xen/arch/arm/include/asm/arm64/vfp.h
>> index e6e8c363bc16..4aa371e85d26 100644
>> --- a/xen/arch/arm/include/asm/arm64/vfp.h
>> +++ b/xen/arch/arm/include/asm/arm64/vfp.h
>> @@ -6,7 +6,19 @@
>>  struct vfp_state
>>  {
>> +    /*
>> +     * When SVE is enabled for the guest, fpregs memory will be used to
>> +     * save/restore P0-P15 registers, otherwise it will be used for the
>> +     * V0-V31 registers.
>> +     */
>>      uint64_t fpregs[64] __vfp_aligned;
>> +    /*
>> +     * When SVE is enabled for the guest, sve_zreg_ctx_end points to
>> +     * memory where the Z0-Z31 registers and FFR can be saved/restored.
>> +     * It points at the end of the Z0-Z31 space and at the beginning of
>> +     * the FFR space; it's done like that to ease the save/restore
>> +     * assembly operations.
>> +     */
>> +    uint64_t *sve_zreg_ctx_end;
>>      register_t fpcr;
>>      register_t fpexc32_el2;
>>      register_t fpsr;
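To put numbers on the fpregs reuse (mine, not from the patch): fpregs is
64 * 8 = 512 bytes, which is exactly V0-V31 (32 registers of 128 bits).
Each predicate register is VL/8 bits, so at the maximum VL of 2048 bits
each of P0-P15 takes 32 bytes, and 16 * 32 = 512 bytes: the predicate
registers fit exactly in the space V0-V31 would otherwise occupy.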
>> diff --git a/xen/arch/arm/include/asm/domain.h b/xen/arch/arm/include/asm/domain.h
>> index 331da0f3bcc3..814652d92568 100644
>> --- a/xen/arch/arm/include/asm/domain.h
>> +++ b/xen/arch/arm/include/asm/domain.h
>> @@ -195,6 +195,8 @@ struct arch_vcpu
>>      register_t tpidrro_el0;
>>      /* HYP configuration */
>> +    register_t zcr_el1;
>> +    register_t zcr_el2;
>>      register_t cptr_el2;
>>      register_t hcr_el2;
>>      register_t mdcr_el2;
>
> Cheers,
>
> --
> Julien Grall