|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH 4/5] x86/cpuid: Simplify recalculate_xstate()
On 03.05.2021 17:39, Andrew Cooper wrote:
> Make use of the new xstate_uncompressed_size() helper rather than maintaining
> the running calculation while accumulating feature components.
>
> The rest of the CPUID data can come direct from the raw cpuid policy. All
> per-component data forms an ABI through the behaviour of the X{SAVE,RSTOR}*
> instructions, and are constant.
>
> Use for_each_set_bit() rather than opencoding a slightly awkward version of
> it. Mask the attributes in ecx down based on the visible features. This
> isn't actually necessary for any components or attributes defined at the time
> of writing (up to AMX), but is added out of an abundance of caution.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> ---
> CC: Jan Beulich <JBeulich@xxxxxxxx>
> CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> CC: Wei Liu <wl@xxxxxxx>
>
> Using min() in for_each_set_bit() leads to awful code generation, as it
> prohibits the optimiations for spotting that the bitmap is <= BITS_PER_LONG.
> As p->xstate is long enough already, use a BUILD_BUG_ON() instead.
> ---
> xen/arch/x86/cpuid.c | 52
> +++++++++++++++++-----------------------------------
> 1 file changed, 17 insertions(+), 35 deletions(-)
>
> diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
> index 752bf244ea..c7f8388e5d 100644
> --- a/xen/arch/x86/cpuid.c
> +++ b/xen/arch/x86/cpuid.c
> @@ -154,8 +154,7 @@ static void sanitise_featureset(uint32_t *fs)
> static void recalculate_xstate(struct cpuid_policy *p)
> {
> uint64_t xstates = XSTATE_FP_SSE;
> - uint32_t xstate_size = XSTATE_AREA_MIN_SIZE;
> - unsigned int i, Da1 = p->xstate.Da1;
> + unsigned int i, ecx_bits = 0, Da1 = p->xstate.Da1;
>
> /*
> * The Da1 leaf is the only piece of information preserved in the common
> @@ -167,61 +166,44 @@ static void recalculate_xstate(struct cpuid_policy *p)
> return;
>
> if ( p->basic.avx )
> - {
> xstates |= X86_XCR0_YMM;
> - xstate_size = max(xstate_size,
> - xstate_offsets[X86_XCR0_YMM_POS] +
> - xstate_sizes[X86_XCR0_YMM_POS]);
> - }
>
> if ( p->feat.mpx )
> - {
> xstates |= X86_XCR0_BNDREGS | X86_XCR0_BNDCSR;
> - xstate_size = max(xstate_size,
> - xstate_offsets[X86_XCR0_BNDCSR_POS] +
> - xstate_sizes[X86_XCR0_BNDCSR_POS]);
> - }
>
> if ( p->feat.avx512f )
> - {
> xstates |= X86_XCR0_OPMASK | X86_XCR0_ZMM | X86_XCR0_HI_ZMM;
> - xstate_size = max(xstate_size,
> - xstate_offsets[X86_XCR0_HI_ZMM_POS] +
> - xstate_sizes[X86_XCR0_HI_ZMM_POS]);
> - }
>
> if ( p->feat.pku )
> - {
> xstates |= X86_XCR0_PKRU;
> - xstate_size = max(xstate_size,
> - xstate_offsets[X86_XCR0_PKRU_POS] +
> - xstate_sizes[X86_XCR0_PKRU_POS]);
> - }
>
> - p->xstate.max_size = xstate_size;
> + /* Subleaf 0 */
> + p->xstate.max_size =
> + xstate_uncompressed_size(xstates & ~XSTATE_XSAVES_ONLY);
> p->xstate.xcr0_low = xstates & ~XSTATE_XSAVES_ONLY;
> p->xstate.xcr0_high = (xstates & ~XSTATE_XSAVES_ONLY) >> 32;
>
> + /* Subleaf 1 */
> p->xstate.Da1 = Da1;
> if ( p->xstate.xsaves )
> {
> + ecx_bits |= 3; /* Align64, XSS */
Align64 is also needed for p->xstate.xsavec afaict. I'm not really
convinced to tie one to the other either. I would rather think this
is a per-state-component attribute independent of other features.
Those state components could in turn have a dependency (like XSS
ones on XSAVES).
I'm also not happy at all to see you use a literal 3 here. We have
a struct for this, after all.
> p->xstate.xss_low = xstates & XSTATE_XSAVES_ONLY;
> p->xstate.xss_high = (xstates & XSTATE_XSAVES_ONLY) >> 32;
> }
> - else
> - xstates &= ~XSTATE_XSAVES_ONLY;
>
> - for ( i = 2; i < min(63ul, ARRAY_SIZE(p->xstate.comp)); ++i )
> + /* Subleafs 2+ */
> + xstates &= ~XSTATE_FP_SSE;
> + BUILD_BUG_ON(ARRAY_SIZE(p->xstate.comp) < 63);
> + for_each_set_bit ( i, &xstates, 63 )
> {
> - uint64_t curr_xstate = 1ul << i;
> -
> - if ( !(xstates & curr_xstate) )
> - continue;
> -
> - p->xstate.comp[i].size = xstate_sizes[i];
> - p->xstate.comp[i].offset = xstate_offsets[i];
> - p->xstate.comp[i].xss = curr_xstate & XSTATE_XSAVES_ONLY;
> - p->xstate.comp[i].align = curr_xstate & xstate_align;
> + /*
> + * Pass through size (eax) and offset (ebx) directly. Visbility of
> + * attributes in ecx limited by visible features in Da1.
> + */
> + p->xstate.raw[i].a = raw_cpuid_policy.xstate.raw[i].a;
> + p->xstate.raw[i].b = raw_cpuid_policy.xstate.raw[i].b;
> + p->xstate.raw[i].c = raw_cpuid_policy.xstate.raw[i].c & ecx_bits;
To me, going to raw[].{a,b,c,d} looks like a backwards move, to be
honest. Both this and the literal 3 above make it harder to locate
all the places that need changing if a new bit (like xfd) is to be
added. It would be better if grep-ing for an existing field name
(say "xss") would easily turn up all involved places.
Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |