[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH 4/5] x86/cpuid: Simplify recalculate_xstate()
On 03.05.2021 17:39, Andrew Cooper wrote: > Make use of the new xstate_uncompressed_size() helper rather than maintaining > the running calculation while accumulating feature components. > > The rest of the CPUID data can come direct from the raw cpuid policy. All > per-component data forms an ABI through the behaviour of the X{SAVE,RSTOR}* > instructions, and are constant. > > Use for_each_set_bit() rather than opencoding a slightly awkward version of > it. Mask the attributes in ecx down based on the visible features. This > isn't actually necessary for any components or attributes defined at the time > of writing (up to AMX), but is added out of an abundance of caution. > > Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> > --- > CC: Jan Beulich <JBeulich@xxxxxxxx> > CC: Roger Pau Monné <roger.pau@xxxxxxxxxx> > CC: Wei Liu <wl@xxxxxxx> > > Using min() in for_each_set_bit() leads to awful code generation, as it > prohibits the optimiations for spotting that the bitmap is <= BITS_PER_LONG. > As p->xstate is long enough already, use a BUILD_BUG_ON() instead. > --- > xen/arch/x86/cpuid.c | 52 > +++++++++++++++++----------------------------------- > 1 file changed, 17 insertions(+), 35 deletions(-) > > diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c > index 752bf244ea..c7f8388e5d 100644 > --- a/xen/arch/x86/cpuid.c > +++ b/xen/arch/x86/cpuid.c > @@ -154,8 +154,7 @@ static void sanitise_featureset(uint32_t *fs) > static void recalculate_xstate(struct cpuid_policy *p) > { > uint64_t xstates = XSTATE_FP_SSE; > - uint32_t xstate_size = XSTATE_AREA_MIN_SIZE; > - unsigned int i, Da1 = p->xstate.Da1; > + unsigned int i, ecx_bits = 0, Da1 = p->xstate.Da1; > > /* > * The Da1 leaf is the only piece of information preserved in the common > @@ -167,61 +166,44 @@ static void recalculate_xstate(struct cpuid_policy *p) > return; > > if ( p->basic.avx ) > - { > xstates |= X86_XCR0_YMM; > - xstate_size = max(xstate_size, > - xstate_offsets[X86_XCR0_YMM_POS] + > - xstate_sizes[X86_XCR0_YMM_POS]); > - } > > if ( p->feat.mpx ) > - { > xstates |= X86_XCR0_BNDREGS | X86_XCR0_BNDCSR; > - xstate_size = max(xstate_size, > - xstate_offsets[X86_XCR0_BNDCSR_POS] + > - xstate_sizes[X86_XCR0_BNDCSR_POS]); > - } > > if ( p->feat.avx512f ) > - { > xstates |= X86_XCR0_OPMASK | X86_XCR0_ZMM | X86_XCR0_HI_ZMM; > - xstate_size = max(xstate_size, > - xstate_offsets[X86_XCR0_HI_ZMM_POS] + > - xstate_sizes[X86_XCR0_HI_ZMM_POS]); > - } > > if ( p->feat.pku ) > - { > xstates |= X86_XCR0_PKRU; > - xstate_size = max(xstate_size, > - xstate_offsets[X86_XCR0_PKRU_POS] + > - xstate_sizes[X86_XCR0_PKRU_POS]); > - } > > - p->xstate.max_size = xstate_size; > + /* Subleaf 0 */ > + p->xstate.max_size = > + xstate_uncompressed_size(xstates & ~XSTATE_XSAVES_ONLY); > p->xstate.xcr0_low = xstates & ~XSTATE_XSAVES_ONLY; > p->xstate.xcr0_high = (xstates & ~XSTATE_XSAVES_ONLY) >> 32; > > + /* Subleaf 1 */ > p->xstate.Da1 = Da1; > if ( p->xstate.xsaves ) > { > + ecx_bits |= 3; /* Align64, XSS */ Align64 is also needed for p->xstate.xsavec afaict. I'm not really convinced to tie one to the other either. I would rather think this is a per-state-component attribute independent of other features. Those state components could in turn have a dependency (like XSS ones on XSAVES). I'm also not happy at all to see you use a literal 3 here. We have a struct for this, after all. > p->xstate.xss_low = xstates & XSTATE_XSAVES_ONLY; > p->xstate.xss_high = (xstates & XSTATE_XSAVES_ONLY) >> 32; > } > - else > - xstates &= ~XSTATE_XSAVES_ONLY; > > - for ( i = 2; i < min(63ul, ARRAY_SIZE(p->xstate.comp)); ++i ) > + /* Subleafs 2+ */ > + xstates &= ~XSTATE_FP_SSE; > + BUILD_BUG_ON(ARRAY_SIZE(p->xstate.comp) < 63); > + for_each_set_bit ( i, &xstates, 63 ) > { > - uint64_t curr_xstate = 1ul << i; > - > - if ( !(xstates & curr_xstate) ) > - continue; > - > - p->xstate.comp[i].size = xstate_sizes[i]; > - p->xstate.comp[i].offset = xstate_offsets[i]; > - p->xstate.comp[i].xss = curr_xstate & XSTATE_XSAVES_ONLY; > - p->xstate.comp[i].align = curr_xstate & xstate_align; > + /* > + * Pass through size (eax) and offset (ebx) directly. Visbility of > + * attributes in ecx limited by visible features in Da1. > + */ > + p->xstate.raw[i].a = raw_cpuid_policy.xstate.raw[i].a; > + p->xstate.raw[i].b = raw_cpuid_policy.xstate.raw[i].b; > + p->xstate.raw[i].c = raw_cpuid_policy.xstate.raw[i].c & ecx_bits; To me, going to raw[].{a,b,c,d} looks like a backwards move, to be honest. Both this and the literal 3 above make it harder to locate all the places that need changing if a new bit (like xfd) is to be added. It would be better if grep-ing for an existing field name (say "xss") would easily turn up all involved places. Jan
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |