
Re: [Xen-devel] [PATCH RFC] x86/xsave: prefer eager clearing of state over eager restoring



>>> On 16.08.18 at 11:07, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 22/06/2018 11:57, Jan Beulich wrote:
>> --- a/xen/arch/x86/spec_ctrl.c
>> +++ b/xen/arch/x86/spec_ctrl.c
>> @@ -616,7 +616,7 @@ void __init init_speculation_mitigations
>>  
>>      /* Check whether Eager FPU should be enabled by default. */
>>      if ( opt_eager_fpu == -1 )
>> -        opt_eager_fpu = should_use_eager_fpu();
>> +        opt_eager_fpu = !cpu_has_xsave && should_use_eager_fpu();
> 
> I'd not spotted this the first time round.
> 
> Intel is very clear that, if you're using xsave, you should be using
> eager FPU.  Therefore, this goes specifically against the advice in the
> ORM, and the advice we were given during the LazyFPU timeframe.
> 
> Furthermore we (XenServer) and customers have seen a reliable perf
> improvement from the LazyFPU security fix, up to 8% in places, for
> normal VDI and server workloads.  As I said during the development of
> the LazyFPU fixes, this is almost certainly down to the fact that all
> code uses the FPU these days.

Well - as said in the description, the observation in my tests (which
are not a typical server workload) was that about 50% of the context
switches were not followed by a (lazy) restore before the vCPU was
de-scheduled again.

The change as presented is in fact trying to move to a middle ground,
in that it doesn't leave stale state in the registers anymore, but
instead frees the underlying physical ones up for other uses (by
putting the state components into init state).
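
As a rough illustration of the trade-off described above, here is a toy
model (not Xen code) that counts the work done per policy over a run of
time slices. The policy names, the struct, and count_fpu_work() are all
hypothetical, and the cost model is an assumption: lazy and eager-clear
both pay one CR0.TS write per switch-in plus a trap, a second CR0.TS
write and a full restore when the FPU is actually used, while eager
restore always loads the full state and never touches CR0.TS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy names, for illustration only; not Xen identifiers. */
enum fpu_policy { FPU_LAZY, FPU_EAGER_RESTORE, FPU_EAGER_CLEAR };

struct fpu_cost {
    unsigned int full_restores;  /* full state loads from memory */
    unsigned int init_resets;    /* state components put into init state */
    unsigned int cr0_ts_writes;  /* stts/clts style CR0.TS updates */
};

/*
 * uses_fpu[i] != 0 means the vCPU touched FPU state during time slice i.
 * Assumed cost model, see the text above; real hardware costs differ.
 */
struct fpu_cost count_fpu_work(enum fpu_policy policy,
                               const unsigned char *uses_fpu, size_t n)
{
    struct fpu_cost cost = { 0, 0, 0 };
    size_t i;

    assert(uses_fpu != NULL || n == 0);

    for ( i = 0; i < n; i++ )
    {
        if ( policy == FPU_EAGER_RESTORE )
        {
            /* Always load the full state, whether or not it gets used. */
            cost.full_restores++;
            continue;
        }

        /* Eager clear: reset the registers instead of leaving stale state. */
        if ( policy == FPU_EAGER_CLEAR )
            cost.init_resets++;

        /* Both remaining policies arm the trap on switch-in (stts). */
        cost.cr0_ts_writes++;

        if ( uses_fpu[i] )
        {
            /* First FPU use traps: clear TS (clts) and restore the state. */
            cost.cr0_ts_writes++;
            cost.full_restores++;
        }
    }

    return cost;
}
```

With half of the slices not touching the FPU (the roughly 50% figure
mentioned above), the model shows eager clear doing half as many full
restores as eager restore, at the price of keeping the CR0.TS writes.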

> I'm still waiting on a more formal statement from AMD, and don't yet
> have any perf numbers on their hardware.
> 
> However, as we will definitely get an extra perf boost from fully
> deleting the remaining lazy paths (no more clts/stts in the context
> switch path), my gut feeling is that there is going to have to be some
> terrible chronic case on AMD for us to consider not switching to
> fully eager.

Yes, eliminating in particular the stts() is certainly going to help
performance. With ever growing state sizes I'm not convinced though
that in the long run (and even already with AVX-512, with its well over
2k of state) the CR0 access is indeed (going to remain) worse than the
(perhaps unnecessary) state load.
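
For a sense of scale behind the "well over 2k" figure, the sketch below
sums the commonly documented sizes of the XSAVE components involved in
AVX-512. The table and helper names are hypothetical, and the byte
counts are stated from memory of the architectural layout rather than
taken from this thread; the authoritative values for a given CPU come
from CPUID leaf 0xD.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type for illustration; not a Xen structure. */
struct xstate_component {
    const char *name;
    unsigned int bytes;
};

/* Assumed component sizes; check CPUID leaf 0xD on real hardware. */
static const struct xstate_component avx512_components[] = {
    { "legacy x87/SSE region",                  512 },
    { "XSAVE header",                            64 },
    { "AVX (upper halves of YMM0-15)",          256 },
    { "AVX-512 opmask (k0-k7)",                  64 },
    { "AVX-512 ZMM_Hi256 (ZMM0-15 upper half)", 512 },
    { "AVX-512 Hi16_ZMM (ZMM16-31)",           1024 },
};

/* Sum the sizes of the listed components, ignoring alignment padding. */
unsigned int xstate_total_bytes(const struct xstate_component *c, size_t n)
{
    unsigned int total = 0;
    size_t i;

    assert(c != NULL || n == 0);

    for ( i = 0; i < n; i++ )
        total += c[i].bytes;

    return total;
}
```

Under these assumptions the total comes to 2432 bytes, before any
padding or other enabled components are taken into account.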

Jan


