[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 1/2] x86: improve MSR_SHADOW_GS accesses
On 06/12/17 16:37, Jan Beulich wrote: > --- a/xen/include/asm-x86/msr.h > +++ b/xen/include/asm-x86/msr.h > @@ -8,6 +8,7 @@ > #include <xen/types.h> > #include <xen/percpu.h> > #include <xen/errno.h> > +#include <asm/alternative.h> > #include <asm/asm_defns.h> > #include <asm/cpufeature.h> > > @@ -172,6 +173,24 @@ static inline unsigned long rdgsbase(voi > return base; > } > > +static inline unsigned long rdgsshadow(void) > +{ > + unsigned long base; > + > + alternative_io("mov %[msr], %%ecx\n\t" > + "rdmsr\n\t" > + "shl $32, %%rdx\n\t" > + "or %%rdx, %[res]", There needs to be some clearer distinction between the two alternatives. It took a while for me to spot this comma. > + "swapgs\n\t" > + ".byte 0xf3, 0x48, 0x0f, 0xae, 0xc8\n\t" /* rdgsbase rax > */ > + "swapgs", > + X86_FEATURE_FSGSBASE, > + [res] "=&a" (base), > + [msr] "i" (MSR_SHADOW_GS_BASE) : "rcx", "rdx"); MSR should be a "c" constraint, and the clobber dropped. It will result in better code in most of the callsites, by avoiding a reload of ecx, and merging of the higher constant with the other MSR accesses. I'm not entirely sure the alternative is justified here. For consistency alone, these helpers should match their companions, and in the unlikely case that the runtime feature test does make a measurable difference, wouldn't a static key be a better option anyway? > + > + return base; > +} > + > static inline void wrfsbase(unsigned long base) > { > if ( cpu_has_fsgsbase ) > @@ -196,6 +215,19 @@ static inline void wrgsbase(unsigned lon > wrmsrl(MSR_GS_BASE, base); > } > > +static inline void wrgsshadow(unsigned long base) > +{ > + alternative_input("mov %[msr], %%ecx\n\t" > + "shld $32, %[val], %%rdx\n\t" This is a vector path instruction and specifically called out to be avoided in the AMD optimisation guide. On all hardware (according to Agner's latency mesurements) it alone has a longer latency to execute that the mov/shl pair you've replaced, and that is before accounting for mov elimination. ~Andrew > + "wrmsr", > + "swapgs\n\t" > + ".byte 0xf3, 0x48, 0x0f, 0xae, 0xd8\n\t" /* wrgsbase > rax */ > + "swapgs", > + X86_FEATURE_FSGSBASE, > + [val] "a" (base), > + [msr] "i" (MSR_SHADOW_GS_BASE) : "rcx", "rdx"); > +} > + > DECLARE_PER_CPU(u64, efer); > u64 read_efer(void); > void write_efer(u64 val); > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@xxxxxxxxxxxxxxxxxxxx > https://lists.xenproject.org/mailman/listinfo/xen-devel _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |