[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v3 2/5] x86: use PDEP/PEXT for maddr/direct-map-offset conversion when available
On 17/08/18 10:38, Jan Beulich wrote: >>>> On 17.08.18 at 10:59, <andrew.cooper3@xxxxxxxxxx> wrote: >> On 17/08/2018 08:21, Jan Beulich wrote: >>> --- a/xen/include/asm-x86/asm_defns.h >>> +++ b/xen/include/asm-x86/asm_defns.h >>> @@ -186,6 +186,20 @@ void ret_from_intr(void); >>> UNLIKELY_END_SECTION "\n" \ >>> ".Llikely." #tag ".%=:" >>> >>> +#define LINKONCE_PROLOGUE(sym) \ >>> + ".ifndef " sym() "\n\t" \ >>> + ".pushsection " sym(.gnu.linkonce.t.) "," \ >> This definitely warrants a comment and a change of name, seeing as sym >> isn't a symbol. Its a macro which gives you a string back. > It's still a symbol name; don't forget that in the end section names > are symbols names too. I'd have gone with a better name if I had > any better idea. It's also not really clear to me what you want a > comment to say here. /* sym must be a macro which takes an optional symbol name prefix */ ? A user of this macro, knowing only the prototype would expect to use by passing a string literal. > >>> @@ -71,16 +80,77 @@ static inline unsigned long __virt_to_ma >>> >>> va += xen_phys_start - XEN_VIRT_START; >>> } >>> - return (va & ma_va_bottom_mask) | >>> - ((va << pfn_pdx_hole_shift) & ma_top_mask); >>> + >>> +#ifdef CONFIG_INDIRECT_THUNK /* V modifier available? */ >>> +#define SYMNAME(pfx...) #pfx "do2ma_%V[ma]_%V[off]" >>> + alternative_io("call " SYMNAME() "\n\t" >>> + LINKONCE_PROLOGUE(SYMNAME) "\n\t" >>> + "mov %[shift], %%ecx\n\t" >>> + "mov %[off], %[ma]\n\t" >>> + "and %[bmask], %[ma]\n\t" >>> + "shl %%cl, %[off]\n\t" >>> + "and %[tmask], %[off]\n\t" >>> + "or %[off], %[ma]\n\t" >>> + "ret\n\t" >>> + LINKONCE_EPILOGUE(SYMNAME), >>> + "pdep %[mask], %[off], %[ma]", X86_FEATURE_BMI2, >> The compiler understanding V doesn't imply that the assembler >> understands pdep > I've been considering this case, and in fact in an early version I had > a -DHAVE_AS_BMI2 addition. But I then decided to drop it, as I couldn't > really imagine V support to be backported to gcc pre-dating BMI2 > support in gas (available as of 2.22, i.e. gcc 4.5 / 4.6 timeframe). > But yes, if you really think we need to cope with that, I could re-add > this. Just let me know if you indeed think this is needed. Didn't you say that you've got Retpoline backported to GCC 4.3 ? Independently of my argument against using V below, I'm not overly fussed especially as we are considering upping the requirements to something a little more modern, so long as we accept that we may have to retrofit a -DHAVE_AS_BMI2 after the fact. > >>> + ASM_OUTPUT2([ma] "=&r" (ma), [off] "+r" (va)), >>> + [mask] "m" (ma_real_mask), >>> + [shift] "m" (pfn_pdx_hole_shift), >>> + [bmask] "m" (ma_va_bottom_mask), >>> + [tmask] "m" (ma_top_mask) >>> + : "ecx"); >>> +#undef SYMNAME >>> +#else >>> + alternative_io("call do2ma", >>> + /* pdep ma_real_mask(%rip), %rdi, %rax */ >>> + ".byte 0xc4, 0xe2, 0xc3, 0xf5, 0x05\n\t" >>> + ".long ma_real_mask - 4 - .", >>> + X86_FEATURE_BMI2, >>> + ASM_OUTPUT2("=a" (ma), "+D" (va)), "m" (ma_real_mask) >>> + : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11"); >>> +#endif >> This is a massive clobber list in a function you've forced always >> inline, and I can't see it doing nice things to the callsites. TBH, >> this still feels over-complicated for what it wants to be. >> >> Why not implement one single function in assembly that doesn't have >> usual C calling conventions and can clobber %ecx and one other, and use >> that? >> >> It avoids the need for potentially 256 almost-identical copies of the >> function in the linkonce section, and avoids having the multiple >> implementations in C/asm, avoids the need for any logic derived from >> CONFIG_INDIRECT_THUNK, and avoids the need for massive clobber lists. > Your response mixes things a bit too much for me to sort out what > exactly you're concerned about: The massive clobber list exists only > in the !CONFIG_INDIRECT_THUNK case. In that case though there > aren't going to be up to 225 instances of the function. I'd be fine > implementing the single one in assembly to reduce the clobber list, > it was just to keep down assembly code size and also to have > compiler generated code to compare against. I have to admit though > that I'm not overly concerned about the !CONFIG_INDIRECT_THUNK > case in the first place, so I also didn't see much reason to try to > optimize it. > > For the CONFIG_INDIRECT_THUNK case, otoh, I'd really like to > avoid dictating register allocation to the compiler. Hence the solution > with the (possibly many) function instances. Overall code size still > decreases with this approach, and on modern hardware the entire > region of the image in which they live will remain cold. (Additionally > not affecting register allocation here has the benefit of making it > far easier to compare pre/post generated code.) How many passes through the hypervisor hit two or more of these functions? I appreciate that for development, reducing the register perturbance can be nice for diffing the resulting disassembly, but when it comes to actually running the code, register renames at compile time are free, whereas pulling multiple almost-identical copies of the stub into the I-cache is not. Implementing this as a single function looks to be simpler in terms of the change, will compile to a smaller result, and will run faster. It seems to be a win-win-win overall. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |