[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v3 2/5] x86: use PDEP/PEXT for maddr/direct-map-offset conversion when available
>>> On 17.08.18 at 10:59, <andrew.cooper3@xxxxxxxxxx> wrote: > On 17/08/2018 08:21, Jan Beulich wrote: >> --- a/xen/include/asm-x86/asm_defns.h >> +++ b/xen/include/asm-x86/asm_defns.h >> @@ -186,6 +186,20 @@ void ret_from_intr(void); >> UNLIKELY_END_SECTION "\n" \ >> ".Llikely." #tag ".%=:" >> >> +#define LINKONCE_PROLOGUE(sym) \ >> + ".ifndef " sym() "\n\t" \ >> + ".pushsection " sym(.gnu.linkonce.t.) "," \ > > This definitely warrants a comment and a change of name, seeing as sym > isn't a symbol. Its a macro which gives you a string back. It's still a symbol name; don't forget that in the end section names are symbols names too. I'd have gone with a better name if I had any better idea. It's also not really clear to me what you want a comment to say here. >> @@ -71,16 +80,77 @@ static inline unsigned long __virt_to_ma >> >> va += xen_phys_start - XEN_VIRT_START; >> } >> - return (va & ma_va_bottom_mask) | >> - ((va << pfn_pdx_hole_shift) & ma_top_mask); >> + >> +#ifdef CONFIG_INDIRECT_THUNK /* V modifier available? */ >> +#define SYMNAME(pfx...) #pfx "do2ma_%V[ma]_%V[off]" >> + alternative_io("call " SYMNAME() "\n\t" >> + LINKONCE_PROLOGUE(SYMNAME) "\n\t" >> + "mov %[shift], %%ecx\n\t" >> + "mov %[off], %[ma]\n\t" >> + "and %[bmask], %[ma]\n\t" >> + "shl %%cl, %[off]\n\t" >> + "and %[tmask], %[off]\n\t" >> + "or %[off], %[ma]\n\t" >> + "ret\n\t" >> + LINKONCE_EPILOGUE(SYMNAME), >> + "pdep %[mask], %[off], %[ma]", X86_FEATURE_BMI2, > > The compiler understanding V doesn't imply that the assembler > understands pdep I've been considering this case, and in fact in an early version I had a -DHAVE_AS_BMI2 addition. But I then decided to drop it, as I couldn't really imagine V support to be backported to gcc pre-dating BMI2 support in gas (available as of 2.22, i.e. gcc 4.5 / 4.6 timeframe). But yes, if you really think we need to cope with that, I could re-add this. Just let me know if you indeed think this is needed. >> + ASM_OUTPUT2([ma] "=&r" (ma), [off] "+r" (va)), >> + [mask] "m" (ma_real_mask), >> + [shift] "m" (pfn_pdx_hole_shift), >> + [bmask] "m" (ma_va_bottom_mask), >> + [tmask] "m" (ma_top_mask) >> + : "ecx"); >> +#undef SYMNAME >> +#else >> + alternative_io("call do2ma", >> + /* pdep ma_real_mask(%rip), %rdi, %rax */ >> + ".byte 0xc4, 0xe2, 0xc3, 0xf5, 0x05\n\t" >> + ".long ma_real_mask - 4 - .", >> + X86_FEATURE_BMI2, >> + ASM_OUTPUT2("=a" (ma), "+D" (va)), "m" (ma_real_mask) >> + : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11"); >> +#endif > > This is a massive clobber list in a function you've forced always > inline, and I can't see it doing nice things to the callsites. TBH, > this still feels over-complicated for what it wants to be. > > Why not implement one single function in assembly that doesn't have > usual C calling conventions and can clobber %ecx and one other, and use > that? > > It avoids the need for potentially 256 almost-identical copies of the > function in the linkonce section, and avoids having the multiple > implementations in C/asm, avoids the need for any logic derived from > CONFIG_INDIRECT_THUNK, and avoids the need for massive clobber lists. Your response mixes things a bit too much for me to sort out what exactly you're concerned about: The massive clobber list exists only in the !CONFIG_INDIRECT_THUNK case. In that case though there aren't going to be up to 225 instances of the function. I'd be fine implementing the single one in assembly to reduce the clobber list, it was just to keep down assembly code size and also to have compiler generated code to compare against. I have to admit though that I'm not overly concerned about the !CONFIG_INDIRECT_THUNK case in the first place, so I also didn't see much reason to try to optimize it. For the CONFIG_INDIRECT_THUNK case, otoh, I'd really like to avoid dictating register allocation to the compiler. Hence the solution with the (possibly many) function instances. Overall code size still decreases with this approach, and on modern hardware the entire region of the image in which they live will remain cold. (Additionally not affecting register allocation here has the benefit of making it far easier to compare pre/post generated code.) Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |