[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 2/5] x86: use PDEP/PEXT for maddr/direct-map-offset conversion when available



>>> On 17.08.18 at 10:59, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 17/08/2018 08:21, Jan Beulich wrote:
>> --- a/xen/include/asm-x86/asm_defns.h
>> +++ b/xen/include/asm-x86/asm_defns.h
>> @@ -186,6 +186,20 @@ void ret_from_intr(void);
>>          UNLIKELY_END_SECTION "\n"          \
>>          ".Llikely." #tag ".%=:"
>>  
>> +#define LINKONCE_PROLOGUE(sym)                    \
>> +        ".ifndef " sym() "\n\t"                   \
>> +        ".pushsection " sym(.gnu.linkonce.t.) "," \
> 
> This definitely warrants a comment and a change of name, seeing as sym
> isn't a symbol.  Its a macro which gives you a string back.

It's still a symbol name; don't forget that in the end section names
are symbols names too. I'd have gone with a better name if I had
any better idea. It's also not really clear to me what you want a
comment to say here.

>> @@ -71,16 +80,77 @@ static inline unsigned long __virt_to_ma
>>  
>>          va += xen_phys_start - XEN_VIRT_START;
>>      }
>> -    return (va & ma_va_bottom_mask) |
>> -           ((va << pfn_pdx_hole_shift) & ma_top_mask);
>> +
>> +#ifdef CONFIG_INDIRECT_THUNK /* V modifier available? */
>> +#define SYMNAME(pfx...) #pfx "do2ma_%V[ma]_%V[off]"
>> +    alternative_io("call " SYMNAME() "\n\t"
>> +                   LINKONCE_PROLOGUE(SYMNAME) "\n\t"
>> +                   "mov %[shift], %%ecx\n\t"
>> +                   "mov %[off], %[ma]\n\t"
>> +                   "and %[bmask], %[ma]\n\t"
>> +                   "shl %%cl, %[off]\n\t"
>> +                   "and %[tmask], %[off]\n\t"
>> +                   "or %[off], %[ma]\n\t"
>> +                   "ret\n\t"
>> +                   LINKONCE_EPILOGUE(SYMNAME),
>> +                   "pdep %[mask], %[off], %[ma]", X86_FEATURE_BMI2,
> 
> The compiler understanding V doesn't imply that the assembler
> understands pdep

I've been considering this case, and in fact in an early version I had
a -DHAVE_AS_BMI2 addition. But I then decided to drop it, as I couldn't
really imagine V support to be backported to gcc pre-dating BMI2
support in gas (available as of 2.22, i.e. gcc 4.5 / 4.6 timeframe).
But yes, if you really think we need to cope with that, I could re-add
this. Just let me know if you indeed think this is needed.

>> +                   ASM_OUTPUT2([ma] "=&r" (ma), [off] "+r" (va)),
>> +                   [mask] "m" (ma_real_mask),
>> +                   [shift] "m" (pfn_pdx_hole_shift),
>> +                   [bmask] "m" (ma_va_bottom_mask),
>> +                   [tmask] "m" (ma_top_mask)
>> +                   : "ecx");
>> +#undef SYMNAME
>> +#else
>> +    alternative_io("call do2ma",
>> +                   /* pdep ma_real_mask(%rip), %rdi, %rax */
>> +                   ".byte 0xc4, 0xe2, 0xc3, 0xf5, 0x05\n\t"
>> +                   ".long ma_real_mask - 4 - .",
>> +                   X86_FEATURE_BMI2,
>> +                   ASM_OUTPUT2("=a" (ma), "+D" (va)), "m" (ma_real_mask)
>> +                   : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
>> +#endif
> 
> This is a massive clobber list in a function you've forced always
> inline, and I can't see it doing nice things to the callsites.  TBH,
> this still feels over-complicated for what it wants to be.
> 
> Why not implement one single function in assembly that doesn't have
> usual C calling conventions and can clobber %ecx and one other, and use
> that?
> 
> It avoids the need for potentially 256 almost-identical copies of the
> function in the linkonce section, and avoids having the multiple
> implementations in C/asm, avoids the need for any logic derived from
> CONFIG_INDIRECT_THUNK, and avoids the need for massive clobber lists.

Your response mixes things a bit too much for me to sort out what
exactly you're concerned about: The massive clobber list exists only
in the !CONFIG_INDIRECT_THUNK case. In that case though there
aren't going to be up to 225 instances of the function. I'd be fine
implementing the single one in assembly to reduce the clobber list,
it was just to keep down assembly code size and also to have
compiler generated code to compare against. I have to admit though
that I'm not overly concerned about the !CONFIG_INDIRECT_THUNK
case in the first place, so I also didn't see much reason to try to
optimize it.

For the CONFIG_INDIRECT_THUNK case, otoh, I'd really like to
avoid dictating register allocation to the compiler. Hence the solution
with the (possibly many) function instances. Overall code size still
decreases with this approach, and on modern hardware the entire
region of the image in which they live will remain cold. (Additionally
not affecting register allocation here has the benefit of making it
far easier to compare pre/post generated code.)

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.