[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2] x86: use POPCNT for hweight<N>() when available



On 14.05.2020 16:05, Roger Pau Monné wrote:
> On Mon, Jul 15, 2019 at 02:39:04PM +0000, Jan Beulich wrote:
>> @@ -251,6 +255,10 @@ boot/mkelf32: boot/mkelf32.c
>>   efi/mkreloc: efi/mkreloc.c
>>      $(HOSTCC) $(HOSTCFLAGS) -g -o $@ $<
>>   
>> +nocov-y += hweight.o
>> +noubsan-y += hweight.o
>> +hweight.o: CFLAGS += $(foreach reg,cx dx si 8 9 10 11,-ffixed-r$(reg))
> 
> Why not use clobbers in the asm to list the scratch registers? Is it
> that much expensive?

The goal is to disturb the call sites as little as possible. There's
no point avoiding the scratch registers when no call is made (i.e.
when the POPCNT insn can be used). Taking away from the compiler 7
out of 15 registers (that it can hold live data in) seems quite a
lot to me.

>> --- a/xen/include/asm-x86/bitops.h
>> +++ b/xen/include/asm-x86/bitops.h
>> @@ -475,9 +475,36 @@ static inline int fls(unsigned int x)
>>    *
>>    * The Hamming Weight of a number is the total number of bits set in it.
>>    */
>> +#ifndef __clang__
>> +/* POPCNT encodings with %{r,e}di input and %{r,e}ax output: */
>> +#define POPCNT_64 ".byte 0xF3, 0x48, 0x0F, 0xB8, 0xC7"
>> +#define POPCNT_32 ".byte 0xF3, 0x0F, 0xB8, 0xC7"
>> +
>> +#define hweight_(n, x, insn, setup, cout, cin) ({                         \
>> +    unsigned int res_;                                                    \
>> +    /*                                                                    \
>> +     * For the function call the POPCNT input register needs to be marked \
>> +     * modified as well. Set up a local variable of appropriate type      \
>> +     * for this purpose.                                                  \
>> +     */                                                                   \
>> +    typeof((uint##n##_t)(x) + 0U) val_ = (x);                             \
>> +    alternative_io(setup "; call generic_hweight" #n,                     \
>> +                   insn, X86_FEATURE_POPCNT,                              \
>> +                   ASM_OUTPUT2([res] "=a" (res_), [val] cout (val_)),     \
>> +                   [src] cin (val_));                                     \
>> +    res_;                                                                 \
>> +})
>> +#define hweight64(x) hweight_(64, x, POPCNT_64, "", "+D", "g")
>> +#define hweight32(x) hweight_(32, x, POPCNT_32, "", "+D", "g")
>> +#define hweight16(x) hweight_(16, x, "movzwl %w[src], %[val]; " POPCNT_32, \
>> +                              "mov %[src], %[val]", "=&D", "rm")
>> +#define hweight8(x)  hweight_( 8, x, "movzbl %b[src], %[val]; " POPCNT_32, \
>> +                              "mov %[src], %[val]", "=&D", "rm")
> 
> Why not just convert types < 32bits into uint32_t and avoid the asm
> prefix? You are already converting them in hweight_ to an uintX_t.

I don't think I do - there's a conversion to the promoted type
when adding in an unsigned int value (which is there to retain
uint64_t for the 64-bit case, while using unsigned int for all
smaller sizes, as per the parameter types of generic_hweight*()).

> That would made the asm simpler as you would then only need to make
> sure the input is in %rdi and the output is fetched from %rax?

That's an option, yes, but I again wanted to limit the impact to
generated code (including code size) as much as possible.
generic_hweight{8,16}() take unsigned int arguments, not
uint{8,16}_t ones. Hence at the call site (when a function call
is needed) no zero extension is necessary. Therefore the MOVZ
above is to (mainly) overlay the MOV during patching, while the
POPCNT is to (mainly) overlay the CALL.

If the simpler asm()-s were considered more important than the
quality of the generated code, I think we could simply have

#define hweight16(x) hweight32((uint16_t)(x))
#define hweight8(x)  hweight32(( uint8_t)(x))

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.