[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH 9/9] x86/bitops: Use the POPCNT instruction when available
On 23.08.2024 01:06, Andrew Cooper wrote: > A few RFC points. > > * I throught we had an x86 general lib-y but I can't find one, hence why it's > still in xen/lib/ for now. We indeed have nothing like that yet. The file name should then imo not be arch-* though, but x86-*. Or you could put it in xen/lib/x86/. It could even be obj-y rather than lib-y, unless you expect we'll be able to get rid of all uses, which seems unlikely at least due to bitmap_weight(). Otoh with my ABI-level series the call site in arch_hweightl() could then be made go away for v2 and above, at which point it living in lib-y will be preferable. > * When we up the minimum toolchain to GCC 7 / Clang 5, we can use a > __attribute__((no_caller_saved_registers)) and can forgo writing this in > asm. > > GCC seems to need extra help, and wants -mgeneral-regs-only too. It has a > habit of complaining about incompatible instructions even when it's not > emitting them. What is this part about? What incompatible instructions, in particular? > @@ -475,4 +476,24 @@ static always_inline unsigned int arch_flsl(unsigned > long x) > } > #define arch_flsl arch_flsl > > +static always_inline unsigned int arch_hweightl(unsigned long x) > +{ > + unsigned int r; > + > + /* > + * arch_generic_hweightl() is written in ASM in order to preserve all > + * registers, as the compiler can't see the call. > + * > + * This limits the POPCNT instruction to using the same ABI as a function > + * call (input in %rdi, output in %eax) but that's fine. > + */ > + alternative_io("call arch_generic_hweightl", > + "popcnt %[val], %q[res]", X86_FEATURE_POPCNT, > + ASM_OUTPUT2([res] "=a" (r) ASM_CALL_CONSTRAINT), > + [val] "D" (x)); If you made [val] an output ("+D") you could avoid preserving the register in the function. And I'd expect the register wouldn't be re-used often afterwards, so its clobbering likely won't harm code quality very much. > --- /dev/null > +++ b/xen/lib/arch-generic-hweightl.S > @@ -0,0 +1,46 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > + > + .file __FILE__ > + > +#include <xen/linkage.h> > + > +/* > + * An implementation of generic_hweightl() used on hardware without the > POPCNT > + * instruction. > + * > + * This function is called from within an ALTERNATIVE in arch_hweightl(). > + * i.e. behind the back of the compiler. Therefore all registers are callee > + * preserved. > + * > + * The ASM is what GCC-12 emits for generic_hweightl() in a release build of > + * Xen, with spilling of %rdi/%rdx to preserve the callers registers. > + */ > +FUNC(arch_generic_hweightl) > + push %rdi > + push %rdx > + > + movabs $0x5555555555555555, %rdx > + mov %rdi, %rax > + shr $1, %rax > + and %rdx, %rax > + sub %rax, %rdi > + movabs $0x3333333333333333, %rax > + mov %rdi, %rdx > + shr $0x2, %rdi Could I talk you into omitting the 0x here and ... > + and %rax, %rdx > + and %rax, %rdi > + add %rdi, %rdx > + mov %rdx, %rax > + shr $0x4, %rax ... here, and maybe use ... > + add %rdx, %rax > + movabs $0xf0f0f0f0f0f0f0f, %rdx > + and %rdx, %rax > + movabs $0x101010101010101, %rdx > + imul %rdx, %rax > + shr $0x38, %rax ... 56 here (or even BITS_PER_LONG-8)? Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |