[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2] x86: use POPCNT for hweight<N>() when available
On Wed, May 20, 2020 at 03:12:25PM +0200, Jan Beulich wrote: > On 20.05.2020 13:43, Roger Pau Monné wrote: > > On Wed, May 20, 2020 at 12:57:27PM +0200, Jan Beulich wrote: > >> On 20.05.2020 12:28, Roger Pau Monné wrote: > >>> On Wed, May 20, 2020 at 12:17:15PM +0200, Jan Beulich wrote: > >>>> On 20.05.2020 11:31, Roger Pau Monné wrote: > >>>>> On Wed, May 20, 2020 at 10:31:38AM +0200, Jan Beulich wrote: > >>>>>> On 14.05.2020 16:05, Roger Pau Monné wrote: > >>>>>>> On Mon, Jul 15, 2019 at 02:39:04PM +0000, Jan Beulich wrote: > >>>>>>>> @@ -251,6 +255,10 @@ boot/mkelf32: boot/mkelf32.c > >>>>>>>> efi/mkreloc: efi/mkreloc.c > >>>>>>>> $(HOSTCC) $(HOSTCFLAGS) -g -o $@ $< > >>>>>>>> > >>>>>>>> +nocov-y += hweight.o > >>>>>>>> +noubsan-y += hweight.o > >>>>>>>> +hweight.o: CFLAGS += $(foreach reg,cx dx si 8 9 10 > >>>>>>>> 11,-ffixed-r$(reg)) > >>>>>>> > >>>>>>> Why not use clobbers in the asm to list the scratch registers? Is it > >>>>>>> that much expensive? > >>>>>> > >>>>>> The goal is to disturb the call sites as little as possible. There's > >>>>>> no point avoiding the scratch registers when no call is made (i.e. > >>>>>> when the POPCNT insn can be used). Taking away from the compiler 7 > >>>>>> out of 15 registers (that it can hold live data in) seems quite a > >>>>>> lot to me. > >>>>> > >>>>> IMO using -ffixed-reg for all those registers is even worse, as that > >>>>> prevents the generated code in hweight from using any of those, thus > >>>>> greatly limiting the amount of registers and likely making the > >>>>> generated code rely heavily on pushing an popping from the stack? > >>>> > >>>> Okay, that's the other side of the idea behind all this: Virtually no > >>>> hardware we run on will lack POPCNT support, hence the quality of > >>>> these fallback routines matters only the very old hardware, where we > >>>> likely don't perform optimally already anyway. > >>>> > >>>>> This also has the side effect to limiting the usage of popcnt to gcc, > >>>>> which IMO is also not ideal. > >>>> > >>>> Agreed. I don't know enough about clang to be able to think of > >>>> possible alternatives. In any event there's no change to current > >>>> behavior for hypervisors built with clang. > >>>> > >>>>> I also wondered, since the in-place asm before patching is a call > >>>>> instruction, wouldn't inline asm at build time already assume that the > >>>>> scratch registers are clobbered? > >>>> > >>>> That would imply the compiler peeks into the string literal of the > >>>> asm(). At least gcc doesn't, and even if it did it couldn't infer an > >>>> ABI from seeing a CALL insn. > >>> > >>> Please bear with me, but then I don't understand what Linux is doing > >>> in arch/x86/include/asm/arch_hweight.h. I see no clobbers there, > >>> neither it seems like the __sw_hweight{32/64} functions are built > >>> without the usage of the scratch registers. > >> > >> __sw_hweight{32,64} are implemented in assembly, avoiding most > >> scratch registers while pushing/popping the ones which do get > >> altered. > > > > Oh right, I was looking at lib/hweight.c instead of the arch one. > > > > Would you agree to use the no_caller_saved_registers attribute (which > > is available AFAICT for both gcc and clang) for generic_hweightXX and > > then remove the asm prefix code in favour of the defines for > > hweight{8/16}? > > At least for gcc no_caller_saved_registers isn't old enough to be > used unconditionally (nor is its companion -mgeneral-regs-only). > If you tell me it's fine to use unconditionally with clang, then > I can see about making this the preferred variant, with the > present one as a fallback. Hm, so my suggestion was bad, no_caller_saved_registers is only implemented starting clang 5, which is newer than the minimum we currently require (3.5). So apart from adding a clobber to the asm instance covering the scratch registers the only option I can see as viable is using a bunch of dummy global variables assigned to the registers we need to prevent the software generic_hweightXX functions from using, but that's ugly and will likely trigger warnings at least, since I'm not sure the compiler will find it safe to clobber a function call register with a global variable. Or adding a prolegue / epilogue to the call instruction in order to push / pop the relevant registers. None seems like a very good option IMO. I also assume that using no_caller_saved_registers when available or else keeping the current behavior is not an acceptable solution? FWIW, from a FreeBSD PoV I would be OK with that, as I don't think there are any supported targets with clang < 5. Thanks, Roger.
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |