[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2] x86: use POPCNT for hweight<N>() when available


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Wed, 20 May 2020 13:43:27 +0200
  • Authentication-results: esa6.hc3370-68.iphmx.com; dkim=none (message not signed) header.i=none
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Delivery-date: Wed, 20 May 2020 11:43:52 +0000
  • Ironport-sdr: lRKbqWturC4hsG5QIz48e3pbeVgDPAqfaFvxF3Xc+AfoVXNnyc6EoQQhvbG40da9FiKoCSC6ou gyQdlm+sS9L5M3zFXz1OIlKN6iw/72P2XWFpLup74WDTMeIoRLi3pj/6SGkpwTgEVp/D9VGsKj mE7osdQMX0Sn+QyaYkWq9YiaiVIL5esfWZrexNkAOlowev4U33o/cjtXyHVAl1Z45iK4ZG0y2H j554uMjk4xZI3BP52FuTtLy6sVIJfN9Td95kWRsdhA78WYuwg+zNkOPFIDTryFRjt/Vwnf62nA vLQ=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, May 20, 2020 at 12:57:27PM +0200, Jan Beulich wrote:
> On 20.05.2020 12:28, Roger Pau Monné wrote:
> > On Wed, May 20, 2020 at 12:17:15PM +0200, Jan Beulich wrote:
> >> On 20.05.2020 11:31, Roger Pau Monné wrote:
> >>> On Wed, May 20, 2020 at 10:31:38AM +0200, Jan Beulich wrote:
> >>>> On 14.05.2020 16:05, Roger Pau Monné wrote:
> >>>>> On Mon, Jul 15, 2019 at 02:39:04PM +0000, Jan Beulich wrote:
> >>>>>> @@ -251,6 +255,10 @@ boot/mkelf32: boot/mkelf32.c
> >>>>>>   efi/mkreloc: efi/mkreloc.c
> >>>>>>        $(HOSTCC) $(HOSTCFLAGS) -g -o $@ $<
> >>>>>>   
> >>>>>> +nocov-y += hweight.o
> >>>>>> +noubsan-y += hweight.o
> >>>>>> +hweight.o: CFLAGS += $(foreach reg,cx dx si 8 9 10 11,-ffixed-r$(reg))
> >>>>>
> >>>>> Why not use clobbers in the asm to list the scratch registers? Is it
> >>>>> that much expensive?
> >>>>
> >>>> The goal is to disturb the call sites as little as possible. There's
> >>>> no point avoiding the scratch registers when no call is made (i.e.
> >>>> when the POPCNT insn can be used). Taking away from the compiler 7
> >>>> out of 15 registers (that it can hold live data in) seems quite a
> >>>> lot to me.
> >>>
> >>> IMO using -ffixed-reg for all those registers is even worse, as that
> >>> prevents the generated code in hweight from using any of those, thus
> >>> greatly limiting the amount of registers and likely making the
> >>> generated code rely heavily on pushing an popping from the stack?
> >>
> >> Okay, that's the other side of the idea behind all this: Virtually no
> >> hardware we run on will lack POPCNT support, hence the quality of
> >> these fallback routines matters only the very old hardware, where we
> >> likely don't perform optimally already anyway.
> >>
> >>> This also has the side effect to limiting the usage of popcnt to gcc,
> >>> which IMO is also not ideal.
> >>
> >> Agreed. I don't know enough about clang to be able to think of
> >> possible alternatives. In any event there's no change to current
> >> behavior for hypervisors built with clang.
> >>
> >>> I also wondered, since the in-place asm before patching is a call
> >>> instruction, wouldn't inline asm at build time already assume that the
> >>> scratch registers are clobbered?
> >>
> >> That would imply the compiler peeks into the string literal of the
> >> asm(). At least gcc doesn't, and even if it did it couldn't infer an
> >> ABI from seeing a CALL insn.
> > 
> > Please bear with me, but then I don't understand what Linux is doing
> > in arch/x86/include/asm/arch_hweight.h. I see no clobbers there,
> > neither it seems like the __sw_hweight{32/64} functions are built
> > without the usage of the scratch registers.
> 
> __sw_hweight{32,64} are implemented in assembly, avoiding most
> scratch registers while pushing/popping the ones which do get
> altered.

Oh right, I was looking at lib/hweight.c instead of the arch one.

Would you agree to use the no_caller_saved_registers attribute (which
is available AFAICT for both gcc and clang) for generic_hweightXX and
then remove the asm prefix code in favour of the defines for
hweight{8/16}?

I think that would make it easier to read.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.