[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2] x86: use POPCNT for hweight<N>() when available
On 20.05.2020 12:28, Roger Pau Monné wrote: > On Wed, May 20, 2020 at 12:17:15PM +0200, Jan Beulich wrote: >> On 20.05.2020 11:31, Roger Pau Monné wrote: >>> On Wed, May 20, 2020 at 10:31:38AM +0200, Jan Beulich wrote: >>>> On 14.05.2020 16:05, Roger Pau Monné wrote: >>>>> On Mon, Jul 15, 2019 at 02:39:04PM +0000, Jan Beulich wrote: >>>>>> @@ -251,6 +255,10 @@ boot/mkelf32: boot/mkelf32.c >>>>>> efi/mkreloc: efi/mkreloc.c >>>>>> $(HOSTCC) $(HOSTCFLAGS) -g -o $@ $< >>>>>> >>>>>> +nocov-y += hweight.o >>>>>> +noubsan-y += hweight.o >>>>>> +hweight.o: CFLAGS += $(foreach reg,cx dx si 8 9 10 11,-ffixed-r$(reg)) >>>>> >>>>> Why not use clobbers in the asm to list the scratch registers? Is it >>>>> that much expensive? >>>> >>>> The goal is to disturb the call sites as little as possible. There's >>>> no point avoiding the scratch registers when no call is made (i.e. >>>> when the POPCNT insn can be used). Taking away from the compiler 7 >>>> out of 15 registers (that it can hold live data in) seems quite a >>>> lot to me. >>> >>> IMO using -ffixed-reg for all those registers is even worse, as that >>> prevents the generated code in hweight from using any of those, thus >>> greatly limiting the amount of registers and likely making the >>> generated code rely heavily on pushing an popping from the stack? >> >> Okay, that's the other side of the idea behind all this: Virtually no >> hardware we run on will lack POPCNT support, hence the quality of >> these fallback routines matters only the very old hardware, where we >> likely don't perform optimally already anyway. >> >>> This also has the side effect to limiting the usage of popcnt to gcc, >>> which IMO is also not ideal. >> >> Agreed. I don't know enough about clang to be able to think of >> possible alternatives. In any event there's no change to current >> behavior for hypervisors built with clang. >> >>> I also wondered, since the in-place asm before patching is a call >>> instruction, wouldn't inline asm at build time already assume that the >>> scratch registers are clobbered? >> >> That would imply the compiler peeks into the string literal of the >> asm(). At least gcc doesn't, and even if it did it couldn't infer an >> ABI from seeing a CALL insn. > > Please bear with me, but then I don't understand what Linux is doing > in arch/x86/include/asm/arch_hweight.h. I see no clobbers there, > neither it seems like the __sw_hweight{32/64} functions are built > without the usage of the scratch registers. __sw_hweight{32,64} are implemented in assembly, avoiding most scratch registers while pushing/popping the ones which do get altered. Jan
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |