[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2] x86: use POPCNT for hweight<N>() when available

On 20.05.2020 11:31, Roger Pau Monné wrote:
> On Wed, May 20, 2020 at 10:31:38AM +0200, Jan Beulich wrote:
>> On 14.05.2020 16:05, Roger Pau Monné wrote:
>>> On Mon, Jul 15, 2019 at 02:39:04PM +0000, Jan Beulich wrote:
>>>> @@ -251,6 +255,10 @@ boot/mkelf32: boot/mkelf32.c
>>>>   efi/mkreloc: efi/mkreloc.c
>>>>    $(HOSTCC) $(HOSTCFLAGS) -g -o $@ $<
>>>> +nocov-y += hweight.o
>>>> +noubsan-y += hweight.o
>>>> +hweight.o: CFLAGS += $(foreach reg,cx dx si 8 9 10 11,-ffixed-r$(reg))
>>> Why not use clobbers in the asm to list the scratch registers? Is it
>>> that much expensive?
>> The goal is to disturb the call sites as little as possible. There's
>> no point avoiding the scratch registers when no call is made (i.e.
>> when the POPCNT insn can be used). Taking away from the compiler 7
>> out of 15 registers (that it can hold live data in) seems quite a
>> lot to me.
> IMO using -ffixed-reg for all those registers is even worse, as that
> prevents the generated code in hweight from using any of those, thus
> greatly limiting the amount of registers and likely making the
> generated code rely heavily on pushing an popping from the stack?

Okay, that's the other side of the idea behind all this: Virtually no
hardware we run on will lack POPCNT support, hence the quality of
these fallback routines matters only the very old hardware, where we
likely don't perform optimally already anyway.

> This also has the side effect to limiting the usage of popcnt to gcc,
> which IMO is also not ideal.

Agreed. I don't know enough about clang to be able to think of
possible alternatives. In any event there's no change to current
behavior for hypervisors built with clang.

> I also wondered, since the in-place asm before patching is a call
> instruction, wouldn't inline asm at build time already assume that the
> scratch registers are clobbered?

That would imply the compiler peeks into the string literal of the
asm(). At least gcc doesn't, and even if it did it couldn't infer an
ABI from seeing a CALL insn.




Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.