|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2 11/11] x86/bitops: Use the POPCNT instruction when available
On 29.08.2024 00:03, Andrew Cooper wrote:
> It has existed in x86 CPUs since 2008, so we're only 16 years late adding
> support. With all the other scafolding in place, implement arch_hweightl()
> for x86.
>
> The only complication is that the call to arch_generic_hweightl() is behind
> the compilers back. Address this by writing it in ASM and ensure that it
> preserves all registers.
>
> Copy the code generation from generic_hweightl(). It's not a complicated
> algorithm, and is easy to regenerate if needs be, but cover it with the same
> unit tests as test_generic_hweightl() just for piece of mind.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> ---
> CC: Jan Beulich <JBeulich@xxxxxxxx>
> CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>
> v2:
> * Fix MISRA 8.2 (parameter name) and 8.5 (single declaration) regressions.
> * Rename {arch->x86}-generic-hweightl.{S->c}
> * Adjust ASM formating
>
> The __constructor trick to cause any reference to $foo() to pull in
> test_$foo() only works when both are in the same TU. i.e. what I did in
> v1 (putting test_arch_generic_hweightl() in the regular generic-hweightl.c)
> didn't work.
I'm afraid I don't understand this. What exactly didn't work, breaking in which
way? Presumably as much as you, I don't really like the global asm() in a C
file, when ideally the same could be written with less clutter in an assembly
one.
> This in turn means that arch_generic_hweightl() needs writing in a global asm
> block, and also that we can't use FUNC()/END(). While we could adjust it to
> work for GCC/binutils, we can't have CPP macros in Clang-IAS strings.
What does Clang different from gcc there? I was hoping that at least their pre-
processors would work in (sufficiently) similar ways.
> --- /dev/null
> +++ b/xen/lib/x86-generic-hweightl.c
> @@ -0,0 +1,69 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#include <xen/bitops.h>
> +#include <xen/init.h>
> +#include <xen/self-tests.h>
> +
> +/*
> + * An implementation of generic_hweightl() used on hardware without the
> POPCNT
> + * instruction.
> + *
> + * This function is called from within an ALTERNATIVE in arch_hweightl().
> + * i.e. behind the back of the compiler. Therefore all registers are callee
> + * preserved.
> + *
> + * The ASM is what GCC-12 emits for generic_hweightl() in a release build of
> + * Xen, with spilling of %rdi/%rdx to preserve the callers registers.
> + *
> + * Note: When we can use __attribute__((no_caller_saved_registers))
> + * unconditionally (GCC 7, Clang 5), we can implement this in plain C.
> + */
> +asm (
> + ".type arch_generic_hweightl, STT_FUNC\n\t"
> + ".globl arch_generic_hweightl\n\t"
> + ".hidden arch_generic_hweightl\n\t"
> + ".balign " STR(CONFIG_FUNCTION_ALIGNMENT) ", 0x90\n\t"
Maybe better avoid open-coding CODE_FILL, in case we want to change that
down the road?
Also could I talk you into dropping the \t there? Canonical assembly code
wants ...
> + "arch_generic_hweightl:\n\t"
.. labels unindented.
Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |