Re: [Xen-devel] [PATCH RFC 3/4] Arm64: further speed-up to hweight{32, 64}()
>>> On 04.06.19 at 18:11, <julien.grall@xxxxxxx> wrote:
> On 5/31/19 10:53 AM, Jan Beulich wrote:
>> According to Linux commit e75bef2a4f ("arm64: Select
>> ARCH_HAS_FAST_MULTIPLIER") this is a further improvement over the
>> variant using only bitwise operations on at least some hardware, and no
>> worse on others.
>>
>> Suggested-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>> ---
>> RFC: To be honest, I'm not fully convinced this is a win, in particular in
>> the hweight32() case, as there's no actual shift insn which gets
>> replaced by the multiplication. Even for hweight64() the compiler
>> could emit better code and avoid the explicit shift by 32 (which it
>> emits at least for me).
>
> I can see a multiplication instruction used in both hweight32() and
> hweight64() with the compiler I am using.
For which exact implementation is that? What I was referring to as
"could emit better code" was the multiplication-free variant, where
the compiler fails to recognize (afaict) another opportunity to fold
a shift into an arithmetic instruction:
add x0, x0, x0, lsr #4
and x0, x0, #0xf0f0f0f0f0f0f0f
add x0, x0, x0, lsr #8
add x0, x0, x0, lsr #16
>>> lsr x1, x0, #32
>>> add w0, w1, w0
and w0, w0, #0xff
ret
Afaict the two marked insns could be replaced by
add x0, x0, x0, lsr #32
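For illustration, a multiplication-free hweight64() in the style of the generic bit-twiddling population count can be sketched in C as below. The name hweight64_nomul() is hypothetical and this is not claimed to be the exact Xen source. Written this way, the final fold is a single "w += w >> 32", which is what the suggested add with a shifted operand would implement; whether a given compiler actually emits that one instruction is not guaranteed.

```c
#include <stdint.h>

/* Hypothetical name: multiplication-free population count sketch. */
static unsigned int hweight64_nomul(uint64_t w)
{
    /* Each 2-bit field now holds the number of set bits in it (0..2). */
    w -= (w >> 1) & 0x5555555555555555ULL;
    /* Each 4-bit field now holds 0..4. */
    w = (w & 0x3333333333333333ULL) + ((w >> 2) & 0x3333333333333333ULL);
    /* Each byte now holds 0..8: "add x0, x0, x0, lsr #4" plus the "and". */
    w = (w + (w >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
    /* Fold the byte counts together: the "lsr #8" and "lsr #16" adds. */
    w += w >> 8;
    w += w >> 16;
    /* Final fold, i.e. the single "add x0, x0, x0, lsr #32" suggested above. */
    w += w >> 32;
    /* No byte ever exceeds 64, so no carries occur and the low byte holds the total. */
    return (unsigned int)(w & 0xff);
}
```

Because every partial sum stays below 256, the folds never carry across byte boundaries, which is why only the low byte needs to be masked at the end.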
With that change leaving only a sequence of add-s, I have
difficulty seeing how using mul+lsr would actually help:
add x0, x0, x0, lsr #4
and x0, x0, #0xf0f0f0f0f0f0f0f
mov x1, #0x101010101010101
mul x0, x0, x1
lsr x0, x0, #56
ret
But of course I know nothing about throughput and latency
of such add-s with one of their operands shifted first. And
yes, the variant using mul is, compared with the better
optimized case, still one insn smaller.
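The mul-based sequence can be sketched in C as follows, assuming the same bit-twiddling prefix as the generic variant; hweight32_mul() and hweight64_mul() are hypothetical names, not the Xen functions. Multiplying the per-byte counts by 0x01...01 adds all byte-shifted copies of the value, so the top byte collects the total, and a single right shift extracts it.

```c
#include <stdint.h>

/* Hypothetical name: multiplication-based 64-bit population count sketch. */
static unsigned int hweight64_mul(uint64_t w)
{
    w -= (w >> 1) & 0x5555555555555555ULL;
    w = (w & 0x3333333333333333ULL) + ((w >> 2) & 0x3333333333333333ULL);
    /* Each byte now holds the popcount of that byte (0..8). */
    w = (w + (w >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
    /*
     * w * 0x0101010101010101 == w + (w << 8) + ... + (w << 56) (mod 2^64),
     * so the top byte receives the sum of all eight byte counts
     * ("mov" + "mul" + "lsr #56" in the listing above).
     */
    return (unsigned int)((w * 0x0101010101010101ULL) >> 56);
}

/* Hypothetical name: the same idea on 32 bits. */
static unsigned int hweight32_mul(uint32_t w)
{
    w -= (w >> 1) & 0x55555555U;
    w = (w & 0x33333333U) + ((w >> 2) & 0x33333333U);
    w = (w + (w >> 4)) & 0x0f0f0f0fU;
    /* The top byte of the 32-bit product collects the four byte counts. */
    return (uint32_t)(w * 0x01010101U) >> 24;
}
```

Both this and the multiplication-free form return the same counts; the open question in the thread is only instruction count, latency and throughput on the target cores.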
> I would expect the compiler could easily replace a multiply by a series
> of shifts, but it would be more difficult to do the inverse.
>
> Also, this has been in Linux for a year now, so I am assuming Linux
> folks are happy with the change (CCing Robin just in case I missed
> anything). Therefore I am happy to give it a go on Xen as well.
In that case, can I take this as an ack, or do you want to
pursue the discussion first?
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel