Re: [PATCH v2] x86/bitops: Optimise arch_ffs{,l}() some more on AMD
On 01.09.2025 16:21, Andrew Cooper wrote:
> On 27/08/2025 8:52 am, Jan Beulich wrote:
>> On 26.08.2025 19:41, Andrew Cooper wrote:
>>> --- a/xen/common/bitops.c
>>> +++ b/xen/common/bitops.c
>>> @@ -97,14 +97,14 @@ static void __init test_for_each_set_bit(void)
>>> if ( ui != ui_res )
>>> panic("for_each_set_bit(uint) expected %#x, got %#x\n", ui,
>>> ui_res);
>>>
>>> - ul = HIDE(1UL << (BITS_PER_LONG - 1) | 1);
>>> + ul = HIDE(1UL << (BITS_PER_LONG - 1) | 0x11);
>>> for_each_set_bit ( i, ul )
>>> ul_res |= 1UL << i;
>>>
>>> if ( ul != ul_res )
>>> panic("for_each_set_bit(ulong) expected %#lx, got %#lx\n", ul,
>>> ul_res);
>>>
>>> - ull = HIDE(0x8000000180000001ULL);
>>> + ull = HIDE(0x8000000180000011ULL);
>>> for_each_set_bit ( i, ull )
>>> ull_res |= 1ULL << i;
>> How do these changes make a difference? Apart from ffs() using TZCNT, ...
>>
>>> @@ -127,6 +127,79 @@ static void __init test_for_each_set_bit(void)
>>> panic("for_each_set_bit(break) expected 0x1008, got %#x\n",
>>> ui_res);
>>> }
>>>
>>> +/*
>>> + * A type-generic fls() which picks the appropriate fls{,l,64}() based on
>>> + * its argument.
>>> + */
>>> +#define fls_g(x) \
>>> + (sizeof(x) <= sizeof(int) ? fls(x) : \
>>> + sizeof(x) <= sizeof(long) ? flsl(x) : \
>>> + sizeof(x) <= sizeof(uint64_t) ? fls64(x) : \
>>> + ({ BUILD_ERROR("fls_g() Bad input type"); 0; }))
>>> +
>>> +/*
>>> + * for_each_set_bit_reverse() - Iterate over all set bits in a scalar
>>> + * value, from MSB to LSB.
>>> + *
>>> + * @iter An iterator name. Scope is within the loop only.
>>> + * @val A scalar value to iterate over.
>>> + *
>>> + * A copy of @val is taken internally.
>>> + */
>>> +#define for_each_set_bit_reverse(iter, val) \
>>> + for ( typeof(val) __v = (val); __v; __v = 0 ) \
>>> + for ( unsigned int (iter); \
>>> + __v && ((iter) = fls_g(__v) - 1, true); \
>>> + __clear_bit(iter, &__v) )
>>> +
>>> +/*
>>> + * Xen doesn't have need of for_each_set_bit_reverse() at present, but the
>>> + * construct does exercise a case of arch_fls*() not covered anywhere else
>>> + * by these tests.
>>> + */
>>> +static void __init test_for_each_set_bit_reverse(void)
>>> +{
>>> + unsigned int ui, ui_res = 0, tmp;
>>> + unsigned long ul, ul_res = 0;
>>> + uint64_t ull, ull_res = 0;
>>> +
>>> + ui = HIDE(0x80008001U);
>>> + for_each_set_bit_reverse ( i, ui )
>>> + ui_res |= 1U << i;
>>> +
>>> + if ( ui != ui_res )
>>> + panic("for_each_set_bit_reverse(uint) expected %#x, got %#x\n",
>>> ui, ui_res);
>>> +
>>> + ul = HIDE(1UL << (BITS_PER_LONG - 1) | 0x11);
>>> + for_each_set_bit_reverse ( i, ul )
>>> + ul_res |= 1UL << i;
>>> +
>>> + if ( ul != ul_res )
>>> + panic("for_each_set_bit_reverse(ulong) expected %#lx, got %#lx\n",
>>> ul, ul_res);
>>> +
>>> + ull = HIDE(0x8000000180000011ULL);
>>> + for_each_set_bit_reverse ( i, ull )
>>> + ull_res |= 1ULL << i;
>> ... even here the need for the extra setting of bit 4 remains unclear to
>> me: The thing that was missing was the testing of the reverse for-each.
>> You mention the need for an asymmetric input in the description, but isn't
>> that covered already by the first test using 0x80008001U?
>
> The first test covers {arch_,}f[fl]s() only. It happens to be safe
> against count-from-the-wrong-end bugs, but that was by chance.
>
> The second test covers {arch_,}f[fl]sl() only. They are unsafe WRT
> symmetry, and disjoint (coverage-wise) from the first test.
>
> The third test, while the same as the second test on x86, isn't the same
> on arm32.
>
>
> Just because one test happens to be safe (symmetry wise) and passes,
> doesn't make the other variants tested.
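
[Illustration: a minimal sketch of the symmetry point above, assuming a
64-bit long. buggy_ffsl() and the harness are hypothetical stand-ins, not
Xen code; they model a "count from the wrong end" bug and show why the old
symmetric test value masks it while the new asymmetric one catches it.]

    #include <stdio.h>

    /*
     * Hypothetical count-from-the-wrong-end bug: report the mirror image
     * of the lowest set bit, i.e. bit (63 - n) instead of bit n
     * (result is 1-based, 0 for no bits set).
     */
    static unsigned int buggy_ffsl(unsigned long x)
    {
        return x ? 65 - __builtin_ffsl(x) : 0;
    }

    /* Mimic the test loops: accumulate 1UL << index for each set bit. */
    static unsigned long accumulate(unsigned long val)
    {
        unsigned long v = val, res = 0;

        while ( v )
        {
            res |= 1UL << (buggy_ffsl(v) - 1);
            v &= v - 1;                 /* clear the lowest set bit */
        }

        return res;
    }

    int main(void)
    {
        unsigned long sym  = 1UL << 63 | 1;    /* old value: symmetric   */
        unsigned long asym = 1UL << 63 | 0x11; /* new value: asymmetric  */

        /* Each mirrored index is itself a set bit, so the bug is masked. */
        printf("sym:  %s\n",
               accumulate(sym) == sym ? "pass (bug masked)" : "fail");

        /* Bit 4's mirror, bit 59, is clear, so the result differs. */
        printf("asym: %s\n",
               accumulate(asym) == asym ? "pass" : "fail (bug caught)");

        return 0;
    }
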
Hmm, okay, it is of course in principle possible that one flavor is screwed
while the other isn't.
Acked-by: Jan Beulich <jbeulich@xxxxxxxx>
Jan
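
[Addendum: a standalone usage sketch of the for_each_set_bit_reverse()
construct quoted above. my_fls() is a stand-in for Xen's fls() built on a
compiler builtin, assuming a 32-bit unsigned int; it is not the real
implementation.]

    #include <stdio.h>

    /* Stand-in for Xen's fls(): 1-based MSB position, 0 for no bits set. */
    static unsigned int my_fls(unsigned int x)
    {
        return x ? 32 - __builtin_clz(x) : 0;
    }

    int main(void)
    {
        unsigned int v = 0x80008001U;   /* the test value from the patch */

        /* Open-coded equivalent of for_each_set_bit_reverse(i, v). */
        while ( v )
        {
            unsigned int i = my_fls(v) - 1;

            printf("bit %u\n", i);      /* prints 31, then 15, then 0 */
            v &= ~(1U << i);            /* __clear_bit(i, &v) */
        }

        return 0;
    }
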