[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2] x86/bitops: Optimise arch_ffs{,l}() some more on AMD


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 1 Sep 2025 16:26:00 +0200
  • Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL
  • Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Mon, 01 Sep 2025 14:26:09 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 01.09.2025 16:21, Andrew Cooper wrote:
> On 27/08/2025 8:52 am, Jan Beulich wrote:
>> On 26.08.2025 19:41, Andrew Cooper wrote:
>>> --- a/xen/common/bitops.c
>>> +++ b/xen/common/bitops.c
>>> @@ -97,14 +97,14 @@ static void __init test_for_each_set_bit(void)
>>>      if ( ui != ui_res )
>>>          panic("for_each_set_bit(uint) expected %#x, got %#x\n", ui, 
>>> ui_res);
>>>  
>>> -    ul = HIDE(1UL << (BITS_PER_LONG - 1) | 1);
>>> +    ul = HIDE(1UL << (BITS_PER_LONG - 1) | 0x11);
>>>      for_each_set_bit ( i, ul )
>>>          ul_res |= 1UL << i;
>>>  
>>>      if ( ul != ul_res )
>>>          panic("for_each_set_bit(ulong) expected %#lx, got %#lx\n", ul, 
>>> ul_res);
>>>  
>>> -    ull = HIDE(0x8000000180000001ULL);
>>> +    ull = HIDE(0x8000000180000011ULL);
>>>      for_each_set_bit ( i, ull )
>>>          ull_res |= 1ULL << i;
>> How do these changes make a difference? Apart from ffs() using TZCNT, ...
>>
>>> @@ -127,6 +127,79 @@ static void __init test_for_each_set_bit(void)
>>>          panic("for_each_set_bit(break) expected 0x1008, got %#x\n", 
>>> ui_res);
>>>  }
>>>  
>>> +/*
>>> + * A type-generic fls() which picks the appropriate fls{,l,64}() based on 
>>> it's
>>> + * argument.
>>> + */
>>> +#define fls_g(x)                                        \
>>> +    (sizeof(x) <= sizeof(int)      ? fls(x) :           \
>>> +     sizeof(x) <= sizeof(long)     ? flsl(x) :          \
>>> +     sizeof(x) <= sizeof(uint64_t) ? fls64(x) :         \
>>> +     ({ BUILD_ERROR("fls_g() Bad input type"); 0; }))
>>> +
>>> +/*
>>> + * for_each_set_bit_reverse() - Iterate over all set bits in a scalar 
>>> value,
>>> + * from MSB to LSB.
>>> + *
>>> + * @iter An iterator name.  Scoped is within the loop only.
>>> + * @val  A scalar value to iterate over.
>>> + *
>>> + * A copy of @val is taken internally.
>>> + */
>>> +#define for_each_set_bit_reverse(iter, val)             \
>>> +    for ( typeof(val) __v = (val); __v; __v = 0 )       \
>>> +        for ( unsigned int (iter);                      \
>>> +              __v && ((iter) = fls_g(__v) - 1, true);   \
>>> +              __clear_bit(iter, &__v) )
>>> +
>>> +/*
>>> + * Xen doesn't have need of for_each_set_bit_reverse() at present, but the
>>> + * construct does exercise a case of arch_fls*() not covered anywhere else 
>>> by
>>> + * these tests.
>>> + */
>>> +static void __init test_for_each_set_bit_reverse(void)
>>> +{
>>> +    unsigned int  ui,  ui_res = 0, tmp;
>>> +    unsigned long ul,  ul_res = 0;
>>> +    uint64_t      ull, ull_res = 0;
>>> +
>>> +    ui = HIDE(0x80008001U);
>>> +    for_each_set_bit_reverse ( i, ui )
>>> +        ui_res |= 1U << i;
>>> +
>>> +    if ( ui != ui_res )
>>> +        panic("for_each_set_bit_reverse(uint) expected %#x, got %#x\n", 
>>> ui, ui_res);
>>> +
>>> +    ul = HIDE(1UL << (BITS_PER_LONG - 1) | 0x11);
>>> +    for_each_set_bit_reverse ( i, ul )
>>> +        ul_res |= 1UL << i;
>>> +
>>> +    if ( ul != ul_res )
>>> +        panic("for_each_set_bit_reverse(ulong) expected %#lx, got %#lx\n", 
>>> ul, ul_res);
>>> +
>>> +    ull = HIDE(0x8000000180000011ULL);
>>> +    for_each_set_bit_reverse ( i, ull )
>>> +        ull_res |= 1ULL << i;
>> ... even here the need for the extra setting of bit 4 remains unclear to
>> me: The thing that was missing was the testing of the reverse for-each.
>> You mention the need for an asymmetric input in the description, but isn't
>> that covered already by the first test using 0x80008001U?
> 
> The first test covers {arch_,}f[fl]s() only.  It happens to be safe to
> count-from-the-wrong-end bugs, but that was by chance.
> 
> The second test covers {arch_,}f[fs]sl() only.  They are unsafe WRT
> symmetry, and disjoint (coverage wise) from the first test.
> 
> The third test, while the same as the second test on x86, isn't the same
> on arm32.
> 
> 
> Just because one test happens to be safe (symmetry wise) and passes,
> doesn't make the other variants tested.

Hmm, okay, it is of course in principle possible that one flavor is screwed
while the other isn't.

Acked-by: Jan Beulich <jbeulich@xxxxxxxx>

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.