Re: [PATCH v2] x86/bitops: Optimise arch_ffs{,l}() some more on AMD
On 01.09.2025 16:21, Andrew Cooper wrote:
> On 27/08/2025 8:52 am, Jan Beulich wrote:
>> On 26.08.2025 19:41, Andrew Cooper wrote:
>>> --- a/xen/common/bitops.c
>>> +++ b/xen/common/bitops.c
>>> @@ -97,14 +97,14 @@ static void __init test_for_each_set_bit(void)
>>>      if ( ui != ui_res )
>>>          panic("for_each_set_bit(uint) expected %#x, got %#x\n", ui, ui_res);
>>>  
>>> -    ul = HIDE(1UL << (BITS_PER_LONG - 1) | 1);
>>> +    ul = HIDE(1UL << (BITS_PER_LONG - 1) | 0x11);
>>>      for_each_set_bit ( i, ul )
>>>          ul_res |= 1UL << i;
>>>  
>>>      if ( ul != ul_res )
>>>          panic("for_each_set_bit(ulong) expected %#lx, got %#lx\n", ul, ul_res);
>>>  
>>> -    ull = HIDE(0x8000000180000001ULL);
>>> +    ull = HIDE(0x8000000180000011ULL);
>>>      for_each_set_bit ( i, ull )
>>>          ull_res |= 1ULL << i;
>>
>> How do these changes make a difference? Apart from ffs() using TZCNT, ...
>>
>>> @@ -127,6 +127,79 @@ static void __init test_for_each_set_bit(void)
>>>          panic("for_each_set_bit(break) expected 0x1008, got %#x\n", ui_res);
>>>  }
>>>  
>>> +/*
>>> + * A type-generic fls() which picks the appropriate fls{,l,64}() based on
>>> + * its argument.
>>> + */
>>> +#define fls_g(x)                                                \
>>> +    (sizeof(x) <= sizeof(int)      ? fls(x) :                   \
>>> +     sizeof(x) <= sizeof(long)     ? flsl(x) :                  \
>>> +     sizeof(x) <= sizeof(uint64_t) ? fls64(x) :                 \
>>> +     ({ BUILD_ERROR("fls_g() Bad input type"); 0; }))
>>> +
>>> +/*
>>> + * for_each_set_bit_reverse() - Iterate over all set bits in a scalar value,
>>> + * from MSB to LSB.
>>> + *
>>> + * @iter An iterator name.  Scope is within the loop only.
>>> + * @val  A scalar value to iterate over.
>>> + *
>>> + * A copy of @val is taken internally.
>>> + */
>>> +#define for_each_set_bit_reverse(iter, val)             \
>>> +    for ( typeof(val) __v = (val); __v; __v = 0 )       \
>>> +        for ( unsigned int (iter);                      \
>>> +              __v && ((iter) = fls_g(__v) - 1, true);   \
>>> +              __clear_bit(iter, &__v) )
>>> +
>>> +/*
>>> + * Xen doesn't have need of for_each_set_bit_reverse() at present, but the
>>> + * construct does exercise a case of arch_fls*() not covered anywhere else
>>> + * by these tests.
>>> + */
>>> +static void __init test_for_each_set_bit_reverse(void)
>>> +{
>>> +    unsigned int ui, ui_res = 0, tmp;
>>> +    unsigned long ul, ul_res = 0;
>>> +    uint64_t ull, ull_res = 0;
>>> +
>>> +    ui = HIDE(0x80008001U);
>>> +    for_each_set_bit_reverse ( i, ui )
>>> +        ui_res |= 1U << i;
>>> +
>>> +    if ( ui != ui_res )
>>> +        panic("for_each_set_bit_reverse(uint) expected %#x, got %#x\n", ui, ui_res);
>>> +
>>> +    ul = HIDE(1UL << (BITS_PER_LONG - 1) | 0x11);
>>> +    for_each_set_bit_reverse ( i, ul )
>>> +        ul_res |= 1UL << i;
>>> +
>>> +    if ( ul != ul_res )
>>> +        panic("for_each_set_bit_reverse(ulong) expected %#lx, got %#lx\n", ul, ul_res);
>>> +
>>> +    ull = HIDE(0x8000000180000011ULL);
>>> +    for_each_set_bit_reverse ( i, ull )
>>> +        ull_res |= 1ULL << i;
>>
>> ... even here the need for the extra setting of bit 4 remains unclear to
>> me: The thing that was missing was the testing of the reverse for-each.
>> You mention the need for an asymmetric input in the description, but isn't
>> that covered already by the first test using 0x80008001U?
>
> The first test covers {arch_,}f[fl]s() only.  It happens to be safe
> against count-from-the-wrong-end bugs, but that was by chance.
>
> The second test covers {arch_,}f[fl]sl() only.  They are unsafe WRT
> symmetry, and disjoint (coverage wise) from the first test.
>
> The third test, while the same as the second test on x86, isn't the same
> on arm32.
> Just because one test happens to be safe (symmetry wise) and passes,
> doesn't make the other variants tested.

Hmm, okay, it is of course in principle possible that one flavor is screwed
while the other isn't.

Acked-by: Jan Beulich <jbeulich@xxxxxxxx>

Jan
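[Editor's note] A minimal standalone sketch of the symmetry argument above,
assuming the "count from the wrong end" bug model: if an ffs()-family helper
returned indices counted from the MSB end, the reconstruction loop in these
tests would rebuild the bit-reversed input.  The old unsigned long input,
1UL << (BITS_PER_LONG - 1) | 1, is its own bit-reversal on 64-bit builds, so
that test would have passed anyway; setting bit 4 as well breaks the
symmetry.  This is not part of the patch, and bitrev64() is a helper
invented here for illustration, not a Xen API.

#include <stdint.h>
#include <stdio.h>

/* Reverse the bit order of a 64-bit value (illustrative, unoptimised). */
static uint64_t bitrev64(uint64_t v)
{
    uint64_t r = 0;

    for ( unsigned int i = 0; i < 64; i++ )
        if ( v & (1ULL << i) )
            r |= 1ULL << (63 - i);

    return r;
}

int main(void)
{
    uint64_t sym  = 1ULL << 63 | 0x01; /* bits 0 and 63: palindromic */
    uint64_t asym = 1ULL << 63 | 0x11; /* bits 0, 4 and 63: asymmetric */

    /* A wrong-end bug reconstructs the same value, so the test passes anyway. */
    printf("sym self-reversed?  %d\n", bitrev64(sym) == sym);   /* prints 1 */

    /* The reconstructed value differs, so the test panics and catches the bug. */
    printf("asym self-reversed? %d\n", bitrev64(asym) == asym); /* prints 0 */

    return 0;
}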