|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH 3/5] xen/sort: Switch to an extern inline implementation
On 11.11.2021 18:57, Andrew Cooper wrote:
> There are exactly 3 callers of sort() in the hypervisor.
>
> Both arm callers pass in NULL for the swap function. While this might seem
> like an attractive option at first, it causes generic_swap() to be used which
> forced a byte-wise copy. Provide real swap functions which the compiler can
> optimise sensibly.
>
> Furthermore, use of function pointers in tight loops like that can be very bad
> for performance. Implement sort() as extern inline, so the optimiser can
> judge whether to inline things or not.
>
> On x86, the diffstat shows how much of a better job the compiler can do when
> it is able to see the cmp/swap implementations.
>
> $ ../scripts/bloat-o-meter xen-syms-before xen-syms-after
> add/remove: 0/5 grow/shrink: 1/1 up/down: 505/-735 (-230)
> Function old new delta
> sort_exception_table 31 536 +505
> u32_swap 9 - -9
> generic_swap 34 - -34
> cmp_ex 36 - -36
> swap_ex 41 - -41
> sort_exception_tables 90 38 -52
> sort 563 - -563
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
Technically
Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
Yet again without the intention of overriding Julien's concerns in any
way. To address one of them, how about retaining generic_swap() (as an
inline function), ...
> --- a/xen/include/xen/sort.h
> +++ b/xen/include/xen/sort.h
> @@ -3,8 +3,61 @@
>
> #include <xen/types.h>
>
> +/*
> + * sort - sort an array of elements
> + * @base: pointer to data to sort
> + * @num: number of elements
> + * @size: size of each element
> + * @cmp: pointer to comparison function
> + * @swap: pointer to swap function or NULL
> + *
> + * This function does a heapsort on the given array. You may provide a
> + * swap function optimized to your element type.
> + *
> + * Sorting time is O(n log n) both on average and worst-case. While
> + * qsort is about 20% faster on average, it suffers from exploitable
> + * O(n*n) worst-case behavior and extra memory requirements that make
> + * it less suitable for kernel use.
> + */
> +#ifndef SORT_IMPLEMENTATION
> +extern gnu_inline
> +#endif
> void sort(void *base, size_t num, size_t size,
> int (*cmp)(const void *, const void *),
> - void (*swap)(void *, void *, size_t));
> + void (*swap)(void *, void *, size_t))
> +{
> + /* pre-scale counters for performance */
> + size_t i = (num / 2) * size, n = num * size, c, r;
> +
> + /* heapify */
> + while ( i > 0 )
> + {
> + for ( r = i -= size; r * 2 + size < n; r = c )
> + {
> + c = r * 2 + size;
> + if ( (c < n - size) && (cmp(base + c, base + c + size) < 0) )
> + c += size;
> + if ( cmp(base + r, base + c) >= 0 )
> + break;
> + swap(base + r, base + c, size);
... doing
if ( swap )
swap(base + r, base + c, size);
else
generic_swap(base + r, base + c, size);
here and below. The compiler would then still be able to eliminate the
indirect calls (as well as the added conditional), I think.
Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |