[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH 3/5] xen/sort: Switch to an extern inline implementation
On 11.11.2021 18:57, Andrew Cooper wrote: > There are exactly 3 callers of sort() in the hypervisor. > > Both arm callers pass in NULL for the swap function. While this might seem > like an attractive option at first, it causes generic_swap() to be used which > forced a byte-wise copy. Provide real swap functions which the compiler can > optimise sensibly. > > Furthermore, use of function pointers in tight loops like that can be very bad > for performance. Implement sort() as extern inline, so the optimiser can > judge whether to inline things or not. > > On x86, the diffstat shows how much of a better job the compiler can do when > it is able to see the cmp/swap implementations. > > $ ../scripts/bloat-o-meter xen-syms-before xen-syms-after > add/remove: 0/5 grow/shrink: 1/1 up/down: 505/-735 (-230) > Function old new delta > sort_exception_table 31 536 +505 > u32_swap 9 - -9 > generic_swap 34 - -34 > cmp_ex 36 - -36 > swap_ex 41 - -41 > sort_exception_tables 90 38 -52 > sort 563 - -563 > > Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> Technically Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx> Yet again without the intention of overriding Julien's concerns in any way. To address one of them, how about retaining generic_swap() (as an inline function), ... > --- a/xen/include/xen/sort.h > +++ b/xen/include/xen/sort.h > @@ -3,8 +3,61 @@ > > #include <xen/types.h> > > +/* > + * sort - sort an array of elements > + * @base: pointer to data to sort > + * @num: number of elements > + * @size: size of each element > + * @cmp: pointer to comparison function > + * @swap: pointer to swap function or NULL > + * > + * This function does a heapsort on the given array. You may provide a > + * swap function optimized to your element type. > + * > + * Sorting time is O(n log n) both on average and worst-case. While > + * qsort is about 20% faster on average, it suffers from exploitable > + * O(n*n) worst-case behavior and extra memory requirements that make > + * it less suitable for kernel use. > + */ > +#ifndef SORT_IMPLEMENTATION > +extern gnu_inline > +#endif > void sort(void *base, size_t num, size_t size, > int (*cmp)(const void *, const void *), > - void (*swap)(void *, void *, size_t)); > + void (*swap)(void *, void *, size_t)) > +{ > + /* pre-scale counters for performance */ > + size_t i = (num / 2) * size, n = num * size, c, r; > + > + /* heapify */ > + while ( i > 0 ) > + { > + for ( r = i -= size; r * 2 + size < n; r = c ) > + { > + c = r * 2 + size; > + if ( (c < n - size) && (cmp(base + c, base + c + size) < 0) ) > + c += size; > + if ( cmp(base + r, base + c) >= 0 ) > + break; > + swap(base + r, base + c, size); ... doing if ( swap ) swap(base + r, base + c, size); else generic_swap(base + r, base + c, size); here and below. The compiler would then still be able to eliminate the indirect calls (as well as the added conditional), I think. Jan
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |