|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] x86/bitops: Optimise arch_ffs()/etc some more on AMD
On 28.05.2025 00:29, Andrew Cooper wrote:
> One aspect of the old ffs()/etc infrastructure was the use of TZCNT/LZCNT
> without a CPUID check. Given no obvious justification, I dropped it during
> the cleanup.
>
> It turns out there is a good reason, on all but the most recent AMD CPUs.
>
> Reinstate the use of "blind" TZCNT/LZCNT when safe to do so. This happens to
> be preferentially in loops where a repeated saving of 5-6 uops becomes far
> more relevant than a one-off scenario.
>
> Leave an explanation of why.
>
> No functional change.
>
> Fixes: 989e5f08d33e ("x86/bitops: Improve arch_ffs() in the general case")
> Fixes: 5ed26fc0768d ("xen/bitops: Implement ffsl() in common logic")
> Fixes: 54b10ef6c810 ("xen/bitops: Implement fls()/flsl() in common logic")
> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
In principle
Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
but ...
> --- a/xen/arch/x86/include/asm/bitops.h
> +++ b/xen/arch/x86/include/asm/bitops.h
> @@ -399,9 +399,16 @@ static always_inline unsigned int arch_ffs(unsigned int
> x)
> *
> * and the optimiser really can work with the knowledge of x being
> * non-zero without knowing it's exact value, in which case we don't
> - * need to compensate for BSF's corner cases. Otherwise...
> + * need to compensate for BSF's corner cases.
> + *
> + * That said, we intentionally use TZCNT on capable hardware when the
> + * behaviour for the 0 case doesn't matter. On AMD CPUs prior to
> + * Zen4, TZCNT is 1-2 uops while BSF is 6-8 with a latency to match.
> + * Intel CPUs don't suffer this discrepancy.
> + *
> + * Otherwise...
> */
> - asm ( "bsf %[val], %[res]"
> + asm ( "rep bsf %[val], %[res]"
... why would we use REP (again) when gas 2.25 supports LZCNT/TZCNT?
Jan
> : [res] "=r" (r)
> : [val] "rm" (x) );
> }
> @@ -428,7 +435,7 @@ static always_inline unsigned int arch_ffsl(unsigned long
> x)
>
> /* See arch_ffs() for safety discussions. */
> if ( __builtin_constant_p(x > 0) && x > 0 )
> - asm ( "bsf %[val], %q[res]"
> + asm ( "rep bsf %[val], %q[res]"
> : [res] "=r" (r)
> : [val] "rm" (x) );
> else
> @@ -446,7 +453,7 @@ static always_inline unsigned int arch_fls(unsigned int x)
>
> /* See arch_ffs() for safety discussions. */
> if ( __builtin_constant_p(x > 0) && x > 0 )
> - asm ( "bsr %[val], %[res]"
> + asm ( "rep bsr %[val], %[res]"
> : [res] "=r" (r)
> : [val] "rm" (x) );
> else
> @@ -464,7 +471,7 @@ static always_inline unsigned int arch_flsl(unsigned long
> x)
>
> /* See arch_ffs() for safety discussions. */
> if ( __builtin_constant_p(x > 0) && x > 0 )
> - asm ( "bsr %[val], %q[res]"
> + asm ( "rep bsr %[val], %q[res]"
> : [res] "=r" (r)
> : [val] "rm" (x) );
> else
>
> base-commit: d965e2ee07c56c341d8896852550914d87ea5374
> prerequisite-patch-id: 8da89000c73c38aab6abfa6622217ea9eff07fbd
> prerequisite-patch-id: 74830838bac94ed1e036a8173cf3210a314b35d8
> prerequisite-patch-id: 0654835c28df8d40b6c97006d041c4d31447a9a6
> prerequisite-patch-id: 2d47d646c6a6e0019918c57753d6c01a752c377f
> prerequisite-patch-id: d8f8c4221a2d7039bae6f3d38e93fe90b2091d01
> prerequisite-patch-id: e0397c86b545a1d65f2e6b2049c282b926c40c64
> prerequisite-patch-id: ea21abe4540ee229f4f8725ce3f701d9ba4bd4a8
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |