|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v3 2/7] x86: re-work memset()
On 26.11.2024 18:13, Andrew Cooper wrote:
> On 25/11/2024 2:28 pm, Jan Beulich wrote:
>> ---
>> We may want to consider branching over the REP STOSQ as well, if the
>> number of qwords turns out to be zero.
>
> Until FSR{S,M} (Fast Short Rep {STO,MOV}SB), which is far newer than
> ERMS, passing 0 into any REP instruction is expensive.
Is this a request to add such a conditional branch then, perhaps patched
out when FSRS is available? And then perhaps also for the JZ that's
already there?
> I wonder how often we memset with a size less than 8.
Hence why I raised the point, rather than putting the jump there directly.
>> We may also want to consider using non-REP STOS{L,W,B} for the tail.
>
> Probably, yes. We use this form in non-ERMS cases, where we're advised
> to stay away from STOSB entirely.
Yet then we'll end up with three conditional branches - do we really want
that?
> Interestingly, Linux doesn't have a STOSQ case at all. Or rather, it
> was deleted by Linus in 20f3337d350c last year. It was also identified
> as causing a performance regression.
> https://lore.kernel.org/lkml/CANn89iKUbyrJ=r2+_kK+sb2ZSSHifFZ7QkPLDpAtkJ8v4WUumA@xxxxxxxxxxxxxx/T/#u
> although the memset() path was not reverted as part of the fix
> (47ee3f1dd93bcb eventually).
>
> Yet ca96b162bfd2 shows that REP MOVSQ is still definitely a win on Rome
> CPUs.
>
> I expect we probably do want some non-rep forms in here.
>
> Do you have any benchmarks with this series?
What I specifically measured were the clear_page() variants. I didn't do
any measurements for memset() (or memcpy()), first ad foremost because
any selection of inputs is going to be arbitrary rather than representative.
>> --- /dev/null
>> +++ b/xen/arch/x86/memset.S
>> @@ -0,0 +1,30 @@
>> +#include <asm/asm_defns.h>
>> +
>> +.macro memset
>> + and $7, %edx
>> + shr $3, %rcx
>> + movzbl %sil, %esi
>> + mov $0x0101010101010101, %rax
>> + imul %rsi, %rax
>> + mov %rdi, %rsi
>> + rep stosq
>> + or %edx, %ecx
>> + jz 0f
>> + rep stosb
>> +0:
>> + mov %rsi, %rax
>
> Could you use %r8/9/etc instead of %rsi please? This is deceptively
> close to looking like a bug, and it took me a while to figure out it's
> only correct because STOSB only edits %rdi.
Well, I can certainly switch (the number of REX prefixes will remain the
same as it looks), but the fact that STOS, unlike MOVS, doesn't touch
%rsi is a pretty basic one.
Does your request extend to all uses of %rsi (and %esi), or merely the
latter two (across the REP STOS)?
Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |