Re: [PATCH for-4.21 1/2] x86/AMD: avoid REP MOVSB for Zen3/4
On 25/09/2025 at 15:02, Jan Beulich wrote:
> On 25.09.2025 14:18, Teddy Astie wrote:
>> On 25/09/2025 at 12:48, Jan Beulich wrote:
>>> Along with Zen2 (which doesn't expose ERMS), both families reportedly
>>> suffer from sub-optimal aliasing detection when deciding whether REP MOVSB
>>> can actually be carried out the accelerated way. Therefore we want to
>>> avoid its use in the common case (memset(), copy_page_hot()).
>>
>> s/memset/memcpy (memset probably uses rep stosb which is not affected IIUC)
>
> Oops, yes.
>
>>> Reported-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>>> ---
>>> Question is whether merely avoiding REP MOVSB (but not REP MOVSQ) is going
>>> to be good enough.
>>
>> This probably wants to be checked with benchmarks of rep movsb vs rep
>> movsq+b (current non-ERMS algorithm). If the issue also occurs with rep
>> movsq, it may be preferable to keep rep movsb even considering this issue.
>
> Why? Then REP MOVSB is 8 times slower than REP MOVSQ.
>

That doesn't match what I observed when quickly benchmarking rep movsb
against rep movsq+b (the fallback) with varying alignments and sizes on
Zen3/4 (Ryzen and EPYC). The result is very sensitive to size and
alignment, but in many (though not all) cases rep movsb is significantly
faster than rep movsq+b. The worst cases (the bug mentioned above) are
much slower with both, though rep movsq+b tends to perform better there.

So unfortunately it's not as simple as rep movsb being (almost) always
slower, especially given the varied copy sizes and alignments that
grant_copy performs. That's why I would prefer to have more data to get
a better picture.
>>> --- a/xen/arch/x86/cpu/amd.c
>>> +++ b/xen/arch/x86/cpu/amd.c
>>> @@ -1386,6 +1386,10 @@ static void cf_check init_amd(struct cpu
>>>
>>>  	check_syscfg_dram_mod_en();
>>>
>>> +	if (c == &boot_cpu_data && cpu_has(c, X86_FEATURE_ERMS)
>>> +	    && c->family != 0x19 /* Zen3/4 */)
>>> +		setup_force_cpu_cap(X86_FEATURE_XEN_REP_MOVSB);
>>> +
>>
>> May it be fixed through a (future ?) microcode update, especially since
>> rep movs is microcoded on these archs ?
>
> I don't know, but I also don't expect that to happen.
>
> Jan
>

Teddy

--
Teddy Astie | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech