Re: [PATCH for-4.21 1/2] x86/AMD: avoid REP MOVSB for Zen3/4
On 30.09.2025 15:03, Teddy Astie wrote:
> On 25/09/2025 at 15:02, Jan Beulich wrote:
>> On 25.09.2025 14:18, Teddy Astie wrote:
>>> On 25/09/2025 at 12:48, Jan Beulich wrote:
>>>> Along with Zen2 (which doesn't expose ERMS), both families reportedly
>>>> suffer from sub-optimal aliasing detection when deciding whether REP MOVSB
>>>> can actually be carried out the accelerated way. Therefore we want to
>>>> avoid its use in the common case (memset(), copy_page_hot()).
>>>
>>> s/memset/memcpy (memset probably uses rep stosb which is not affected IIUC)
>>
>> Oops, yes.
>>
>>>> Reported-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>>>> ---
>>>> Question is whether merely avoiding REP MOVSB (but not REP MOVSQ) is going
>>>> to be good enough.
>>>
>>> This probably wants to be checked with benchmarks of rep movsb vs rep
>>> movsq+b (current non-ERMS algorithm). If the issue also occurs with rep
>>> movsq, it may be preferable to keep rep movsb even considering this issue.
>>
>> Why? Then REP MOVSB is 8 times slower than REP MOVSQ.
>>
>
> It doesn't match my observations while quickly benching rep movsb vs rep
> movsq+b (fallback) with varying alignments/sizes on Zen3/4 (Ryzen and EPYC).
>
> It's very sensitive to size and alignment, but in many (but not all)
> cases, rep movsb is significantly faster than rep movsq+b. The worst
> cases (mentioned bug) are much slower in both cases, though rep movsq+b
> tend to perform better in these cases.

Which is what the patch here is trying to address.

> So unfortunately it's not as simple as rep movsb being (almost) always
> slower, especially with the varied copy sizes and alignments that does
> grant_copy. That's what I would prefer having more data to have a better
> picture.

Well, what I would have preferred is some actual written down description
of the aliasing issue. I'm unaware of such; the patch is solely based on
what Andrew has been telling me verbally (piecemeal). I've tried to reflect
this in how the description is written.

What you suggest would, aiui, entail more complicated decision logic in
the memcpy() implementation, which (at least for now) we'd like to avoid.

Jan
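
A minimal sketch, not taken from the patch or from Xen's memcpy(), of the two
copy variants being compared in this thread: plain REP MOVSB versus the
REP MOVSQ + REP MOVSB tail fallback. All function names here are hypothetical.
It assumes GCC-style inline assembly on x86-64 (with a plain memcpy()
fallback elsewhere so it still builds), and it only checks that both variants
produce identical results at a given source/destination misalignment; timing
code would have to be added around the calls to obtain the kind of data asked
for above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte-granular copy using REP MOVSB only. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) && defined(__GNUC__)
    void *d = dst;

    /* RDI = destination, RSI = source, RCX = byte count. */
    __asm__ __volatile__ ( "rep movsb"
                           : "+D" (d), "+S" (src), "+c" (n)
                           :: "memory" );
#else
    memcpy(dst, src, n); /* non-x86-64 fallback, for buildability only */
#endif
    return dst;
}

/* Hypothetical helper: REP MOVSQ for whole quadwords, REP MOVSB for the tail. */
static void *copy_rep_movsq_b(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) && defined(__GNUC__)
    void *d = dst;
    size_t quads = n >> 3; /* number of 8-byte units */
    size_t tail = n & 7;   /* remaining 0..7 bytes */

    __asm__ __volatile__ ( "rep movsq"
                           : "+D" (d), "+S" (src), "+c" (quads)
                           :: "memory" );
    /* RDI/RSI were advanced by the first copy; continue from there. */
    __asm__ __volatile__ ( "rep movsb"
                           : "+D" (d), "+S" (src), "+c" (tail)
                           :: "memory" );
#else
    memcpy(dst, src, n); /* non-x86-64 fallback, for buildability only */
#endif
    return dst;
}

/*
 * Run both variants at the given misalignments (each below 64 bytes) and
 * size (at most 4096 bytes). Returns 1 if both destinations hold the
 * source data and are identical to each other, 0 otherwise.
 */
static int copies_match(size_t dst_off, size_t src_off, size_t n)
{
    static unsigned char src[4096 + 64], d1[4096 + 64], d2[4096 + 64];
    size_t i;

    assert(dst_off < 64 && src_off < 64 && n <= 4096);

    for ( i = 0; i < sizeof(src); i++ )
        src[i] = (unsigned char)(i * 31 + 7);
    memset(d1, 0, sizeof(d1));
    memset(d2, 0, sizeof(d2));

    copy_rep_movsb(d1 + dst_off, src + src_off, n);
    copy_rep_movsq_b(d2 + dst_off, src + src_off, n);

    return !memcmp(d1 + dst_off, src + src_off, n) &&
           !memcmp(d1, d2, sizeof(d1));
}
```

Sweeping dst_off and src_off across a cache line for a range of sizes, with a
timer around each call, would be one way to see where the two variants
diverge; the sketch deliberately makes no claim about which offsets trigger
the slow path.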