[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v3 3/7] x86: re-work memcpy()
On 26.11.2024 20:16, Andrew Cooper wrote: > On 25/11/2024 2:28 pm, Jan Beulich wrote: >> Move the function to its own assembly file. Having it in C just for the >> entire body to be an asm() isn't really helpful. Then have two flavors: >> A "basic" version using qword steps for the bulk of the operation, and an >> ERMS version for modern hardware, to be substituted in via alternatives >> patching. >> >> Alternatives patching, however, requires an extra precaution: It uses >> memcpy() itself, and hence the function may patch itself. Luckily the >> patched-in code only replaces the prolog of the original function. Make >> sure this remains this way. >> >> Additionally alternatives patching, while supposedly safe via enforcing >> a control flow change when modifying already prefetched code, may not >> really be. Afaict a request is pending to drop the first of the two >> options in the SDM's "Handling Self- and Cross-Modifying Code" section. >> Insert a serializing instruction there. >> >> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx> >> --- >> We may want to consider branching over the REP MOVSQ as well, if the >> number of qwords turns out to be zero. >> We may also want to consider using non-REP MOVS{L,W,B} for the tail. > > My feedback for patch 2 is largely applicable here too. Sure, and I'll apply here whatever we decide to do there. >> --- a/xen/arch/x86/alternative.c >> +++ b/xen/arch/x86/alternative.c >> @@ -153,12 +153,14 @@ void init_or_livepatch add_nops(void *in >> * executing. >> * >> * "noinline" to cause control flow change and thus invalidate I$ and >> - * cause refetch after modification. >> + * cause refetch after modification. While the SDM continues to suggest >> this >> + * is sufficient, it may not be - issue a serializing insn afterwards as >> well. > > Did you find a problem in practice, or is this just in case? It's been too long, so I can now only guess that it's just in case. The comment change, otoh, suggests otherwise. > I suspect if you are seeing problems, then it's non-atomicity of the > stores into memcpy() rather than serialisation. How would atomicity (or not) matter here? There shouldn't be any difference between a single and any number of stores into the (previously executed) insn stream. >> */ >> static void init_or_livepatch noinline >> text_poke(void *addr, const void *opcode, size_t len) >> { >> memcpy(addr, opcode, len); >> + cpuid_eax(0); > > This whole function is buggy in a couple of ways, starting with the > comments. > > The comment about noinline and control flow changes is only really > relevant to 32bit processors; we inherited that comment from Linux, and > they're not applicable to Xen. > > AMD64 (both the APM, and SDM) guarantee that Self Modifying Code will be > dealt with on your behalf, with no serialisation needed. > > Cross-modifying code needs far more severe serialisation than given > here. We get away with it because alternative_{instructions,branches}() > are pre-SMP, and apply_alternatives() is on livepatches prior to them > becoming live. > > > I happen to know there's an AMD CPU which has an erratum regarding Self > Modifying Code and genuinely does need a serialising instruction, but I > don't know which exact CPU it is. Maybe I ran into that on one of the two older AMD systems I routinely test on every once in a while? > If we're going to put a serialising instruction, it should be a write to > CR2. We don't care about 486 compatibility, and it's faster than CPUID > and much much faster if virtualised because it's unlikely to be > intercepted even under shadow paging. > > But, it would be nice not to put serialisation in the general case to > begin with, especially not into the livepatching case. If you're aware of an erratum there, how can we get away without any serialization? I can surely switch to a CR2 write, and I can also make this dependent upon system_state (thus excluding the LP case). I notice that arch_livepatch_{apply,revert}() indeed use plain memcpy() with just the noinline "protection". I wonder how well that works if a livepatch actually touched the tail of either of these functions (however unlikely that may be). Jan
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |