[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2] x86/livepatch: Fix livepatch application when CET is active
On 17/04/2023 2:59 pm, Jan Beulich wrote: > On 17.04.2023 15:52, Andrew Cooper wrote: >> Right now, trying to apply a livepatch on any system with CET shstk (AMD Zen3 >> or later, Intel Tiger Lake or Sapphire Rapids and later) fails as follows: >> >> (XEN) livepatch: lp: Verifying enabled expectations for all functions >> (XEN) common/livepatch.c:1591: livepatch: lp: timeout is 30000000ns >> (XEN) common/livepatch.c:1703: livepatch: lp: CPU28 - IPIing the other 127 >> CPUs >> (XEN) livepatch: lp: Applying 1 functions >> (XEN) hi_func: Hi! (called 1 times) >> (XEN) Hook executing. >> (XEN) Assertion 'local_irq_is_enabled() || cpumask_subset(mask, >> cpumask_of(cpu))' failed at arch/x86/smp.c:265 >> (XEN) *** DOUBLE FAULT *** >> <many double faults> >> >> The assertion failure is from a global (system wide) TLB flush initiated by >> modify_xen_mappings(). I'm not entirely sure when this broke, and I'm not >> sure exactly what causes the #DF's, but it doesn't really matter either >> because they highlight a latent bug that I'd overlooked with the CET-SS vs >> patching work the first place. >> >> While we're careful to arrange for the patching CPU to avoid encountering >> non-shstk memory with transient shstk perms, other CPUs can pick these >> mappings up too if they need to re-walk for uarch reasons. >> >> Another bug is that for livepatching, we only disable CET if shadow stacks >> are >> in use. Running on Intel CET systems when Xen is only using CET-IBT will >> crash in arch_livepatch_quiesce() when trying to clear CR0.WP with CR4.CET >> still active. >> >> Also, we never went and cleared the dirty bits on .rodata. This would >> matter (for the same reason it matters on .text - it becomes a valid target >> for WRSS), but we never actually patch .rodata anyway. >> >> Therefore rework how we do patching for both alternatives and livepatches. >> >> Introduce modify_xen_mappings_lite() with a purpose similar to >> modify_xen_mappings(), but stripped down to the bare minimum as it's used in >> weird contexts. Leave all complexity to the caller to handle. >> >> Instead of patching by clearing CR0.WP (and having to jump through some >> fragile hoops to disable CET in order to do this), just transiently relax the >> permissions on .text via l2_identmap[]. >> >> Note that neither alternatives nor livepatching edit .rodata, so we don't >> need >> to relax those permissions at this juncture. >> >> The perms are relaxed globally, but is safe enough. Alternatives run before >> we boot APs, and Livepatching runs in a quiesced state where the other CPUs >> are not doing anything interesting. >> >> This approach is far more robust. >> >> Fixes: 48cdc15a424f ("x86/alternatives: Clear CR4.CET when clearing CR0.WP") >> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> > Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx> Thanks. > > One further remark, though: > >> @@ -5879,6 +5880,73 @@ int destroy_xen_mappings(unsigned long s, unsigned >> long e) >> return modify_xen_mappings(s, e, _PAGE_NONE); >> } >> >> +/* >> + * Similar to modify_xen_mappings(), but used by the alternatives and >> + * livepatch in weird contexts. All synchronization, TLB flushing, etc is >> the >> + * responsibility of the caller, and *MUST* not be introduced here. >> + * >> + * Must be limited to XEN_VIRT_{START,END}, i.e. over l2_xenmap[]. >> + * Must be called with present flags, and over present mappings. >> + * Must be called on leaf page boundaries. > This last sentence, while wording-wise correct, could do with making more > explicit that it is the caller's responsibility to know whether large page > mappings are in use, due to ... The meaning here is really "this doesn't shatter superpages", and this was the most concise I could come up with. Would ", i.e. won't shatter 2M pages." as a clarification work? ~Andrew
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |