Re: [Xen-devel] [PATCH] x86/xen: Clear user %gs before updating segment descriptors
On Fri, Dec 7, 2018 at 3:15 PM David Woodhouse <dwmw@xxxxxxxxxxxx> wrote:
>
> During a context switch, if a descriptor currently referenced by the
> old process's user %gs is being cleared and Xen preempts the vCPU
> before %gs is set for the new process, a fault may occur.
>
> This fault actually seems to be fairly harmless; xen_failsafe_callback
> will just return to the "faulting" instruction (the random one after the
> vCPU was preempted) with the offending segment register zeroed, and then
> it'll get set again later during the context switch. But it's cleaner
> to avoid it.
>
> If the descriptor referenced by the %gs selector is being touched,
> then include a request to zero the user %gs in the multicall too.
Fine with me.
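
As an aside, for anyone less familiar with selector layout: the new
(flush_gs >> 3) == GDT_ENTRY_TLS_MIN + i test below works because bits
0-1 of a selector hold the RPL, bit 2 is the table indicator, and the
descriptor index lives in bits 3 and up. A minimal user-space sketch of
that decoding (mine, not part of the patch; it assumes the x86-64 value
GDT_ENTRY_TLS_MIN == 12 from asm/segment.h):

    /*
     * Illustrative only: decode %gs the same way the patch's
     * (flush_gs >> 3) == GDT_ENTRY_TLS_MIN + i comparison does.
     */
    #include <stdio.h>

    #define GDT_ENTRY_TLS_MIN 12 /* x86-64 value from asm/segment.h */
    #define GDT_ENTRY_TLS_MAX (GDT_ENTRY_TLS_MIN + 2)

    int main(void)
    {
            unsigned short gs;
            unsigned int index, ti, rpl;

            /* Read the current user %gs selector, much like savesegment(). */
            asm volatile("mov %%gs, %0" : "=r" (gs));

            index = gs >> 3;       /* descriptor slot in the GDT (or LDT) */
            ti    = (gs >> 2) & 1; /* table indicator: 0 = GDT, 1 = LDT   */
            rpl   = gs & 3;        /* requested privilege level           */

            printf("gs=%#x index=%u ti=%u rpl=%u\n", gs, index, ti, rpl);

            if (!ti && index >= GDT_ENTRY_TLS_MIN && index <= GDT_ENTRY_TLS_MAX)
                    printf("gs references TLS slot %u\n", index - GDT_ENTRY_TLS_MIN);
            else
                    printf("gs does not reference a TLS GDT slot\n");

            return 0;
    }

In practice a 64-bit task's user %gs is normally null (64-bit TLS goes
through %fs and the GS base MSRs), so the extra hypercall should mostly
fire for 32-bit compat tasks, whose TLS lives in these GDT slots.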
>
> Signed-off-by: David Woodhouse <dwmw@xxxxxxxxxxxx>
> ---
> v2: Don't accidentally remove the call to xen_mc_batch().
>
> arch/x86/include/asm/xen/hypercall.h | 11 ++++++++
> arch/x86/xen/enlighten_pv.c          | 40 ++++++++++++++++++++++------
> 2 files changed, 43 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
> index ef05bea7010d..e8b383b24246 100644
> --- a/arch/x86/include/asm/xen/hypercall.h
> +++ b/arch/x86/include/asm/xen/hypercall.h
> @@ -520,4 +520,15 @@ MULTI_stack_switch(struct multicall_entry *mcl,
> trace_xen_mc_entry(mcl, 2);
> }
>
> +static inline void
> +MULTI_set_segment_base(struct multicall_entry *mcl,
> + int reg, unsigned long value)
> +{
> + mcl->op = __HYPERVISOR_set_segment_base;
> + mcl->args[0] = reg;
> + mcl->args[1] = value;
> +
> + trace_xen_mc_entry(mcl, 2);
> +}
> +
> #endif /* _ASM_X86_XEN_HYPERCALL_H */
> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
> index 2f6787fc7106..2eb9827dab4b 100644
> --- a/arch/x86/xen/enlighten_pv.c
> +++ b/arch/x86/xen/enlighten_pv.c
> @@ -506,7 +506,7 @@ static inline bool desc_equal(const struct desc_struct *d1,
> }
>
> static void load_TLS_descriptor(struct thread_struct *t,
> - unsigned int cpu, unsigned int i)
> + unsigned int cpu, unsigned int i, int flush_gs)
> {
> struct desc_struct *shadow = &per_cpu(shadow_tls_desc, cpu).desc[i];
> struct desc_struct *gdt;
> @@ -516,6 +516,17 @@ static void load_TLS_descriptor(struct thread_struct *t,
> if (desc_equal(shadow, &t->tls_array[i]))
> return;
>
> + /*
> + * If the current user %gs points to a descriptor we're changing,
> + * zero it first to avoid taking a fault if Xen preempts this
> + * vCPU between now and the time that %gs is later loaded with
> + * the new value.
> + */
> + if ((flush_gs >> 3) == GDT_ENTRY_TLS_MIN + i) {
> + mc = __xen_mc_entry(0);
> + MULTI_set_segment_base(mc.mc, SEGBASE_GS_USER_SEL, 0);
> + }
> +
> *shadow = t->tls_array[i];
>
> gdt = get_cpu_gdt_rw(cpu);
> @@ -527,6 +538,8 @@ static void load_TLS_descriptor(struct thread_struct *t,
>
> static void xen_load_tls(struct thread_struct *t, unsigned int cpu)
> {
> + u16 flush_gs = 0;
> +
> /*
> * XXX sleazy hack: If we're being called in a lazy-cpu zone
> * and lazy gs handling is enabled, it means we're in a
> @@ -537,27 +550,38 @@ static void xen_load_tls(struct thread_struct *t, unsigned int cpu)
> * This will go away as soon as Xen has been modified to not
> * save/restore %gs for normal hypercalls.
> *
> - * On x86_64, this hack is not used for %gs, because gs points
> - * to KERNEL_GS_BASE (and uses it for PDA references), so we
> - * must not zero %gs on x86_64
> - *
> * For x86_64, we need to zero %fs, otherwise we may get an
> * exception between the new %fs descriptor being loaded and
> * %fs being effectively cleared at __switch_to().
> + *
> + * We may also need to zero %gs, if it refers to a descriptor
> + * which we are clearing. Otherwise a Xen vCPU context switch
> + * before it gets reloaded could also cause a fault. Since
> + * clearing the user %gs is another hypercall, do that only if
> + * it's necessary.
> + *
> + * Note: These "faults" end up in xen_failsafe_callback(),
> + * which just returns immediately to the "faulting" instruction
> + * (i.e. the random one after Xen preempted this vCPU) with
> + * the offending segment register zeroed. Which is actually
> + * a perfectly safe thing to happen anyway, as it'll be loaded
> + * again shortly. So maybe we needn't bother?
> */
> if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_CPU) {
> #ifdef CONFIG_X86_32
> lazy_load_gs(0);
> #else
> + savesegment(gs, flush_gs);
> +
> loadsegment(fs, 0);
> #endif
> }
>
> xen_mc_batch();
>
> - load_TLS_descriptor(t, cpu, 0);
> - load_TLS_descriptor(t, cpu, 1);
> - load_TLS_descriptor(t, cpu, 2);
> + load_TLS_descriptor(t, cpu, 0, flush_gs);
> + load_TLS_descriptor(t, cpu, 1, flush_gs);
> + load_TLS_descriptor(t, cpu, 2, flush_gs);
>
> xen_mc_issue(PARAVIRT_LAZY_CPU);
> }
> --
> 2.17.1
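
For reference, the net effect on the batch built by xen_load_tls() is a
single extra entry ahead of the descriptor update, and only when the
saved user %gs selector points at the TLS slot being rewritten. A
condensed sketch of the resulting ordering (my paraphrase of the hunks
above, reusing the existing MULTI_update_descriptor wrapper alongside
the new MULTI_set_segment_base; not code to be applied):

    xen_mc_batch();

    /* Only when the old task's user %gs selects the slot being changed:
     * drop the selector first, so a vCPU preemption cannot fault on a
     * descriptor that is in the middle of being rewritten. */
    if ((flush_gs >> 3) == GDT_ENTRY_TLS_MIN + i) {
            mc = __xen_mc_entry(0);
            MULTI_set_segment_base(mc.mc, SEGBASE_GS_USER_SEL, 0);
    }

    /* Then rewrite the TLS descriptor itself, exactly as before. */
    mc = __xen_mc_entry(0);
    MULTI_update_descriptor(mc.mc, maddr.maddr, t->tls_array[i]);

    xen_mc_issue(PARAVIRT_LAZY_CPU);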