Xen project Mailing List

Re: [PATCH v4 4/4] x86/PV: issue branch prediction barrier when switching 64-bit guest to kernel mode

From: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Tue, 19 Dec 2023 12:48:00 +0100

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>

Delivery-date: Tue, 19 Dec 2023 11:48:09 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, Dec 19, 2023 at 10:56:16AM +0100, Jan Beulich wrote: > On 18.12.2023 18:24, Roger Pau Monné wrote: > > On Tue, Feb 14, 2023 at 05:12:08PM +0100, Jan Beulich wrote: > >> Since both kernel and user mode run in ring 3, they run in the same > >> "predictor mode". > > > > That only true when IBRS is enabled, otherwise all CPU modes share the > > same predictor mode? > > But here we only care about ring 3 anyway? Hm, yes, what I wanted to say is that this is not exclusive to 64bit PV, as with IBRS disabled ring 3 and ring 0 share the same predictor mode. Anyway, not relevant. > > >> @@ -753,7 +755,9 @@ static inline void pv_inject_sw_interrup > >> * but we can't make such requests fail all of the sudden. > >> */ > >> #define PV64_VM_ASSIST_MASK (PV32_VM_ASSIST_MASK | \ > >> - (1UL << VMASST_TYPE_m2p_strict)) > >> + (1UL << VMASST_TYPE_m2p_strict) | \ > >> + ((opt_ibpb_mode_switch + 0UL) << \ > >> + VMASST_TYPE_mode_switch_no_ibpb)) > > > > I'm wondering that it's kind of weird to offer the option to PV domUs > > if opt_ibpb_entry_pv is set, as then the guest mode switch will always > > (implicitly) do a IBPB as requiring an hypercall and thus take an > > entry point into Xen. > > > > I guess it's worth having it just as a way to signal to Xen that the > > hypervisor does perform an IBPB, even if the guest cannot disable it. > > I'm afraid I'm confused by your reply. Not only, but also because the > latter sentence looks partly backwards / non-logical to me. Sorry, I think I didn't word that very well. The remark is tied to the one below about the vmassist 'possibly' allowing the guest to disable IBPB on guest user -> kernel context switches, but Xen might unconditionally do additional IBPBs that the guest cannot disable. > >> --- a/xen/arch/x86/pv/domain.c > >> +++ b/xen/arch/x86/pv/domain.c > >> @@ -455,6 +455,7 @@ static void _toggle_guest_pt(struct vcpu > >> void toggle_guest_mode(struct vcpu *v) > >> { > >> const struct domain *d = v->domain; > >> + struct cpu_info *cpu_info = get_cpu_info(); > >> unsigned long gs_base; > >> > >> ASSERT(!is_pv_32bit_vcpu(v)); > >> @@ -467,15 +468,21 @@ void toggle_guest_mode(struct vcpu *v) > >> if ( v->arch.flags & TF_kernel_mode ) > >> v->arch.pv.gs_base_kernel = gs_base; > >> else > >> + { > >> v->arch.pv.gs_base_user = gs_base; > >> + > >> + if ( opt_ibpb_mode_switch && > >> + !(d->arch.spec_ctrl_flags & SCF_entry_ibpb) && > >> + !VM_ASSIST(d, mode_switch_no_ibpb) ) > >> + cpu_info->spec_ctrl_flags |= SCF_new_pred_ctxt; > > > > Likewise similar to the remarks I've made before, if doing an IBPB on > > entry is enough to cover for the case here, it must also be fine to > > issue the IBPB right here, instead of deferring to return to guest > > context? > > > > The only concern would be (as you mentioned before) to avoid clearing > > valid Xen predictions, but I would rather see some figures about what > > effect the delaying to return to guest has vs issuing it right here. > > Part of the reason (aiui) to do things on the exit path was to > consolidate the context switch induced one and the user->kernel switch > one into the same place and mechanism. Isn't it kind of a very specific case that we end up doing a user->kernel switch as part of a context switch? IOW: would require the vCPU to be scheduled out at that very specific point. > >> + * > >> + * By default (on affected and capable hardware) as a safety measure Xen, > >> + * to cover for the fact that guest-kernel and guest-user modes are both > >> + * running in ring 3 (and hence share prediction context), would issue a > >> + * barrier for user->kernel mode switches of PV guests. > >> + */ > >> +#define VMASST_TYPE_mode_switch_no_ibpb 33 > > > > Would it be possible to define the assist as > > VMASST_TYPE_mode_switch_ibpb and have it on when enabled? So that the > > guest would disable it if unneeded? IMO negated options are in > > general harder to understand. > > Negative options aren't nice, yes, but VM assists start out as all > clear. Are you sure? I see VMASST_TYPE_pae_extended_cr3 getting set in dom0_construct_pv() and that makes me wonder whether other bits couldn't start set also. Maybe there's some restriction I'm missing, but I don't see any wording in the description of the interface that states that all assists are supposed to start disabled. Thanks, Roger.

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.