[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 4/4] x86/PV: issue branch prediction barrier when switching 64-bit guest to kernel mode



On Tue, Dec 19, 2023 at 10:56:16AM +0100, Jan Beulich wrote:
> On 18.12.2023 18:24, Roger Pau Monné wrote:
> > On Tue, Feb 14, 2023 at 05:12:08PM +0100, Jan Beulich wrote:
> >> Since both kernel and user mode run in ring 3, they run in the same
> >> "predictor mode".
> > 
> > That only true when IBRS is enabled, otherwise all CPU modes share the
> > same predictor mode?
> 
> But here we only care about ring 3 anyway?

Hm, yes, what I wanted to say is that this is not exclusive to 64bit
PV, as with IBRS disabled ring 3 and ring 0 share the same predictor
mode.  Anyway, not relevant.

> 
> >> @@ -753,7 +755,9 @@ static inline void pv_inject_sw_interrup
> >>   * but we can't make such requests fail all of the sudden.
> >>   */
> >>  #define PV64_VM_ASSIST_MASK (PV32_VM_ASSIST_MASK                      | \
> >> -                             (1UL << VMASST_TYPE_m2p_strict))
> >> +                             (1UL << VMASST_TYPE_m2p_strict)          | \
> >> +                             ((opt_ibpb_mode_switch + 0UL) <<           \
> >> +                              VMASST_TYPE_mode_switch_no_ibpb))
> > 
> > I'm wondering that it's kind of weird to offer the option to PV domUs
> > if opt_ibpb_entry_pv is set, as then the guest mode switch will always
> > (implicitly) do a IBPB as requiring an hypercall and thus take an
> > entry point into Xen.
> > 
> > I guess it's worth having it just as a way to signal to Xen that the
> > hypervisor does perform an IBPB, even if the guest cannot disable it.
> 
> I'm afraid I'm confused by your reply. Not only, but also because the
> latter sentence looks partly backwards / non-logical to me.

Sorry, I think I didn't word that very well.  The remark is tied to
the one below about the vmassist 'possibly' allowing the guest to
disable IBPB on guest user -> kernel context switches, but Xen might
unconditionally do additional IBPBs that the guest cannot disable.

> >> --- a/xen/arch/x86/pv/domain.c
> >> +++ b/xen/arch/x86/pv/domain.c
> >> @@ -455,6 +455,7 @@ static void _toggle_guest_pt(struct vcpu
> >>  void toggle_guest_mode(struct vcpu *v)
> >>  {
> >>      const struct domain *d = v->domain;
> >> +    struct cpu_info *cpu_info = get_cpu_info();
> >>      unsigned long gs_base;
> >>  
> >>      ASSERT(!is_pv_32bit_vcpu(v));
> >> @@ -467,15 +468,21 @@ void toggle_guest_mode(struct vcpu *v)
> >>      if ( v->arch.flags & TF_kernel_mode )
> >>          v->arch.pv.gs_base_kernel = gs_base;
> >>      else
> >> +    {
> >>          v->arch.pv.gs_base_user = gs_base;
> >> +
> >> +        if ( opt_ibpb_mode_switch &&
> >> +             !(d->arch.spec_ctrl_flags & SCF_entry_ibpb) &&
> >> +             !VM_ASSIST(d, mode_switch_no_ibpb) )
> >> +            cpu_info->spec_ctrl_flags |= SCF_new_pred_ctxt;
> > 
> > Likewise similar to the remarks I've made before, if doing an IBPB on
> > entry is enough to cover for the case here, it must also be fine to
> > issue the IBPB right here, instead of deferring to return to guest
> > context?
> > 
> > The only concern would be (as you mentioned before) to avoid clearing
> > valid Xen predictions, but I would rather see some figures about what
> > effect the delaying to return to guest has vs issuing it right here.
> 
> Part of the reason (aiui) to do things on the exit path was to
> consolidate the context switch induced one and the user->kernel switch
> one into the same place and mechanism.

Isn't it kind of a very specific case that we end up doing a
user->kernel switch as part of a context switch?  IOW: would require
the vCPU to be scheduled out at that very specific point.

> >> + *
> >> + * By default (on affected and capable hardware) as a safety measure Xen,
> >> + * to cover for the fact that guest-kernel and guest-user modes are both
> >> + * running in ring 3 (and hence share prediction context), would issue a
> >> + * barrier for user->kernel mode switches of PV guests.
> >> + */
> >> +#define VMASST_TYPE_mode_switch_no_ibpb  33
> > 
> > Would it be possible to define the assist as
> > VMASST_TYPE_mode_switch_ibpb and have it on when enabled?  So that the
> > guest would disable it if unneeded?  IMO negated options are in
> > general harder to understand.
> 
> Negative options aren't nice, yes, but VM assists start out as all
> clear.

Are you sure?  I see VMASST_TYPE_pae_extended_cr3 getting set in
dom0_construct_pv() and that makes me wonder whether other bits
couldn't start set also.

Maybe there's some restriction I'm missing, but I don't see any
wording in the description of the interface that states that all
assists are supposed to start disabled.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.