[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] xen/arm: Ensure the vCPU context is seen before clearing the _VPF_down



On Sat, 27 Feb 2021, Julien Grall wrote:
> (+ Dario and George)
> 
> Hi Stefano,
> 
> I have added Dario and George to get some inputs from the scheduling part.
> 
> On 27/02/2021 01:58, Stefano Stabellini wrote:
> > On Fri, 26 Feb 2021, Julien Grall wrote:
> > > From: Julien Grall <jgrall@xxxxxxxxxx>
> > > 
> > > A vCPU can get scheduled as soon as _VPF_down is cleared. As there is
> > > currently not ordering guarantee in arch_set_info_guest(), it may be
> > > possible that flag can be observed cleared before the new values of vCPU
> > > registers are observed.
> > > 
> > > Add an smp_mb() before the flag is cleared to prevent re-ordering.
> > > 
> > > Signed-off-by: Julien Grall <jgrall@xxxxxxxxxx>
> > > 
> > > ---
> > > 
> > > Barriers should work in pair. However, I am not entirely sure whether to
> > > put the other half. Maybe at the beginning of context_switch_to()?
> > 
> > It should be right after VGCF_online is set or cleared, right?
> 
> vcpu_guest_context_t is variable allocated on the heap just for the purpose of
> this call. So an ordering with VGFC_online is not going to do anything.
> 
> > So it
> > would be:
> > 
> > xen/arch/arm/domctl.c:arch_get_info_guest
> > xen/arch/arm/vpsci.c:do_common_cpu_on
> > 
> > But I think it is impossible that either of them get called at the same
> > time as arch_set_info_guest, which makes me wonder if we actually need
> > the barrier...
> arch_get_info_guest() is called without the domain lock held and I can't see
> any other lock that could prevent it to be called in // of
> arch_set_info_guest().
> 
> So you could technically get corrupted information from
> XEN_DOMCTL_getvcpucontext. For this case, we would want a smp_wmb() before
> writing to v->is_initialised. The corresponding read barrier would be in
> vcpu_pause() -> vcpu_sleep_sync() -> sync_vcpu_execstate().
> 
> But this is not the issue I was originally trying to solve. Currently,
> do_common_cpu_on() will roughly do:
> 
>  1) domain_lock(d)
> 
>  2) v->arch.sctlr = ...
>     v->arch.ttbr0 = ...
> 
>  3) clear_bit(_VFP_down, &v->pause_flags);
> 
>  4) domain_unlock(d)
> 
>  5) vcpu_wake(v);
> 
> If we had only one pCPU on the system, then we would only wake the vCPU in
> step 5. We would be fine in this situation. But that's not the interesting
> case.
> 
> If you add a second pCPU in the story, it may be possible to have vcpu_wake()
> happening in // (see more below). As there is no memory barrier, step 3 may be
> observed before 2. So, assuming the vcpu is runnable, we could start to
> schedule a vCPU before any update to the registers (step 2) are observed.
> 
> This means that when context_switch_to() is called, we may end up to restore
> some old values.
> 
> Now the question is can vcpu_wake() be called in // from another pCPU? AFAICT,
> it would be only called if a given flag in v->pause_flags is cleared (e.g.
> _VFP_blocked). But can we rely on that?
> 
> Even if we can rely on it, v->pause_flags has other flags in it. I couldn't
> rule out that _VPF_down cannot be set at the same time as the other _VPF_*.
> 
> Therefore, I think a barrier is necessary to ensure the ordering.
> 
> Do you agree with this analysis?
 
Yes, I think this makes sense. The corresponding barrier in the
scheduling code would have to be after reading _VPF_down and before
reading v->arch.sctlr, etc.


> > > The issues described here is also quite theoritical because there are
> > > hundreds of instructions executed between the time a vCPU is seen
> > > runnable and scheduled. But better be safe than sorry :).
> > > ---
> > >   xen/arch/arm/domain.c | 7 +++++++
> > >   1 file changed, 7 insertions(+)
> > > 
> > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> > > index bdd3d3e5b5d5..2b705e66be81 100644
> > > --- a/xen/arch/arm/domain.c
> > > +++ b/xen/arch/arm/domain.c
> > > @@ -914,7 +914,14 @@ int arch_set_info_guest(
> > >       v->is_initialised = 1;
> > >         if ( ctxt->flags & VGCF_online )
> > > +    {
> > > +        /*
> > > +         * The vCPU can be scheduled as soon as _VPF_down is cleared.
> > > +         * So clear the bit *after* the context was loaded.
> > > +         */
> > > +        smp_mb();
> 
> From the discussion above, I would move this barrier before v->is_initialised.
> So we also take care of the issue with arch_get_info_guest().
> 
> This barrier also can be reduced to a smp_wmb() as we only expect an ordering
> between writes.
> 
> The barrier would be paired with the barrier in:
>    - sync_vcpu_execstate() in the case of arch_get_vcpu_info_guest().
>    - context_switch_to() in the case of scheduling (The exact barrier is TDB).

OK, this makes sense, but why before:

  v->is_initialised = 1;

instead of right after it? It is just v->pause_flags we care about,
right?



> > >           clear_bit(_VPF_down, &v->pause_flags);
> > > +    }
> > >       else
> > >           set_bit(_VPF_down, &v->pause_flags);
> > >   




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.