[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/vpmu: fix race-condition in vpmu_load



On Mon, Sep 19, 2022 at 5:28 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>
> On 16.09.2022 23:35, Boris Ostrovsky wrote:
> >
> > On 9/16/22 8:52 AM, Jan Beulich wrote:
> >> On 15.09.2022 16:01, Tamas K Lengyel wrote:
> >>> While experimenting with the vPMU subsystem an ASSERT failure was
> >>> observed in vmx_find_msr because the vcpu_runnable state was true.
> >>>
> >>> The root cause of the bug appears to be the fact that the vPMU subsystem
> >>> doesn't save its state on context_switch. The vpmu_load function will 
> >>> attempt
> >>> to gather the PMU state if its still loaded two different ways:
> >>>      1. if the current pcpu is not where the vcpu ran before doing a 
> >>> remote save
> >>>      2. if the current pcpu had another vcpu active before doing a local 
> >>> save
> >>>
> >>> However, in case the prev vcpu is being rescheduled on another pcpu its 
> >>> state
> >>> has already changed and vcpu_runnable is returning true, thus #2 will 
> >>> trip the
> >>> ASSERT. The only way to avoid this race condition is to make sure the
> >>> prev vcpu is paused while being checked and its context saved. Once the 
> >>> prev
> >>> vcpu is resumed and does #1 it will find its state already saved.
> >> While I consider this explanation plausible, I'm worried:
> >>
> >>> --- a/xen/arch/x86/cpu/vpmu.c
> >>> +++ b/xen/arch/x86/cpu/vpmu.c
> >>> @@ -419,8 +419,10 @@ int vpmu_load(struct vcpu *v, bool_t from_guest)
> >>>           vpmu = vcpu_vpmu(prev);
> >>>
> >>>           /* Someone ran here before us */
> >>> +        vcpu_pause(prev);
> >>>           vpmu_save_force(prev);
> >>>           vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
> >>> +        vcpu_unpause(prev);
> >>>
> >>>           vpmu = vcpu_vpmu(v);
> >>>       }
> >> We're running with IRQs off here, yet vcpu_pause() waits for the vcpu
> >> to actually be de-scheduled. Even with IRQs on this is already a
> >> relatively heavy operation (also including its impact on the remote
> >> side). Additionally the function is called from context_switch(), and
> >> I'm unsure of the usability of vcpu_pause() on such a path. In
> >> particular: Is there a risk of two CPUs doing this mutually to one
> >> another? If so, is deadlocking excluded?
> >>
> >> Hence at the very least I think the description wants extending, to
> >> discuss the safety of the change.
> >>
> >> Boris - any chance you could comment here? Iirc that's code you did
> >> introduce.
> >
> >
> > Is the assertion in vmx_find_msr() really needs to be for runnable vcpu or 
> > can it be a check on whether vcpu is actually running (e.g. 
> > RUNSTATE_running)?
>
> You cannot safely check for "running", as "runnable" may transition
> to/from "running" behind your back.

The more I look at this the more I think the only sensible solution is
to have the vPMU state be saved on vmexit for all vCPUs. That way all
this having to figure out where and when a context needs saving during
scheduling goes away. Yes, it adds a bit of overhead for cases where
the vCPU will resume on the same pCPU and that context saved could
have been skipped, but it makes it so that the vCPU can be resumed on
any pCPU without having to have evidently fragile checks that may
potentially lead to deadlocks (TBH I don't know if that's a real
concern at the moment because the current setup is very hard to reason
about). We can still keep track if the context needs reloading from
the saved context and skip that if we know the state is still active.
Any objection to that change in light of these issues?

Tamas



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.