
Re: [Xen-devel] [PATCH] x86/vm_event: fix race between vmx_vmexit_handler() and vm_event_resume()



>>> On 27.04.17 at 10:34, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 27/04/2017 09:01, Jan Beulich wrote:
>>>>> On 27.04.17 at 09:22, <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>>> The introspection agent can reply to a vm_event faster than
>>> vmx_vmexit_handler() can complete in some cases, where it is then
>>> not safe for vm_event_set_registers() to modify v->arch.user_regs.
>> This needs more explanation: I cannot see the connection between
>> vmx_vmexit_handler() completing and (un)safety of modification of
>> v->arch.user_regs. The latter is unsafe as long as the vCPU hasn't
>> gone through __context_switch(), and the former doesn't call that
>> function directly or indirectly (i.e. I think it is more than just
>> vmx_vmexit_handler() which needs to be completed by the time
>> register modification becomes safe to do).
> 
> The test scenario here was to step over an int3 breakpoint by setting
> rip += 1.  This is a very quick reply and tended to complete before the
> vcpu triggering the introspection event had properly paused and been
> descheduled.
> 
> If the reply occurs before __context_switch() happens,
> __context_switch() clobbers the reply by overwriting v->arch.user_regs
> from the stack.  If the reply occurs after __context_switch(), but the
> pcpu has gone idle and keeps v in context, v->arch.user_regs is not
> restored to the stack, and the update is similarly missed.

This second case is not properly described, I think: v won't be kept
in context once we've gone through __context_switch(). If the pCPU
goes idle, __context_switch() will simply be deferred until another
vCPU gets scheduled onto this pCPU, and avoided altogether if it's
the original vCPU that gets scheduled back onto it. But I do get the
point. I think much if not all of the above (suitably adjusted)
needs to go into the commit message.
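
To make the first failure mode concrete, here is a small stand-alone
model of the lost update (purely illustrative: fields, sizes and flow
are simplified, and this is not Xen code):

/* Stand-alone model of the clobber: an early vm_event reply bumps the
 * saved rip, then the switch-out copy from the live stack frame
 * discards it. */
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

struct cpu_user_regs { uint64_t rip; /* other GPRs elided */ };

struct vcpu {
    struct cpu_user_regs user_regs;   /* models v->arch.user_regs */
};

int main(void)
{
    /* Live register frame still sitting on the pCPU stack. */
    struct cpu_user_regs stack_frame = { .rip = 0x1000 };
    struct vcpu v = { .user_regs = { .rip = 0x1000 } };

    /* Early vm_event reply steps over the int3 breakpoint: rip += 1. */
    v.user_regs.rip += 1;

    /* The later save of the outgoing vCPU (as in __context_switch())
     * copies the on-stack frame over v->arch.user_regs, silently
     * discarding the reply's update. */
    memcpy(&v.user_regs, &stack_frame, sizeof(v.user_regs));

    printf("rip = %#" PRIx64 " (the rip += 1 was lost)\n", v.user_regs.rip);
    return 0;
}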

>  (There is a
> very narrow race condition where both the reply and __context_switch()
> are updating v->arch.user_regs, and who knows what will happen, given
> our memcpy implementation.)

Right, but with the proper serialization of events now in place this
isn't a problem anymore either, aiui.
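
For reference, the serialization amounts to vm_event_set_registers()
no longer writing v->arch.user_regs at all while the target vCPU's
frame may still be live on a pCPU stack; the requested values are
parked in v->arch.vm_event instead. A rough sketch of that side
(field names such as gprs/set_gprs are illustrative, see the actual
patch for the real ones):

/* Sketch of the "record, don't write" side; field names illustrative. */
void vm_event_set_registers(struct vcpu *v, vm_event_response_t *rsp)
{
    /* The target vCPU is guaranteed to be paused at this point. */
    ASSERT(atomic_read(&v->vm_event_pause_count));

    /*
     * Don't touch v->arch.user_regs here: the vCPU's real frame may
     * still be on, or later be copied from, its pCPU's stack.  Just
     * record what the introspection agent asked for.
     */
    v->arch.vm_event->gprs = rsp->data.regs.x86;
    v->arch.vm_event->set_gprs = true;
}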

>>> The patch additionally removes the sync_vcpu_execstate(v) call from
>>> vm_event_resume(), which is no longer necessary; this also removes the
>>> associated broadcast TLB flush (read: performance improvement).
>> Depending on the better explanation above, it may or may not be
>> further necessary to also better explain the "no longer necessary"
>> part here.
> 
> As I understand it, this was a previous attempt to fix this bug.
> 
> There is nothing (now) in vm_event_resume() which does anything other
> than write new updates into v->arch.vm_event and drop the pause
> reference.  All updates are applied in context, in the return-to-guest
> path, which is race free.
> 
> There is no need for the IPI, or to force the target vcpu out of context
> if the pcpu is currently idle.

I agree, and (as indicated) the better explanation earlier on is
probably sufficient then.
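
The "applied in context, in the return-to-guest path" half then runs
on the target vCPU itself, just before it re-enters the guest, where
the on-stack frame is by definition the live one. Roughly, with
illustrative names, invoked from somewhere like hvm_do_resume():

/* Sketch of the in-context apply side; names illustrative. */
void vm_event_apply_gprs(struct vcpu *v)
{
    if ( unlikely(v->arch.vm_event != NULL) && v->arch.vm_event->set_gprs )
    {
        /* Running on v's pCPU, about to enter the guest: this frame is live. */
        struct cpu_user_regs *regs = guest_cpu_user_regs();

        regs->rip    = v->arch.vm_event->gprs.rip;
        regs->rsp    = v->arch.vm_event->gprs.rsp;
        regs->rflags = v->arch.vm_event->gprs.rflags;
        /* ... remaining GPRs ... */

        v->arch.vm_event->set_gprs = false;
    }
}

Since the write now happens on the vCPU that owns the frame, there is
nothing left for sync_vcpu_execstate() or an IPI to chase.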

>>> --- a/xen/common/vm_event.c
>>> +++ b/xen/common/vm_event.c
>>> @@ -357,6 +357,13 @@ void vm_event_resume(struct domain *d, struct vm_event_domain *ved)
>>>  {
>>>      vm_event_response_t rsp;
>>>  
>>> +    /*
>>> +     * vm_event_resume() runs either from XEN_DOMCTL_VM_EVENT_OP_*, or
>>> +     * EVTCHN_send from the introspection consumer.  Both contexts are
>>> +     * guaranteed not to be the subject of vm_event responses.
>>> +     */
>>> +    ASSERT(d != current->domain);
>> What is this meant to guard against?
> 
> Documentation, as much as anything else.  It took a long time to
> untangle the contexts involved here; I'm not convinced the logic is safe
> to run in context, and it certainly doesn't need to.

Well, as said, I think it is too strict now: you only need the vCPU
not to be current afaict, and it really matters here which of the two
"current"-s you actually mean (and it looks to me as if you mean the
other one, guaranteeing register state to no longer be on the stack).
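
Concretely, the relaxation suggested here would look something like
the following inside the response loop of vm_event_resume()
(reconstructed for illustration only, not the actual patch):

/* Sketch only: check the vCPU named by each response, not the domain. */
while ( vm_event_get_response(d, ved, &rsp) )
{
    struct vcpu *v;

    if ( rsp.vcpu_id >= d->max_vcpus || d->vcpu[rsp.vcpu_id] == NULL )
        continue;
    v = d->vcpu[rsp.vcpu_id];

    /*
     * Weaker than ASSERT(d != current->domain): it is enough that the
     * vCPU whose registers may be rewritten is not the one running
     * here, i.e. its register state is no longer live on this pCPU's
     * stack.
     */
    ASSERT(v != current);

    /* ... process the response ... */
}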

Jan
