[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] x86/vm_event: fix race between vmx_vmexit_handler() and vm_event_resume()



On 27/04/2017 09:01, Jan Beulich wrote:
>>>> On 27.04.17 at 09:22, <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>> The introspection agent can reply to a vm_event faster than
>> vmx_vmexit_handler() can complete in some cases, where it is then
>> not safe for vm_event_set_registers() to modify v->arch.user_regs.
> This needs more explanation: I cannot see the connection between
> vmx_vmexit_handler() completing and (un)safety of modification of
> v->arch.user_regs. The latter is unsafe as long as the vCPU hasn't
> gone through __context_switch(), and the former doesn't call that
> function directly or indirectly (i.e. I think it is more than just
> vmx_vmexit_handler() which needs to be completed by the time
> register modification becomes safe to do).

The test scenario here was to step over an int3 breakpoint by setting
rip += 1.  This is a very quick reply and tended to complete before the
vcpu triggering the introspection event had properly paused and been
descheduled.

If the reply occurs before __context_switch() happens,
__context_switch() clobbers the reply by overwriting v->arch.user_regs
from the stack.  If the reply occurs after __context_switch(), but the
pcpu has  gone idle and keeps v in context, v->arch.user_regs is not
restored to the stack, and the update is similarly missed.  (There is a
very narrow race condition where both the reply and __context_switch()
are updating v->arch.user_regs, and who knows what will happen, given
our memcpy implementation.)

>
>> The patch additionally removes the sync_vcpu_execstate(v) call from
>> vm_event_resume(), which is no longer necessary, which removes the
>> associated broadcast TLB flush (read: performance improvement).
> Depending on the better explanation above, it may or may not be
> further necessary to also better explain the "no longer necessary"
> part here.

As I understand, it was a previous attempt to fix this bug.

There is nothing (now) in vm_event_resume() which does anything other
than new updates into v->arch.vm_event and drop the pause reference. 
All updated are applied in context, in the return-to-guest path, which
is race free.

There is no need for the IPI, or to force the target vcpu out of context
if the pcpu is currently idle.

>
>> --- a/xen/common/vm_event.c
>> +++ b/xen/common/vm_event.c
>> @@ -357,6 +357,13 @@ void vm_event_resume(struct domain *d, struct 
>> vm_event_domain *ved)
>>  {
>>      vm_event_response_t rsp;
>>  
>> +    /*
>> +     * vm_event_resume() runs either from XEN_DOMCTL_VM_EVENT_OP_*, or
>> +     * EVTCHN_send from the introspection consumer.  Both contexts are
>> +     * guaranteed not to be the subject of vm_event responses.
>> +     */
>> +    ASSERT(d != current->domain);
> What is this meant to guard against?

Documentation, as much as anything else.  It took a long time to
untangle the contexts involved here; I'm not convinced the logic is safe
to run in context, and it certainly doesn't need to.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.