Xen project Mailing List

Re: [Xen-devel] [PATCH] x86/vm_event: fix race between vmx_vmexit_handler() and vm_event_resume()

To: Jan Beulich <JBeulich@xxxxxxxx>, Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx>

From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Thu, 27 Apr 2017 09:34:03 +0100

Cc: tamas@xxxxxxxxxxxxx, xen-devel@xxxxxxxxxxxxx

Delivery-date: Thu, 27 Apr 2017 08:34:19 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 27/04/2017 09:01, Jan Beulich wrote: >>>> On 27.04.17 at 09:22, <rcojocaru@xxxxxxxxxxxxxxx> wrote: >> The introspection agent can reply to a vm_event faster than >> vmx_vmexit_handler() can complete in some cases, where it is then >> not safe for vm_event_set_registers() to modify v->arch.user_regs. > This needs more explanation: I cannot see the connection between > vmx_vmexit_handler() completing and (un)safety of modification of > v->arch.user_regs. The latter is unsafe as long as the vCPU hasn't > gone through __context_switch(), and the former doesn't call that > function directly or indirectly (i.e. I think it is more than just > vmx_vmexit_handler() which needs to be completed by the time > register modification becomes safe to do). The test scenario here was to step over an int3 breakpoint by setting rip += 1. This is a very quick reply and tended to complete before the vcpu triggering the introspection event had properly paused and been descheduled. If the reply occurs before __context_switch() happens, __context_switch() clobbers the reply by overwriting v->arch.user_regs from the stack. If the reply occurs after __context_switch(), but the pcpu has gone idle and keeps v in context, v->arch.user_regs is not restored to the stack, and the update is similarly missed. (There is a very narrow race condition where both the reply and __context_switch() are updating v->arch.user_regs, and who knows what will happen, given our memcpy implementation.) > >> The patch additionally removes the sync_vcpu_execstate(v) call from >> vm_event_resume(), which is no longer necessary, which removes the >> associated broadcast TLB flush (read: performance improvement). > Depending on the better explanation above, it may or may not be > further necessary to also better explain the "no longer necessary" > part here. As I understand, it was a previous attempt to fix this bug. There is nothing (now) in vm_event_resume() which does anything other than new updates into v->arch.vm_event and drop the pause reference. All updated are applied in context, in the return-to-guest path, which is race free. There is no need for the IPI, or to force the target vcpu out of context if the pcpu is currently idle. > >> --- a/xen/common/vm_event.c >> +++ b/xen/common/vm_event.c >> @@ -357,6 +357,13 @@ void vm_event_resume(struct domain *d, struct >> vm_event_domain *ved) >> { >> vm_event_response_t rsp; >> >> + /* >> + * vm_event_resume() runs either from XEN_DOMCTL_VM_EVENT_OP_*, or >> + * EVTCHN_send from the introspection consumer. Both contexts are >> + * guaranteed not to be the subject of vm_event responses. >> + */ >> + ASSERT(d != current->domain); > What is this meant to guard against? Documentation, as much as anything else. It took a long time to untangle the contexts involved here; I'm not convinced the logic is safe to run in context, and it certainly doesn't need to. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.