Xen project Mailing List

Re: [Xen-devel] xc_hvm_inject_trap() races

On 11/07/2016 10:45 AM, Jan Beulich wrote: >>>> On 05.11.16 at 18:22, <rcojocaru@xxxxxxxxxxxxxxx> wrote: >> On 11/02/2016 12:55 AM, Andrew Cooper wrote: >>> On 01/11/2016 22:17, Andrei Vlad LUTAS wrote: >>>>>> First of all, to answer your original question: the injection decision >>>>>> is made when the introspection logic needs to inspect a page that is >>>>>> not present in the physical memory. We don't really care if the current >>>>>> instruction triggers multiple faults or not (and here I'm not sure what >>>>>> you mean by that - multiple exceptions, or multiple EPT violations - >>>>>> but the answer is still the same), and removing the page restrictions >>>>>> after the #PF injection is introspection specific logic - the address >>>>>> for which we inject the #PF doesn't have to be related in any way to the >>>>>> current instruction. >>>>> Ah, that's this no-architectural behavior again. >>>> I don't think the HVI #PF injection internals or how the #PF is handled by >> the OS are relevant here. We are using an existing API that seems to not >> work >> quite correct under certain circumstances and we were curious if any of you >> can shed some light in this regard, and maybe point us to the right >> direction >> for cooking up a fix. >>> >>> Just because there is an API like this, doesn't necessarily mean it ever >>> worked. This one clearly doesn't, and it got introduced before we as a >>> community took a rather harder stance towards code review. >>> >>> Architecturally speaking, faults are always raised as a direct >>> consequence of the current state. Therefore, having in introspection >>> agent interposing on the instruction stream and causing faults as a side >>> effect of EPT permissions/etc, is quite natural and in line with >>> architectural expectation. >>> >>> You also have a second usecase for this API, which is to trick Windows >>> into paging in a frame you care about looking at. Overall, using this >>> #PF method to get Windows to page content back in clearly the rational >>> way of making that happen, but it is very definitely a non-architectural >>> usecase; if windows were to double check the instruction stream on >>> getting this pagefault, it would get very confused, as the pagefault it >>> received has nothing to do with the code pointed at in the exception frame. >>> >>> It is quite likely that these different usecases might have different >>> solutions. IMO, the former should probably be controlled by responses >>> in the vm_event ring, but this latter issue probably shouldn't. >>> >>> When it comes to injecting exceptions, there are some restrictions which >>> limit what can legitimately be done. We can only inject a single thing >>> at once. Stacking a #PF on top of a plain interrupt could be resolved >>> by leaving the interrupt pending in the vLAPIC and injecting the #PF >>> instead. Stacking a #PF on top of a different fault is going to cause >>> hvm_combine_exceptions() to turn it into something more nasty. OTOH, >>> deferring the #PF by even a single instruction could result in it being >>> sent in an unsafe context, which should also be avoided. >> >> I wonder if it would be acceptable to simply add "&& >> v->arch.hvm_vcpu.inject_trap.vector != -1" to the early return condition >> for vmx_intr_assist()? >> >> Event injection is already blocked there if v->arch.hvm_vcpu.single_step >> is set, so at least in that case it is acceptable behaviour. We could >> also block it if there's a pending user-requested injection. > > While that might be an option, how would that help? Didn't you say > that your problem was with the injected vector being ignored > because there already was an event injected? That to me implies > that hvm_do_resume() (doing the injection you care about) runs > _after_ that other injection, yet it certainly runs before > vmx_intr_assist(). Am I overlooking or mis-remembering something? The xc_hvm_inject_trap() injected vector is not ignored (since we're doing a #PF), our injection does succeed. The main problem is that this can override pending interrupts - specifically reinjected ones. You're right, blocking interrupts in the style of the singlestep code only solves the problem partially. The interrupts we're worried about (even with the singlestep approach) are the ones reinjected by vmx_idtv_reinject() in vmx_vmexit_handler(). The problem there is that vmx_idtv_reinject() is a corner case: it writes VM_ENTRY_INTR_INFO directly, and this can happen _before_ the guest exits with, let's say, an EPT vm_event. In that case, the guest has exited, there's already an interrupt pending, and while the VCPU is paused waiting for the vm_event reply we request an injection that effectively obliterates the pending interrupt. Going the singlestep way is satisfactory for us for most cases, but it still leaves the described corner case. The only fix proposal we could think of is to, instead of vmx_idtv_reinject() doing the work, simply set some flags, and have a later function do the actual work, in vmx_intr_assist() style. I hope I've been able to make this clearer (and I haven't misunderstood something in the process :) ). Thanks, Razvan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.