Xen project Mailing List

Re: [Xen-devel] [PATCH 5/5] x86/HVM: cache emulated instruction for retry processing

To: "Yang Z Zhang" <yang.z.zhang@xxxxxxxxx>

From: "Jan Beulich" <JBeulich@xxxxxxxx>

Date: Tue, 07 Jan 2014 08:28:33 +0000

Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Keir Fraser <keir@xxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Tue, 07 Jan 2014 08:28:47 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

>>> On 24.12.13 at 12:29, "Zhang, Yang Z" <yang.z.zhang@xxxxxxxxx> wrote: > Jan Beulich wrote on 2013-12-18: >>>>> On 18.12.13 at 10:40, "Zhang, Yang Z" <yang.z.zhang@xxxxxxxxx> wrote: >>> Jan Beulich wrote on 2013-12-18: >>>>>>> On 18.12.13 at 09:36, "Zhang, Yang Z" <yang.z.zhang@xxxxxxxxx> >> wrote: >>>>> Jan Beulich wrote on 2013-09-30: >>>>>> Rather than re-reading the instruction bytes upon retry >>>>>> processing, stash away and re-use tha what we already read. That >>>>>> way we can be certain that the retry won't do something different >>>>>> from what requested the retry, getting once again closer to real >>>>>> hardware behavior (where what we use retries for is simply a bus >>>>>> operation, not involving redundant decoding of instructions). >>>>>> >>>>> >>>>> This patch doesn't consider the nested case. >>>>> For example, if the buffer saved the L2's instruction, then vmexit >>>>> to >>>>> L1 and >>>>> L1 may use the wrong instruction. >>>> >>>> I'm having difficulty seeing how the two could get intermixed: >>>> There should be, at any given point in time, at most one >>>> instruction being emulated. Can you please give a more elaborate >>>> explanation of the situation where you see a (theoretical? practical?) > problem? >>> >>> I saw this issue when booting L1 hyper-v. I added some debug info >>> and saw the strange phenomenon: >>> >>> (XEN) write to buffer: eip 0xfffff8800430bc80, size 16, >>> content:f7420c1f608488b 44000000011442c7 >>> (XEN) read from buffer: eip 0xfffff800002f6138, size 16, >>> content:f7420c1f608488b 44000000011442c7 >>> >>> From the log, we can see different eip but using the same buffer. >>> Since I don't know how hyper-v working, so I cannot give more >>> information why this happens. And I only saw it with L1 hyper-v (Xen >>> on Xen and KVM on Xen don't have this issue) . >> >> But in order to validate the fix is (a) needed and (b) correct, a >> proper understanding of the issue is necessary as the very first step. >> Which doesn't require knowing internals of Hyper-V, all you need is >> tracking of emulator state changes in Xen (which I realize is said >> easier than done, since you want to make sure you don't generate huge >> amounts of output before actually hitting the issue, making it close to > impossible to analyze). > > Ok. It should be an old issue and just exposed by your patch only: > Sometimes, L0 need to decode the L2's instruction to handle IO access > directly. For example, if L1 pass through (not VT-d) a device to L2 directly. > And L0 may get X86EMUL_RETRY when handling this IO request. At same time, if > there is a virtual vmexit pending (for example, an interrupt pending to > inject to L1) and hypervisor will switch the VCPU context from L2 to L1. Now > we already in L1's context, but since we got a X86EMUL_RETRY just now and > this means hyprevisor will retry to handle the IO request later and > unfortunately, the retry will happen in L1's context. > Without your patch, L0 just emulate an L1's instruction and everything goes > on. With your patch, L0 will get the instruction from the buffer which is > belonging to L2, and problem arises. > > So the fixing is that if there is a pending IO request, no virtual > vmexit/vmentry is allowed which following hardware's behavior. > > diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c > index 41db52b..c5446a9 100644 > --- a/xen/arch/x86/hvm/vmx/vvmx.c > +++ b/xen/arch/x86/hvm/vmx/vvmx.c > @@ -1394,7 +1394,10 @@ void nvmx_switch_guest(void) > struct vcpu *v = current; > struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v); > struct cpu_user_regs *regs = guest_cpu_user_regs(); > + ioreq_t *p = get_ioreq(v); > > + if ( p->state != STATE_IOREQ_NONE ) > + return; > /* > * a softirq may interrupt us between a virtual vmentry is > * just handled and the true vmentry. If during this window, This change looks much more sensible; question is whether simply ignoring the switch request is acceptable (and I don't know the nested HVM code well enough to tell). Furthermore I wonder whether that's really a VMX-only issue. Another alternative might be to flush the cached instruction when a guest mode switch is being done. Again, I don't know the nested code well enough to tell what the most suitable behavior here would be. Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.