
Re: [Xen-devel] [PATCH 5/5] x86/HVM: cache emulated instruction for retry processing



Egger, Christoph wrote on 2014-01-07:
> On 07.01.14 09:54, Zhang, Yang Z wrote:
>> Jan Beulich wrote on 2014-01-07:
>>>>>> On 24.12.13 at 12:29, "Zhang, Yang Z" <yang.z.zhang@xxxxxxxxx> wrote:
>>>> Jan Beulich wrote on 2013-12-18:
>>>>>>>> On 18.12.13 at 10:40, "Zhang, Yang Z" <yang.z.zhang@xxxxxxxxx> wrote:
>>>>>> Jan Beulich wrote on 2013-12-18:
>>>>>>>>>> On 18.12.13 at 09:36, "Zhang, Yang Z" <yang.z.zhang@xxxxxxxxx> wrote:
>>>>>>>> Jan Beulich wrote on 2013-09-30:
>>>>>>>>> Rather than re-reading the instruction bytes upon retry
>>>>>>>>> processing, stash away and re-use what we already read. That
>>>>>>>>> way we can be certain that the retry won't do something
>>>>>>>>> different from what requested the retry, getting once again
>>>>>>>>> closer to real hardware behavior (where what we use retries for
>>>>>>>>> is simply a bus operation, not involving redundant decoding of
>>>>>>>>> instructions).
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> This patch doesn't consider the nested case.
>>>>>>>> For example, if the buffer saved an L2 instruction and we then
>>>>>>>> vmexit to L1, L1 may use the wrong instruction.
>>>>>>> 
>>>>>>> I'm having difficulty seeing how the two could get intermixed:
>>>>>>> There should be, at any given point in time, at most one
>>>>>>> instruction being emulated. Can you please give a more
>>>>>>> elaborate explanation of the situation where you see a
>>>>>>> (theoretical? practical?) problem?
>>>>>> 
>>>>>> I saw this issue when booting L1 hyper-v. I added some debug
>>>>>> info and observed this strange phenomenon:
>>>>>> 
>>>>>> (XEN) write to buffer: eip 0xfffff8800430bc80, size 16,
>>>>>> content:f7420c1f608488b 44000000011442c7
>>>>>> (XEN) read from buffer: eip 0xfffff800002f6138, size 16,
>>>>>> content:f7420c1f608488b 44000000011442c7
>>>>>> 
>>>>>> From the log, we can see different eip values but the same buffer
>>>>>> contents. Since I don't know how hyper-v works, I cannot give more
>>>>>> information on why this happens. And I only saw it with L1 hyper-v
>>>>>> (Xen on Xen and KVM on Xen don't have this issue).
>>>>> 
>>>>> But in order to validate that the fix is (a) needed and (b) correct,
>>>>> a proper understanding of the issue is necessary as the very first
>>>>> step. That doesn't require knowing the internals of Hyper-V; all you
>>>>> need is tracking of emulator state changes in Xen (which I realize is
>>>>> easier said than done, since you want to make sure you don't generate
>>>>> huge amounts of output before actually hitting the issue, making it
>>>>> close to impossible to analyze).
>>>> 
>>>> Ok. It should be an old issue that is merely exposed by your patch:
>>>> Sometimes L0 needs to decode an L2 instruction to handle an IO access
>>>> directly, for example if L1 passes a device through (without VT-d) to
>>>> L2. L0 may get X86EMUL_RETRY when handling this IO request. At the
>>>> same time, if there is a virtual vmexit pending (for example, an
>>>> interrupt to be injected into L1), the hypervisor will switch the
>>>> VCPU context from L2 to L1. We are now in L1's context, but since we
>>>> got X86EMUL_RETRY just before, the hypervisor will retry the IO
>>>> request later, and unfortunately that retry happens in L1's context.
>>>> Without your patch, L0 simply emulates an L1 instruction and
>>>> everything goes on. With your patch, L0 takes the instruction from
>>>> the buffer, which belongs to L2, and the problem arises.
>>>>
>>>> So the fix is: if there is a pending IO request, no virtual
>>>> vmexit/vmentry is allowed, which follows hardware's behavior.
>>>> 
>>>> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
>>>> index 41db52b..c5446a9 100644
>>>> --- a/xen/arch/x86/hvm/vmx/vvmx.c
>>>> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
>>>> @@ -1394,7 +1394,10 @@ void nvmx_switch_guest(void)
>>>>      struct vcpu *v = current;
>>>>      struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
>>>>      struct cpu_user_regs *regs = guest_cpu_user_regs();
>>>> +    ioreq_t *p = get_ioreq(v);
>>>> 
>>>> +    if ( p->state != STATE_IOREQ_NONE )
>>>> +        return;
>>>>      /*
>>>>       * a softirq may interrupt us between a virtual vmentry is
>>>>       * just handled and the true vmentry. If during this window,
>>> 
>>> This change looks much more sensible; question is whether simply
>>> ignoring the switch request is acceptable (and I don't know the
>>> nested HVM code well enough to tell). Furthermore I wonder whether
>>> that's really a VMX-only issue.
>> 
>> From hardware's point of view, an IO operation is handled atomically,
>> so the problem can never happen there. But in Xen, an IO operation is
>> divided into several steps. Without nested virtualization, the VCPU is
>> paused or loops until the IO emulation is finished. In a nested
>> environment, however, the VCPU will continue running even though the IO
>> emulation is not finished. So my patch checks this and allows the VCPU
>> to continue running only when there is no pending IO request, which
>> matches hardware's behavior.
>>
>> I guess SVM also has this problem, but since I don't know nested SVM
>> well, perhaps Christoph can help double-check.
> 
> For SVM this issue was fixed with commit
> d740d811925385c09553cbe6dee8e77c1d43b198
> 
> And there is a followup cleanup commit
> ac97fa6a21ccd395cca43890bbd0bf32e3255ebb
> 
> The change in nestedsvm.c in commit
> d740d811925385c09553cbe6dee8e77c1d43b198 is actually not SVM specific.
> 
> Move that into nhvm_interrupt_blocked() in hvm.c right before
> 
>     return hvm_funcs.nhvm_intr_blocked(v);
> and the fix applies to both SVM and VMX.
> 
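If I understand the suggestion correctly, it would amount to something
roughly like the sketch below (untested; I reuse the same ioreq test as
in my vvmx.c diff above, and I am guessing at the blocking reason to
return, the actual SVM commit may use a different one):

    enum hvm_intblk nhvm_interrupt_blocked(struct vcpu *v)
    {
        ioreq_t *p = get_ioreq(v);

        /* Keep interrupts "blocked" while an IO request emulated by L0
         * is still in flight, so no virtual vmexit gets injected in the
         * middle of the emulation. */
        if ( p && p->state != STATE_IOREQ_NONE )
            return hvm_intblk_shadow;  /* guessed blocking reason */

        return hvm_funcs.nhvm_intr_blocked(v);
    }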

I guess this is not enough. L2 may vmexit to L1 during IO emulation for
reasons other than an interrupt. I cannot give a concrete example right
now, but hyper-v cannot boot up with your suggestion, so I guess
considering only external interrupts is not enough. We should prohibit
the vmswitch if there is a pending IO emulation, whether it comes from
L1 or from L2 (the L1 case may never happen, but to match hardware's
behavior we should add the check).
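
Something along these lines is what I have in mind (untested sketch; the
helper name is made up, and the nested SVM switch path would need the
same call as nvmx_switch_guest()):

    /* Generic helper: is an IO request for this vcpu still being
     * emulated by L0?  Same test as in the vvmx.c diff above.  To be
     * called at the top of nvmx_switch_guest() and of the equivalent
     * nested SVM switch path, bailing out (i.e. deferring the virtual
     * vmexit/vmentry) when it returns true. */
    static bool_t nhvm_vcpu_io_pending(struct vcpu *v)
    {
        ioreq_t *p = get_ioreq(v);

        return p && p->state != STATE_IOREQ_NONE;
    }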

Best regards,
Yang


