Re: [Xen-devel] [VMI] Possible race-condition in altp2m APIs
On Thu, May 9, 2019 at 12:00 PM Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>
> On 09/05/2019 18:46, Tamas K Lengyel wrote:
> > On Thu, May 9, 2019 at 10:43 AM Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
> >> On 09/05/2019 17:19, Mathieu Tarral wrote:
> >>> On Tuesday, May 7, 2019 2:01 PM, Mathieu Tarral
> >>> <mathieu.tarral@xxxxxxxxxxxxxx> wrote:
> >>>
> >>>>> Given how many EPT flushing bugs I've already found in this area, I
> >>>>> wouldn't be surprised if there are further ones lurking. If it is an
> >>>>> EPT flushing bug, this delta should make it go away, but it will come
> >>>>> with a hefty perf hit.
> >>>>>
> >>>>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> >>>>> index 283eb7b..019333d 100644
> >>>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
> >>>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> >>>>> @@ -4285,9 +4285,7 @@ bool vmx_vmenter_helper(const struct cpu_user_regs *regs)
> >>>>>              }
> >>>>>          }
> >>>>>
> >>>>> -        if ( inv )
> >>>>> -            __invept(inv == 1 ? INVEPT_SINGLE_CONTEXT : INVEPT_ALL_CONTEXT,
> >>>>> -                     inv == 1 ? single->eptp : 0);
> >>>>> +        __invept(INVEPT_ALL_CONTEXT, 0);
> >>>>>      }
> >>>>>
> >>>>>   out:
> >>>> I can give this a try, and see if it resolves the problem!
> >>> Just tested, on Xen 4.12.0, and the bug is still here.
> >>> Windows 7 is having BSODs with 4 VCPUs.
> >>> I didn't notice a hefty performance impact though.
> >>>
> >>> Do we have other caches to invalidate?
> >>> Something else that I should test?
> >>>
> >>> I don't feel comfortable digging into Xen's code, especially for
> >>> something as complicated as page table and memory management,
> >>> increased by the complexity of altp2m.
> >>> What I can do however, is test your ideas and patches and report the
> >>> information I can gather on this issue.
> >>>
> >>> Note: I tested with the latest commits on Drakvuf/master, especially:
> >>> "Add a VM pause for shadow copy refresh operation"
> >>> https://github.com/tklengyel/drakvuf/pull/626
> >>>
> >>> @tamas, did you make this patch to fix the kind of race condition
> >>> issue that I'm reporting?
> >>> Or was it totally unrelated?
> >> With the above change in place and BSODs still happening, I'm fairly
> >> convinced that it is not a TLB flushing issue.
> >>
> >> Therefore, the conclusion to draw is that it is a logical bug somewhere.
> > I agree.
> >
> >> First of all - ensure you are using up-to-date microcode. The number of
> >> errata which have been discovered by people associated with the Xen
> >> community is large.
> >>
> >> The microcode is available from
> >> https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/ and
> >> https://andrewcoop-xen.readthedocs.io/en/latest/admin-guide/microcode-loading.html
> >> is some documentation I prepared earlier.
> >>
> >> Beyond that, I think it would help to know exactly how libvmi is
> >> manipulating the guest.
> > I already suggested to Mathieu to try to reproduce the issue using the
> > xen-access test tool that's in the Xen tree to cut out all that
> > complexity.
>
> xen-access is ok, but I've never encountered a situation where I haven't
> had to modify it first to get it usable.

Right, it would likely have to be modified.

>
> I have some plans to replace it with something far more usable, as part
> of tying together some XTF-based VMI testing, but none of that is
> remotely ready yet.

Yes, that would be fantastic to have.

> > Without being able to limit the scope of the bug and being
> > able to reproducibly trigger it I see little chance of finding the
> > root cause. Unfortunately I don't have the time to do that myself.
>
> I can probably help out with some suggestions, but I agree that we are
> going to have to cut out some of the complexity here to figure out
> exactly what is going on.
>
> Alternatively, if there are some sufficiently detailed instructions for
> how to put together a repro of the problem using libvmi/etc, I might be
> able to start debugging from that, but I definitely don't have time to
> do that in the next week.

The instructions are on https://drakvuf.com. AFAICT Mathieu is running
into the issue by simply running it on an up-to-date Windows 10 guest,
but not in any way that I would call reproducible. Running it "for a
minute or two" is really not a reproducible bug description.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel