Re: [Xen-devel] [VMI] Possible race-condition in altp2m APIs
On 09/05/2019 18:46, Tamas K Lengyel wrote:
> On Thu, May 9, 2019 at 10:43 AM Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> wrote:
>> On 09/05/2019 17:19, Mathieu Tarral wrote:
>>> On Tuesday, May 7, 2019 2:01 PM, Mathieu Tarral
>>> <mathieu.tarral@xxxxxxxxxxxxxx> wrote:
>>>
>>>>> Given how many EPT flushing bugs I've already found in this area, I
>>>>> wouldn't be surprised if there are further ones lurking. If it is an EPT
>>>>> flushing bug, this delta should make it go away, but it will come with a
>>>>> hefty perf hit.
>>>>>
>>>>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
>>>>> index 283eb7b..019333d 100644
>>>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>>>> @@ -4285,9 +4285,7 @@ bool vmx_vmenter_helper(const struct cpu_user_regs *regs)
>>>>>              }
>>>>>          }
>>>>>
>>>>> -        if ( inv )
>>>>> -            __invept(inv == 1 ? INVEPT_SINGLE_CONTEXT : INVEPT_ALL_CONTEXT,
>>>>> -                     inv == 1 ? single->eptp : 0);
>>>>> +        __invept(INVEPT_ALL_CONTEXT, 0);
>>>>>      }
>>>>>
>>>>>  out:
>>>> I can give this a try and see if it resolves the problem!
>>> Just tested on Xen 4.12.0, and the bug is still there.
>>> Windows 7 still gets BSODs with 4 vCPUs.
>>> I didn't notice a hefty performance impact, though.
>>>
>>> Are there other caches to invalidate?
>>> Is there anything else I should test?
>>>
>>> I don't feel comfortable digging into Xen's code, especially for something
>>> as complicated as page table and memory management, made even more complex
>>> by altp2m.
>>> What I can do, however, is test your ideas and patches and report the
>>> information I can gather on this issue.
>>>
>>> Note: I tested with the latest commits on Drakvuf/master, especially:
>>> "Add a VM pause for shadow copy refresh operation"
>>> https://github.com/tklengyel/drakvuf/pull/626
>>>
>>> @tamas, did you make this patch to fix the kind of race condition
>>> I'm reporting?
>>> Or was it unrelated?
>> With the above change in place and BSODs still happening, I'm fairly
>> convinced that it is not a TLB flushing issue.
>>
>> Therefore, the conclusion to draw is that it is a logical bug somewhere.
> I agree.
>
>> First of all: ensure you are using up-to-date microcode. The number of
>> errata which have been discovered by people associated with the Xen
>> community is large.
>>
>> The microcode is available from
>> https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/ and
>> https://andrewcoop-xen.readthedocs.io/en/latest/admin-guide/microcode-loading.html
>> is some documentation I prepared earlier.
>>
>> Beyond that, I think it would help to know exactly how libvmi is
>> manipulating the guest.
> I already suggested that Mathieu try to reproduce the issue using the
> xen-access test tool that's in the Xen tree to cut out all that
> complexity.

xen-access is OK, but I've never encountered a situation where I haven't
had to modify it first to make it usable. I have some plans to replace it
with something far more usable, as part of tying together some XTF-based
VMI testing, but none of that is remotely ready yet.

> Without being able to limit the scope of the bug and being
> able to reproducibly trigger it, I see little chance of finding the
> root cause. Unfortunately I don't have the time to do that myself.

I can probably help out with some suggestions, but I agree that we are
going to have to cut out some of the complexity here to figure out
exactly what is going on.

Alternatively, if there are sufficiently detailed instructions for how to
put together a repro of the problem using libvmi/etc., I might be able to
start debugging from that, but I definitely don't have time to do that in
the next week.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel