
Re: [Xen-devel] [VMI] Possible race-condition in altp2m APIs



On Monday, May 6, 2019 7:07 PM, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
There is a lot in here.
I wanted to gather enough data before making a bug report on such a complicated issue.

As for your BSOD analysis, the first thing to be aware of is that Double Fault is not necessarily precise, which means you can't necessarily trust any of the registers.  That said, most double faults are precise in practice, so if you're seeing it reliably at the same place, then it is likely to be a precise example.
I can reliably reproduce the Double Fault after ~10 tests on Windows 10 with KPTI.
And the stack trace always shows the beginning of KiSystemCall64ShadowCommon, which is executed after the CR3 switch to the kernel page tables.

Your faulting address isn't immediately after the pagetable switch.  It is one instruction further on, after the stack switch, which means at the very minimum that reading the new rsp out of the per-processor storage succeeded.

The stack switch, combined with `push $0x2b` faulting is a clear sign that the stack is bad.  As the stack pointer looks plausible, it is almost certainly the pagewalk from %rsp which is bad.  Judging by the Windbg guide, you want to use !pte to dump the pagewalk (but I have never used it in anger before).
I checked RSP, and it's mapped in the kernel page tables:
# print kernel and userland page directory physical addresses
kd> dt _EPROCESS ffffdf8815e15340 ImageFileName Pcb.Directorytablebase Pcb.Userdirectorytablebase
ntdll!_EPROCESS
   +0x000 Pcb                        :
      +0x028 DirectoryTableBase         : 0xcbf10002
      +0x278 UserDirectoryTableBase     : 0xcbe00001
   +0x450 ImageFileName              : [15]  "ctfmon.exe"

# print RSP
kd> r rsp
rsp=fffff800b006cd08

# translate RSP to physical address
kd> !vtop cbf10000 fffff800b006cd08
Amd64VtoP: Virt fffff800b006cd08, pagedir 00000000cbf10000
Amd64VtoP: PML4E 00000000cbf10f80
Amd64VtoP: PDPE 0000000003708010
Amd64VtoP: PDE 0000000003709c00
Amd64VtoP: PTE 000000000371d360
Amd64VtoP: Mapped phys 000000000546cd08
Virtual address fffff800b006cd08 translates to physical address 546cd08.
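
For reference, the entry addresses in the !vtop output above follow directly from the virtual address under standard 4-level x86-64 paging with 4 KiB pages. The sketch below reproduces that arithmetic. The helper names (table_base, walk_index, entry_addr, page_offset) are made up for illustration and are not part of WinDbg or Xen. It assumes each table is 4 KiB aligned and that the low 12 bits of a CR3-style value such as DirectoryTableBase are not part of the table address, which is why 0xcbf10002 is passed to !vtop as cbf10000.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRY_SIZE  8u        /* each paging-structure entry is 8 bytes */
#define INDEX_MASK  0x1ffu    /* 9 index bits per level */
#define OFFSET_MASK 0xfffull  /* low 12 bits: offset within a 4 KiB page */

/* Strip the low 12 bits of a CR3-style value to get the table's base. */
static uint64_t table_base(uint64_t cr3_like)
{
    return cr3_like & ~OFFSET_MASK;
}

/* Index into the table at a given level: 4 = PML4, 3 = PDPT, 2 = PD, 1 = PT. */
static unsigned int walk_index(uint64_t va, int level)
{
    assert(level >= 1 && level <= 4);
    return (unsigned int)((va >> (12 + 9 * (level - 1))) & INDEX_MASK);
}

/* Physical address of the entry that the walk reads at this level. */
static uint64_t entry_addr(uint64_t table, uint64_t va, int level)
{
    return table_base(table) + (uint64_t)walk_index(va, level) * ENTRY_SIZE;
}

/* Byte offset inside the final 4 KiB page. */
static uint64_t page_offset(uint64_t va)
{
    return va & OFFSET_MASK;
}
```

With va = 0xfffff800b006cd08, entry_addr(0xcbf10002, va, 4) gives 0xcbf10f80, matching the PML4E line, and page_offset(va) gives 0xd08, matching the low bits of the mapped physical address 546cd08. This only checks the guest-side walk as WinDbg sees it; it says nothing about the EPT translation underneath.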


Given how many EPT flushing bugs I've already found in this area, I wouldn't be surprised if there are further ones lurking.  If it is an EPT flushing bug, this delta should make it go away, but it will come with a hefty perf hit.


diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 283eb7b..019333d 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -4285,9 +4285,7 @@ bool vmx_vmenter_helper(const struct cpu_user_regs *regs)
             }
         }

-        if ( inv )
-            __invept(inv == 1 ? INVEPT_SINGLE_CONTEXT : INVEPT_ALL_CONTEXT,
-                     inv == 1 ? single->eptp          : 0);
+        __invept(INVEPT_ALL_CONTEXT, 0);
     }

  out:
I can give this a try and see if it resolves the problem!
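
To make the effect of the delta concrete, here is a minimal model of the flush decision before and after the change. It models only the hunk shown in the diff, not the rest of vmx_vmenter_helper. struct flush_plan, plan_original and plan_patched are hypothetical names for illustration; the INVEPT type values 1 and 2 are the single-context and all-context encodings from the Intel SDM.

```c
#include <stdbool.h>
#include <stdint.h>

/* INVEPT type encodings (Intel SDM): 1 = single-context, 2 = all-context. */
enum invept_type {
    INVEPT_SINGLE_CONTEXT = 1,
    INVEPT_ALL_CONTEXT    = 2,
};

/* Hypothetical description of what the hunk would ask the CPU to do. */
struct flush_plan {
    bool issue;             /* is an INVEPT executed at all? */
    enum invept_type type;  /* which kind of invalidation */
    uint64_t eptp;          /* EPT pointer operand (0 for all-context) */
};

/* Original hunk: no flush if inv == 0, one EPTP if inv == 1, else everything. */
static struct flush_plan plan_original(unsigned int inv, uint64_t single_eptp)
{
    struct flush_plan p = { false, INVEPT_ALL_CONTEXT, 0 };

    if ( inv )
    {
        p.issue = true;
        p.type  = (inv == 1) ? INVEPT_SINGLE_CONTEXT : INVEPT_ALL_CONTEXT;
        p.eptp  = (inv == 1) ? single_eptp : 0;
    }
    return p;
}

/* Patched hunk: always an all-context INVEPT once this point is reached. */
static struct flush_plan plan_patched(void)
{
    struct flush_plan p = { true, INVEPT_ALL_CONTEXT, 0 };
    return p;
}
```

If the crash disappears with the patched behaviour, that points at a case where the original logic skipped or narrowed a flush that was actually required; the unconditional all-context invalidation is also where the expected performance hit comes from.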

Thanks Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

