[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Nullptr dereference in nested VMX when shadow VMCS support is available


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Manuel Andreas <manuel.andreas@xxxxxx>
  • Date: Mon, 2 Jun 2025 16:52:28 +0200
  • Authentication-results: postout.lrz.de (amavis); dkim=pass (2048-bit key) reason="pass (just generated, assumed good)" header.d=tum.de
  • Autocrypt: addr=manuel.andreas@xxxxxx; keydata= xjMEY9Zx/RYJKwYBBAHaRw8BAQdALWzRzW9a74DX4l6i8VzXGvv72Vz0qfvj9s7bjBD905nN Jk1hbnVlbCBBbmRyZWFzIDxtYW51ZWwuYW5kcmVhc0B0dW0uZGU+wokEExYIADEWIQQuSfNX 11QV6exAUmOqZGwY4LuingUCY9Zx/QIbAwQLCQgHBRUICQoLBRYCAwEAAAoJEKpkbBjgu6Ke McQBAPyP530S365I50I5rM2XjH5Hr9YcUQATD5dusZJMDgejAP9T/wUurwQSuRfm1rK8cNcf w4wP3+PLvL+J+kuVku93CM44BGPWcf0SCisGAQQBl1UBBQEBB0AmCAf31tLBD5tvtdZ0XX1B yGLUAxhgmFskGyPhY8wOKQMBCAfCeAQYFggAIBYhBC5J81fXVBXp7EBSY6pkbBjgu6KeBQJj 1nH9AhsMAAoJEKpkbBjgu6Kej6YA/RvJdXMjsD5csifolLw53KX0/ElM22SvaGym1+KiiVND AQDy+y+bCXI+J713/AwLBsDxTEXmP7Cp49ZqbAu83NnpBQ==
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Mon, 02 Jun 2025 14:52:34 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 6/2/25 4:12 PM, Jan Beulich wrote:

On 02.06.2025 15:39, Manuel Andreas wrote:
I've discovered an issue in the nested VMX implementation, where an 
unprivileged domain is able to force Xen to dereference a NULL pointer, 
resulting in a panic.
Sadly you provide no details on this NULL deref.
Here's the respective dump:

----[ Xen-4.20.0  x86_64  debug=y  Tainted:     H  ]----
(XEN) CPU:    1
(XEN) RIP:    e008:[<ffff82d0402ae2b8>] nvmx_handle_vmx_insn+0x7ab/0xccb
(XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor (d1v0)
(XEN) rax: 0000000000000000   rbx: 0000000000000000   rcx: 8000000000000002
(XEN) rdx: 0000000000000000   rsi: 01ffffffffffffff   rdi: ffff82e0020155e0
(XEN) rbp: ffff830179407e68   rsp: ffff830179407e00   r8:  ffff82c00023b000
(XEN) r9:  ffff830179413c40   r10: 0000000000000000   r11: 0000000000000200
(XEN) r12: ffff83010483d000  r13: ffff830179407ef8   r14: 0000000000000000
(XEN) r15: 0000000000000000   cr0: 0000000080050033   cr4: 00000000003526e0
(XEN) cr3: 000000010498f000   cr2: 0000000000000000
(XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen code around <ffff82d0402ae2b8> (nvmx_handle_vmx_insn+0x7ab/0xccb):
(XEN)  75 b0 0f 86 12 05 00 00 <81> 08 00 00 00 80 41 8b 84 24 f4 05 00 00 80 cc
(XEN) Xen stack trace from rsp=ffff830179407e00:
(XEN)    ffff83010483d000 000000000011e000 0000000000000000 0000000000000000
(XEN)    ffff82d0402bfc4a 0000000100000000 0000000000119fa8 ffff82d000000008
(XEN)    ffff830100000006 ffff830179407ef8 0000000000000015 ffff83010483d000
(XEN)    0000000000000000 ffff830179407ee8 ffff82d0402a9a19 ffff82d04020361b
(XEN)    0000000000000000 0000000000000000 0000000000000000 ffff830100997000
(XEN)    ffff82d040203615 ffff82d04020361b ffff82d040203615 ffff82d04020361b
(XEN)    ffff83010483d000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 00007cfe86bf80e7 ffff82d040203673 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000001
(XEN)    0000000000000000 0000000000000000 0000000000000043 000000000000007b
(XEN)    0000000000000043 0000000000000000 0000000000000000 0000000011e57e00
(XEN)    0000000000000000 0000000000000000 0000beef0000beef 0000000000103fa2
(XEN)    000000bf0000beef 0000000000000046 0000000000119fa0 000000000000beef
(XEN)    000000000000beef 000000000000beef 000000000000beef 000000000000beef
(XEN)    0000e01000000001 ffff83010483d000 0000003136627000 00000000003526e0
(XEN)    0000000000000000 0000000000000000 0000000300000001 0000004e00000003
(XEN) Xen call trace:
(XEN)    [<ffff82d0402ae2b8>] R nvmx_handle_vmx_insn+0x7ab/0xccb
(XEN)    [<ffff82d0402a9a19>] F vmx_vmexit_handler+0xd97/0x1e14
(XEN)    [<ffff82d040203673>] F vmx_asm_vmexit_handler+0x103/0x220
(XEN)
(XEN) Pagetable walk from 0000000000000000:

(XEN)  L4[0x000] = 0000000000000000 ffffffffffffffff

Where nvmx_handle_vmx_insn+0x7ab/0xccb resolves to xen/arch/x86/hvm/vmx/vvmx.c:1169
Specifically, in nvmx_handle_vmptrld we have:

1830    if ( cpu_has_vmx_vmcs_shadowing )
1831        nvmx_set_vmcs_pointer(v, nvcpu->nv_vvmcx);

Which is reachable with nvcpu->nv_vvmcx untouched in the case of the p2m mapping for gpa not being writable (see lines 1793 - 1827) .
Therefore, nvcpu->nv_vvmcx will remain NULL (as initialized in nvmx_vcpu_initialize).
Further, inside nvmx_set_vmcs_pointer:

1164    static void nvmx_set_vmcs_pointer(struct vcpu *v, struct vmcs_struct *vvmcs)
1165    {
1166        paddr_t vvmcs_maddr = v->arch.hvm.vmx.vmcs_shadow_maddr;
1167
1168        __vmpclear(vvmcs_maddr);
1169        vvmcs->vmcs_revision_id |= VMCS_RID_TYPE_MASK;
1170        /* ... */

will then dereference vvmcs, which is NULL.


      
This is possible when:

 1. The malicious domain has nested HVM capabilities.
 2. The CPU is running on top of VMX and supports shadow VMCS.

To trigger the bug, the domain must first enable VMX operation for 
itself, execute VMXON and then finally execute VMPTRLD on a guest 
physical address that is backed by a non-writable p2m mapping.
In `nvmx_handle_vmptrld`, after attempting to map the nested VMCS, Xen 
will check whether or not this mapping is suitable for writing and if 
not immediately unmap the nested VMCS again and abort the setup of 
`nvcpu->nv_vvmcx`. However, Xen at this point erroneously continues 
emulation of the VMPTRLD. In particular, if VMCS shadowing is available, 
Xen will nonetheless attempt to link up the nested VMCS to its own VMCS 
in `nvmx_set_vmcs_pointer`. Importantly, Xen here attempts to 
dereference the presumably mapped nested VMCS (which now is merely a 
NULL pointer) in order to mark it as a shadow VMCS by applying the 
`VMCS_RID_TYPE_MASK` to its revision identifier. Following, the page 
fault handler will panic Xen.

I've attached an XTF reproducer that triggers the bug. To setup such a 
non-writable p2m mapping for the malicious VMCS, I first setup an 
appropriate grant table entry. I've tested it on Xen version 4.20.0.
I expect this to not work anymore on current staging or 4.20.1-pre.
See a8325f981ce4 ("x86/P2M: synchronize fast and slow paths of
p2m_get_page_from_gfn()").
On first glance I don't see how that would impact the type of the established p2m mapping.
If you want I can compile staging and test if the PoC does not work there anymore.

To fix the issue I believe the following patch should be suitable:

--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1817,7 +1817,9 @@ static int nvmx_handle_vmptrld(struct 
cpu_user_regs *regs)
              else
              {
                  hvm_unmap_guest_frame(vvmcx, 1);
-                vvmcx = NULL;
+                vmfail(regs, VMX_INSN_VMPTRLD_INVALID_PHYADDR);
+
+                return X86EMUL_OKAY;
              }
          }
          else

The VMX error AFAICT does not strictly adhere to the Intel SDM, but 
providing the guest some indication on what went wrong is likely more 
sensible than silently failing.
Giving the guest some indication is certainly right. If we want to follow
the above route, I think the change would want doing a little differently,
to take the path that presently is the "else" at the bottom of the hunk
above. However, I can't presently see how invoking vmfail() would make a
difference as to the subsequent NULL deref: The guest could continue the
same irrespective of the failure. Hence why I'd like to understand what
NULL deref you did observe. (We may hence need two patches - one along
the above lines, and another one dealing with the NULL issue.)

The issue is really just in the latter part of nvmx_handle_vmptrld, which attempts to initialize its shadow VMCS even though establishing a mapping for the nested VMCS failed.
An early exit from that function (as my patch suggests) should be sufficient for that case.
Other parts of the nested VMX code already check for a valid nested VMCS via vvmcx_valid. There simply isn't such a check here presumably because the assumption was that any previous errors in mapping the nested VMCS were already handled by returning from the function or jumping to the out label.

Best,
Manuel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.