
Re: [BUG] Possible FATAL PAGE FAULT in domain_crash function on Xen 4.14.6


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>
  • From: Giuseppe De Rosa <giuseppe.de.rosa@xxxxxxxxxx>
  • Date: Tue, 6 Feb 2024 16:17:42 +0000
  • Delivery-date: Tue, 06 Feb 2024 16:18:04 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: [BUG] Possible FATAL PAGE FAULT in domain_crash function on Xen 4.14.6

Hello Andrew,

Thank you for the quick response. I have no local modifications to Xen, precisely in order to test it as cleanly as possible.
As soon as possible, I'll try to use a more recent version. Regarding the stack trace, I believe the series of function calls starts from:

vmx_asm_vmexit_handler -> vmx_vmexit_handler -> domain_crash -> __domain_crash

Thank you for addressing the security concerns; next time I'll follow your advice.

Best regards,
Giuseppe De Rosa.

From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
Sent: Tuesday, 6 February 2024 15:11
To: Giuseppe De Rosa <giuseppe.de.rosa@xxxxxxxxxx>; xen-devel@xxxxxxxxxxxxx <xen-devel@xxxxxxxxxxxxx>
Subject: Re: [BUG] Possible FATAL PAGE FAULT in domain_crash function on Xen 4.14.6
 
On 06/02/2024 12:14 pm, Giuseppe De Rosa wrote:
> Bug detailed description:
>
> ----------------
>
> While booting a Linux Debian 7 "Wheezy" VM, Xen crashes with a FATAL
> PAGE FAULT. 
>
>  
>
> Environment :
>
> ----------------
>
> HW: Intel(R) Xeon(R) CPU E5-1620 v2 @ 3.70GHz (2 CPU, Xen in nested
> virtualization upon QEMU/KVM), 4GB RAM
>
> Xen: Xen 4.14.6 (xen-hypervisor-4.14-amd64 package)
>
> Dom0: Linux 5.10.0-26-amd64 (Debian 11 "Bullseye"), 1 CPU, 1024GB RAM
>
>  
>
> Reproduce steps:
>
> ----------------
>
> 1.      Install Debian 11 and the Xen package
>
> 2.      Boot a clean Debian 7 image with hardware-assisted virtualization
>
> 3.      Flip a single bit (bit number 2) of the VMCS field
> "VM_EXIT_QUALIFICATION" (field number 6400). In my case, the value
> changed from [100049] -> [10004d]
>
> 4.      Leave the Debian 7 guest running after the bit flip.
>
>  
>
> Current result:
>
> ----------------
>
> Xen crash
>
>  
>
> Console error log:
>
> ----------------
>
> (XEN) d1v0 Unexpected PIO status 1, port 0x10 read 0x00000000ffff
>
> (XEN) domain_crash called from io.c:166
> (XEN) Domain 1 (vcpu#0) crashed on cpu#1:
> (XEN) ----[ Xen-4.14.6  x86_64  debug=n   Not tainted ]----
> (XEN) CPU:    1
> (XEN) RIP:    0010:[<ffffffff8100712e>]
> (XEN) RFLAGS: 0000000000000046   CONTEXT: hvm guest (d1v0)
> (XEN) rax: 0000000000000000   rbx: 0000000040000000   rcx: 0000000000000001
> (XEN) rdx: 0000000000000000   rsi: ffffffff81666a80   rdi: ffffffff81617830
> (XEN) rbp: 0000000000000020   rsp: ffffffff81601e78   r8:  0000000000000200
> (XEN) r9:  ffffffff8168f2a0   r10: 0000000000000007   r11: 0000000000000007
> (XEN) r12: ffffffff81601f58   r13: ffffffffffffffff   r14: 000000000008c800
> (XEN) r15: 0000000000001000   cr0: 0000000080050033   cr4: 00000000001000a0
> (XEN) cr3: 0000000001605000   cr2: 0000000000000000
> (XEN) fsb: 0000000000000000   gsb: ffffffff81696000   gss: 0000000000000000
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: 0010
> (XEN) d1v0 Bad rIP ffffffff8100712e for mode 0
> (XEN) domain_crash called from vmx.c:4413
> (XEN) ----[ Xen-4.14.6  x86_64  debug=n   Not tainted ]----
> (XEN) CPU:    1
> (XEN) RIP:    e008:[<ffff82d040206fa9>] __domain_crash+0x9/0x80
> (XEN) RFLAGS: 0000000000010296   CONTEXT: hypervisor (d1v0)
> (XEN) rax: ffff830139c0506c   rbx: ffff8301308a0000   rcx: 0000000000000000
> (XEN) rdx: ffff830136ddffff   rsi: 000000000000000a   rdi: 0000000000000000
> (XEN) rbp: 0000000000000000   rsp: ffff830136ddfee0   r8:  0000000000000001
> (XEN) r9:  0000000000004000   r10: 0000000000000001   r11: ffff82d040372d40
> (XEN) r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000000
> (XEN) r15: 0000000000000000   cr0: 0000000080050033   cr4: 00000000001526e0
> (XEN) cr3: 0000000136da6000   cr2: 0000000000000208
> (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) Xen code around <ffff82d040206fa9> (__domain_crash+0x9/0x80):
> (XEN)  0f 1e fa 55 48 89 fd 53 <80> bf 08 02 00 00 00 75 2d 48 89 e3 0f b7 37 48
> (XEN) Xen stack trace from rsp=ffff830136ddfee0:
> (XEN)    ffff8301308a0000 0000000000000000 ffff82d0402a1798 0000000000001000
> (XEN)    000000000008c800 ffffffffffffffff ffffffff81601f58 0000000000000020
> (XEN)    0000000040000000 0000000000000007 0000000000000007 ffffffff8168f2a0
> (XEN)    0000000000000200 0000000000000000 0000000000000001 0000000000000000
> (XEN)    ffffffff81666a80 ffffffff81617830 000000fa00000000 ffffffff8100712e
> (XEN)    0000000000000000 0000000000000046 ffffffff81601e78 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000e01000000001 ffff8301308a0000 00000030f9686000 00000000001526e0
> (XEN)    0000000000000000 0000000000000000 0006020200000000 0000000000000000
> (XEN) Xen call trace:
> (XEN)    [<ffff82d040206fa9>] R __domain_crash+0x9/0x80
> (XEN)    [<ffff82d0402a1798>] S vmx_asm_vmexit_handler+0xf8/0x210
> (XEN)
> (XEN) Pagetable walk from 0000000000000208:
> (XEN)  L4[0x000] = 0000000000000000 ffffffffffffffff
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 1:
> (XEN) FATAL PAGE FAULT
> (XEN) [error_code=0000]
> (XEN) Faulting linear address: 0000000000000208
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...
>
>  
>
> HVM guest config file:
>
> --------------------------------
>
> name = "debian7"
>
> builder= "hvm"
> memory = 1024
> vcpus = 1
> cpus= ["1"]
> disk=["qcow2:/home/test/debian7.qcow2,xvda,w"]
> vnc = 1
> vnclisten = '127.0.0.1'
> vncdisplay = 1
> boot = "d"
> serial = "pty"
>
> Discussion
> --------------------------------
>
> Hello, I am conducting robustness tests on Xen 4.14. I flipped a bit in
> the VM_EXIT_QUALIFICATION field during the exit handling of an
> IO_INSTRUCTION. After a VMREAD of that field, Xen crashes with the
> error log above. This is my (possible) explanation:
>
> Xen correctly detects an error in the field, resulting in a domain crash
> (Unexpected PIO status 1) called from this point in the code:
>
> xen/arch/x86/hvm/io.c
> ```
> gprintk(XENLOG_ERR, "Unexpected PIO status %d, port %#x %s 0x%0*x\n",
>                 rc, port, dir == IOREQ_WRITE ? "write" : "read",
>                 size * 2, data & ((1u << (size * 8)) - 1));
> domain_crash(curr->domain);
> return false;
> ```
>
> This happens in the `handle_pio` function, during handling of the
> `IO_INSTRUCTION` exit reason. However, execution continues and ends up
> here because of an issue with the processor mode:
>
> xen/arch/x86/hvm/vmx/vmx.c
> ```
> mode = vmx_guest_x86_mode(v);
> if ( mode == 8 ? !is_canonical_address(regs->rip)
>                : regs->rip != regs->eip )
> {
>     gprintk(XENLOG_WARNING, "Bad rIP %lx for mode %u\n", regs->rip, mode);
>
>     if ( vmx_get_cpl() )
>     {
>         __vmread(VM_ENTRY_INTR_INFO, &intr_info);
>         if ( !(intr_info & INTR_INFO_VALID_MASK) )
>             hvm_inject_hw_exception(TRAP_gp_fault, 0);
>         /* Need to fix rIP nevertheless. */
>         if ( mode == 8 )
>             regs->rip = (long)(regs->rip << (64 - VADDR_BITS)) >>
>                         (64 - VADDR_BITS);
>         else
>             regs->rip = regs->eip;
>     }
>     else
>         domain_crash(v->domain);
> }
> ```
>
> However, the domain pointer has already been deallocated due to the
> previous domain crash, resulting in a page fault that leads to Xen crashing.
>
> I would like to report this crash and ask for your opinion. Based on
> previous research, I do not believe it has been reported before. I am
> unsure whether it could be a security issue, hence I am posting it
> here. Let me know if I should provide further results. Thank you in
> advance for your kind response.


Answering somewhat out of order.

You've posted this publicly so the cat is out of the bag regardless.  If
you have concerns about security, please email security@xxxxxxx as your
first point of contact.

That said, nested virt is not security supported.  It's still an
experimental feature, so bugs like this are fine to come straight to the
public mailing list.

The "Bad rIP" logic is buggy and has since been deleted.  Xen 4.14 is a
very old version of Xen, and is outside of general bugfix support.

It is quite likely that this bug still exists, but please use an
up-to-date version of Xen.  Fixes need developing against master and are
unlikely to be backported in this case, given its experimental status.

Also, you should be using a debug build of Xen generally for work like
this.  (I have no idea if it would alter your observations.)


I'm not sure what I think about bitflipping the exit qualification.  In
other places, that will definitely cause more severe crashes, and Xen
won't be getting in the game of auditing the VMX implementation against
the VMX spec.


For this crash you've got, there should be no way for the domain pointer
to have been freed in the sequence you've described.  While the vCPU is
still scheduled, the structures will remain.

Looking at the second backtrace, it looks suspiciously like a NULL
pointer was passed into __domain_crash() from vmx_asm_vmexit_handler(),
but there's no such call, nor a tailcall out of the handler, even
taking a peek at the 4.14 code.

If you have local changes, I'd look at those first.

~Andrew

 

