[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 3/3] x86/pv: Inject #UD for missing SYSCALL callbacks



On 23.09.2020 12:18, Andrew Cooper wrote:
> Despite appearing to be a deliberate design choice of early PV64, the
> resulting behaviour for unregistered SYSCALL callbacks creates an untenable
> testability problem for Xen.  Furthermore, the behaviour is undocumented,
> bizarre, and inconsistent with related behaviour in Xen, and very liable
> introduce a security vulnerability into a PV guest if the author hasn't
> studied Xen's assembly code in detail.
> 
> There are two different bugs here.
> 
> 1) The current logic confuses the registered entrypoints, and may deliver a
>    SYSCALL from 32bit userspace to the 64bit entry, when only a 64bit
>    entrypoint is registered.
> 
>    This has been the case ever since 2007 (c/s cd75d47348b) but up until
>    2018 (c/s dba899de14) the wrong selectors would be handed to the guest for
>    a 32bit SYSCALL entry, making it appear as if it a 64bit entry all along.

I'm not sure what you derive the last half sentence from. To a 32-bit
PV guest, nothing can make things look like being 64-bit. And as you
did say in your 2018 change, FLAT_KERNEL_SS == FLAT_USER_SS32.

As to the "confusion" of entry points - before the compat mode entry
path was introduced, a 64-bit guest could only register a single
entry point. Hence guests at the time had to multiplex 32- and 64-bit
user mode entry from this one code path. In order to avoid regressing
any such guest, the falling back to using the 64-bit entry point was
chosen. Effectively what you propose is to regress such guests now,
rather than back then.

>    Xen would malfunction under these circumstances, if it were a PV guest.
>    Linux would as well, but PVOps has always registered both entrypoints and
>    discarded the Xen-provided selectors.  NetBSD really does malfunction as a
>    consequence (benignly now, but a VM DoS before the 2018 Xen selector fix).
> 
> 2) In the case that neither SYSCALL callbacks are registered, the guest will
>    be crashed when userspace executes a SYSCALL instruction, which is a
>    userspace => kernel DoS.
> 
>    This has been the case ever since the introduction of 64bit PV support, but
>    behaves unlike all other SYSCALL/SYSENTER callbacks in Xen, which yield
>    #GP/#UD in userspace before the callback is registered, and are therefore
>    safe by default.

I agree this part is an improvement.

> This change does constitute a change in the PV ABI, for corner cases of a PV
> guest kernel registering neither callback, or not registering the 32bit
> callback when running on AMD/Hygon hardware.
> 
> It brings the behaviour in line with PV32 SYSCALL/SYSENTER, and PV64
> SYSENTER (safe by default, until explicitly enabled), as well as native
> hardware (always delivered to the single applicable callback).

Albeit an OS running natively and setting EFER.SCE is obliged to set both
entry points; they can't have one without the other (and not be vulnerable).
Since it's unclear what the PV equivalent of EFER.SCE is, I don't think
comparing this particular aspect of the behavior makes a lot of sense.

> Most importantly however, and the primary reason for the change, is that it
> lets us actually test the PV entrypoints to prove correct behaviour.

You mean "test the absence of PV entry points" here?

> --- a/xen/arch/x86/x86_64/entry.S
> +++ b/xen/arch/x86/x86_64/entry.S
> @@ -26,18 +26,30 @@
>  /* %rbx: struct vcpu */
>  ENTRY(switch_to_kernel)
>          leaq  VCPU_trap_bounce(%rbx),%rdx
> -        /* TB_eip = (32-bit syscall && syscall32_addr) ?
> -         *          syscall32_addr : syscall_addr */
> -        xor   %eax,%eax
> +
> +        /* TB_eip = 32-bit syscall ? syscall32_addr : syscall_addr */
> +        mov   VCPU_syscall32_addr(%rbx), %ecx
> +        mov   VCPU_syscall_addr(%rbx), %rax
>          cmpw  $FLAT_USER_CS32,UREGS_cs(%rsp)
> -        cmoveq VCPU_syscall32_addr(%rbx),%rax
> -        testq %rax,%rax
> -        cmovzq VCPU_syscall_addr(%rbx),%rax
> -        movq  %rax,TRAPBOUNCE_eip(%rdx)
> +        cmove %rcx, %rax
> +
>          /* TB_flags = VGCF_syscall_disables_events ? TBF_INTERRUPT : 0 */
>          btl   $_VGCF_syscall_disables_events,VCPU_guest_context_flags(%rbx)
>          setc  %cl
>          leal  (,%rcx,TBF_INTERRUPT),%ecx
> +
> +        test  %rax, %rax
> +UNLIKELY_START(z, syscall_no_callback) /* TB_eip == 0 => #UD */
> +        movq  VCPU_trap_ctxt(%rbx), %rdi
> +        movl  $X86_EXC_UD, UREGS_entry_vector(%rsp)
> +        subl  $2, UREGS_rip(%rsp)
> +        movl  X86_EXC_UD * TRAPINFO_sizeof + TRAPINFO_eip(%rdi), %eax

I guess you mean "movq ..., %rax"? Iirc 32-bit guests don't even get through
here.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.