[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3] x86/pv: inject #UD for entirely missing SYSCALL callbacks


  • To: Jan Beulich <jbeulich@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Wed, 28 Oct 2020 21:31:33 +0000
  • Authentication-results: esa5.hc3370-68.iphmx.com; dkim=none (message not signed) header.i=none
  • Cc: Wei Liu <wl@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Wed, 28 Oct 2020 21:31:53 +0000
  • Ironport-sdr: rntssEikuZwN3+9nk0j2xv3y/RbMhIv7b3LnGgSSZjJxkGg26dM0/lnzoppw2dHC8c7R1Zc2cc x/9A75bH1n20cooh7ZG1rXe71GiTlatBkVi8J+qMbXJENXAFoF1w/jBJWS0OjElKORCPyIfUsZ KrbZ0E5j5LttLC5vXeHVBVlD47dFE6S4kW+8llXnXUEO2cjgjNZstck5c8Us55iASDgvEvByCT Xm+nWelSZsSSpUO5L6SfcBRE7tQBiAqseBVR8+5R9gelh67Tx6q6ZB5d143IUfkN0E4/iZ0gn6 M7M=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 26/10/2020 09:40, Jan Beulich wrote:
> In the case that no 64-bit SYSCALL callback is registered, the guest
> will be crashed when 64-bit userspace executes a SYSCALL instruction,
> which would be a userspace => kernel DoS.  Similarly for 32-bit
> userspace when no 32-bit SYSCALL callback was registered either.
>
> This has been the case ever since the introduction of 64bit PV support,
> but behaves unlike all other SYSCALL/SYSENTER callbacks in Xen, which
> yield #GP/#UD in userspace before the callback is registered, and are
> therefore safe by default.
>
> This change does constitute a change in the PV ABI, for the corner case
> of a PV guest kernel not registering a 64-bit callback (which has to be
> considered a defacto requirement of the unwritten PV ABI, considering
> there is no PV equivalent of EFER.SCE).
>
> It brings the behaviour in line with PV32 SYSCALL/SYSENTER, and PV64
> SYSENTER (safe by default, until explicitly enabled).
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Signed-off-by: Jan Beulich <JBeulich@xxxxxxxx>
> ---
> v3:
>  * Split this change off of "x86/pv: Inject #UD for missing SYSCALL
>    callbacks", to allow the uncontroversial part of that change to go
>    in. Add conditional "rex64" for UREGS_rip adjustment. (Is branching
>    over just the REX prefix too much trickery even for an unlikely to be
>    taken code path?)

I find this submission confusion seeing as my v3 is already suitably
acked and ready to commit.  What I haven't had yet enough free time to
do so.

> v2:
>  * Drop unnecessary instruction suffixes
>  * Don't truncate #UD entrypoint to 32 bits
>
> --- a/xen/arch/x86/x86_64/entry.S
> +++ b/xen/arch/x86/x86_64/entry.S
> @@ -33,11 +33,27 @@ ENTRY(switch_to_kernel)
>          cmoveq VCPU_syscall32_addr(%rbx),%rax
>          testq %rax,%rax
>          cmovzq VCPU_syscall_addr(%rbx),%rax
> -        movq  %rax,TRAPBOUNCE_eip(%rdx)
>          /* TB_flags = VGCF_syscall_disables_events ? TBF_INTERRUPT : 0 */
>          btl   $_VGCF_syscall_disables_events,VCPU_guest_context_flags(%rbx)
>          setc  %cl
>          leal  (,%rcx,TBF_INTERRUPT),%ecx
> +
> +        test  %rax, %rax
> +UNLIKELY_START(z, syscall_no_callback) /* TB_eip == 0 => #UD */
> +        mov   VCPU_trap_ctxt(%rbx), %rdi
> +        movl  $X86_EXC_UD, UREGS_entry_vector(%rsp)
> +        cmpw  $FLAT_USER_CS32, UREGS_cs(%rsp)
> +        je    0f
> +        rex64                           # subl => subq
> +0:
> +        subl  $2, UREGS_rip(%rsp)

There was deliberately not a 32bit sub here (see below).

As to the construct, I'm having a hard time deciding whether this is an
excellent idea, or far too clever for its own good.

Some basic perf testing shows that there is a visible difference in
execution time of these two paths, which means there is some
optimisation being missed in the frontend for the 32bit case.  However,
the difference is tiny in the grand scheme of things (about 0.4%
difference in aggregate time to execute a loop of this pattern with a
fixed number of iterations.)

However, the 32bit case isn't actually interesting here.  A guest can't
execute a SYSCALL instruction on/across the 4G->0 boundary because the
M2P is mapped NX up to the 4G boundary, so we can never reach this point
with %eip < 2.

Therefore, the 64bit-only form is the appropriate one to use, which
solves any question of cleverness, or potential decode stalls it causes.

~Andrew



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.