[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 2/2] x86/svm: Write the correct %eip into the outgoing task


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Fri, 22 Nov 2019 14:59:39 +0100
  • Authentication-results: esa5.hc3370-68.iphmx.com; dkim=none (message not signed) header.i=none; spf=None smtp.pra=roger.pau@xxxxxxxxxx; spf=Pass smtp.mailfrom=roger.pau@xxxxxxxxxx; spf=None smtp.helo=postmaster@xxxxxxxxxxxxxxx
  • Cc: Juergen Gross <jgross@xxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>
  • Delivery-date: Fri, 22 Nov 2019 14:00:00 +0000
  • Ironport-sdr: ce67d0fmvHWGuGz9H4PvlRgIVeH2yxlLvz3cr9cRz7PNGPkSkUs50L39786+r8H43eT9EFNKiC AGsIqqlnbhNhH59dZ0EoJCi0SHRZSVrkyMlLuR9KrPtCYeZUIexf5Bx3YL+yHZdKES/3gO9THU vXSMOEFzqA5DHOf3xatLG2jvv4mLz66BBvJtMbh9PS1NIkZNsK1hxfAj+Ei21Dj00Kv+VVgEwH 9IjVIDuo8Fi/7oIzDpGUpBma+CXke5nSKrK5Ymg3oli36ZA5kt4GBqnaWJbecNIj0HTPE6RUMW Zt0=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Nov 21, 2019 at 10:15:51PM +0000, Andrew Cooper wrote:
> The TASK_SWITCH vmexit has fault semantics, and doesn't provide any NRIPs
> assistance with instruction length.  As a result, any instruction-induced task
> switch has the outgoing task's %eip pointing at the instruction switch caused
                                                                  ^ that
> the switch, rather than after it.
> 
> This causes explicit use of task gates to livelock (as when the task returns,
> it executes the task-switching instruction again), and any restartable task to
> become a nop after its first instantiation (the entry state points at the
> ret/iret instruction used to exit the task).
> 
> 32bit Windows in particular is known to use task gates for NMI handling, and
> to use NMI IPIs.
> 
> In the task switch handler, distinguish instruction-induced from
> interrupt/exception-induced task switches, and decode the instruction under
> %rip to calculate its length.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> ---
> CC: Jan Beulich <JBeulich@xxxxxxxx>
> CC: Wei Liu <wl@xxxxxxx>
> CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> CC: Juergen Gross <jgross@xxxxxxxx>
> 
> The implementation of svm_get_task_switch_insn_len() is bug-compatible with
> svm_get_insn_len() when it comes to conditional #GP'ing.  I still haven't had
> time to address this more thoroughly.
> 
> AMD does permit TASK_SWITCH not to be intercepted and, I'm informed does do
> the right thing when it comes to a TSS crossing a page boundary.  However, it
> is not actually safe to leave task switches unintercepted.  Any NPT or shadow
> page fault, even from logdirty/paging/etc will corrupt guest state in an
> unrecoverable manner.
> ---
>  xen/arch/x86/hvm/svm/emulate.c        | 55 
> +++++++++++++++++++++++++++++++++++
>  xen/arch/x86/hvm/svm/svm.c            | 46 ++++++++++++++++++++++-------
>  xen/include/asm-x86/hvm/svm/emulate.h |  1 +
>  3 files changed, 92 insertions(+), 10 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/svm/emulate.c b/xen/arch/x86/hvm/svm/emulate.c
> index 3e52592847..176c25f60d 100644
> --- a/xen/arch/x86/hvm/svm/emulate.c
> +++ b/xen/arch/x86/hvm/svm/emulate.c
> @@ -117,6 +117,61 @@ unsigned int svm_get_insn_len(struct vcpu *v, unsigned 
> int instr_enc)
>  }
>  
>  /*
> + * TASK_SWITCH vmexits never provide an instruction length.  We must always
> + * decode under %rip to find the answer.
> + */
> +unsigned int svm_get_task_switch_insn_len(struct vcpu *v)
> +{
> +    struct hvm_emulate_ctxt ctxt;
> +    struct x86_emulate_state *state;
> +    unsigned int emul_len, modrm_reg;
> +
> +    ASSERT(v == current);
> +    hvm_emulate_init_once(&ctxt, NULL, guest_cpu_user_regs());
> +    hvm_emulate_init_per_insn(&ctxt, NULL, 0);
> +    state = x86_decode_insn(&ctxt.ctxt, hvmemul_insn_fetch);
> +    if ( IS_ERR_OR_NULL(state) )

Maybe crash the guest in this case? Not advancing the instruction
pointer in a software induced task switch will create a loop AFAICT?

> +        return 0;
> +
> +    emul_len = x86_insn_length(state, &ctxt.ctxt);
> +
> +    /*
> +     * Check for an instruction which can cause a task switch.  Any far
> +     * jmp/call/ret, any software interrupt/exception, and iret.
> +     */
> +    switch ( ctxt.ctxt.opcode )
> +    {
> +    case 0xff: /* Grp 5 */
> +        /* call / jmp (far, absolute indirect) */
> +        if ( x86_insn_modrm(state, NULL, &modrm_reg) != 3 ||
> +             (modrm_reg != 3 && modrm_reg != 5) )
> +        {
> +            /* Wrong instruction.  Throw #GP back for now. */
> +    default:
> +            hvm_inject_hw_exception(TRAP_gp_fault, 0);
> +            emul_len = 0;
> +            break;
> +        }
> +        /* Fallthrough */
> +    case 0x62: /* bound */
> +    case 0x9a: /* call (far, absolute) */

I'm slightly loss here, in the case of call or jmp for example, don't
you need the instruction pointer to point to the destination of the
call/jmp instead of the next instruction?

> +    case 0xca: /* ret imm16 (far) */
> +    case 0xcb: /* ret (far) */
> +    case 0xcc: /* int3 */
> +    case 0xcd: /* int imm8 */
> +    case 0xce: /* into */
> +    case 0xcf: /* iret */
> +    case 0xea: /* jmp (far, absolute) */
> +    case 0xf1: /* icebp */
> +        break;
> +    }
> +
> +    x86_emulate_free_state(state);
> +
> +    return emul_len;
> +}
> +
> +/*
>   * Local variables:
>   * mode: C
>   * c-file-style: "BSD"
> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> index 049b800e20..ba9c24a70c 100644
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -2776,7 +2776,41 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
>  
>      case VMEXIT_TASK_SWITCH: {
>          enum hvm_task_switch_reason reason;
> -        int32_t errcode = -1;
> +        int32_t errcode = -1, insn_len = -1;

Plain int seem better for insn_len?

Also I'm not sure there's a reason that errcode uses int32_t, but
that's not introduced here anyway.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.