
Re: [Xen-devel] [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV guest code




> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Sent: Friday, March 4, 2016 7:28 PM
> To: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>; Wu, Feng
> <feng.wu@xxxxxxxxx>; Keir Fraser <keir@xxxxxxx>
> Subject: [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV
> guest code
> 
> Since such guests' kernel code runs in ring 1, their memory accesses,
> at the paging layer, are supervisor mode ones, and hence subject to
> SMAP/SMEP checks. Such guests cannot be expected to be aware of those
> two features though (and so far we also don't expose the respective
> feature flags), and hence may suffer page faults they cannot deal with.
> 
> While the placement of the re-enabling slightly weakens the intended
> protection, it was selected such that 64-bit paths would remain
> unaffected where possible. At the expense of a further performance hit
> the re-enabling could be put right next to the CLACs.
> 
> Note that this introduces a number of extra TLB flushes - CR4.SMEP
> transitioning from 0 to 1 always causes a flush, and it transitioning
> from 1 to 0 may also do.
> 
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> 
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -67,6 +67,8 @@ boolean_param("smep", opt_smep);
>  static bool_t __initdata opt_smap = 1;
>  boolean_param("smap", opt_smap);
> 
> +unsigned long __read_mostly cr4_smep_smap_mask;
> +
>  /* Boot dom0 in pvh mode */
>  static bool_t __initdata opt_dom0pvh;
>  boolean_param("dom0pvh", opt_dom0pvh);
> @@ -1335,6 +1337,8 @@ void __init noreturn __start_xen(unsigne
>      if ( cpu_has_smap )
>          set_in_cr4(X86_CR4_SMAP);
> 
> +    cr4_smep_smap_mask = mmu_cr4_features & (X86_CR4_SMEP |
> X86_CR4_SMAP);
> +
>      if ( cpu_has_fsgsbase )
>          set_in_cr4(X86_CR4_FSGSBASE);
> 
> @@ -1471,7 +1475,10 @@ void __init noreturn __start_xen(unsigne
>       * copy_from_user().
>       */
>      if ( cpu_has_smap )
> +    {
> +        cr4_smep_smap_mask &= ~X86_CR4_SMAP;

You change 'cr4_smep_smap_mask' here ...

>          write_cr4(read_cr4() & ~X86_CR4_SMAP);
> +    }
> 
>      printk("%sNX (Execute Disable) protection %sactive\n",
>             cpu_has_nx ? XENLOG_INFO : XENLOG_WARNING "Warning: ",
> @@ -1488,7 +1495,10 @@ void __init noreturn __start_xen(unsigne
>          panic("Could not set up DOM0 guest OS");
> 
>      if ( cpu_has_smap )
> +    {
>          write_cr4(read_cr4() | X86_CR4_SMAP);
> +        cr4_smep_smap_mask |= X86_CR4_SMAP;

... and here. I wonder whether this is because Domain 0 can actually start
running between these two places? Otherwise, I don't think the changes in
these two places are needed, right?
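
To make my question concrete, here is a small, purely illustrative C sketch
of how I read the intent: keep a software shadow (cr4_smep_smap_mask) of the
two CR4 bits in sync whenever the real bits are toggled, so anything that
restores CR4 from the shadow while dom0 is being built keeps SMAP off. The
write_cr4()/read_cr4() stand-ins and the flow below are mine, not the real
hypervisor code - please correct me if my understanding is wrong:

    /* Purely illustrative sketch, not the real Xen code. */
    #include <stdio.h>

    #define X86_CR4_SMEP (1u << 20)
    #define X86_CR4_SMAP (1u << 21)

    static unsigned int fake_cr4;            /* stand-in for the real CR4 */
    static unsigned int cr4_smep_smap_mask;  /* software shadow of the two bits */

    static void write_cr4(unsigned int val) { fake_cr4 = val; }
    static unsigned int read_cr4(void)      { return fake_cr4; }

    int main(void)
    {
        /* Boot: both features enabled, shadow initialised from CR4. */
        write_cr4(X86_CR4_SMEP | X86_CR4_SMAP);
        cr4_smep_smap_mask = read_cr4() & (X86_CR4_SMEP | X86_CR4_SMAP);

        /* Around dom0 construction: SMAP is temporarily cleared and the
         * shadow updated, so an interrupt/exception taken in this window
         * that restores CR4 from the shadow keeps SMAP off. */
        cr4_smep_smap_mask &= ~X86_CR4_SMAP;
        write_cr4(read_cr4() & ~X86_CR4_SMAP);

        /* ... construct_dom0() / copy_from_user() would run here ... */

        write_cr4(read_cr4() | X86_CR4_SMAP);
        cr4_smep_smap_mask |= X86_CR4_SMAP;

        printf("CR4=%#x shadow=%#x\n", fake_cr4, cr4_smep_smap_mask);
        return 0;
    }
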
> 
> --- a/xen/arch/x86/x86_64/entry.S
> +++ b/xen/arch/x86/x86_64/entry.S
> @@ -434,6 +434,7 @@ ENTRY(dom_crash_sync_extable)
> 
>  ENTRY(common_interrupt)
>          SAVE_ALL CLAC
> +        SMEP_SMAP_RESTORE
>          movq %rsp,%rdi
>          callq do_IRQ
>          jmp ret_from_intr
> @@ -454,13 +455,64 @@ ENTRY(page_fault)
>  GLOBAL(handle_exception)
>          SAVE_ALL CLAC
>  handle_exception_saved:
> +        GET_CURRENT(%rbx)
>          testb $X86_EFLAGS_IF>>8,UREGS_eflags+1(%rsp)
>          jz    exception_with_ints_disabled
> -        sti
> +
> +.Lsmep_smap_orig:
> +        jmp   0f
> +        .if 0 // GAS bug (affecting at least 2.22 ... 2.26)
> +        .org .Lsmep_smap_orig + (.Lsmep_smap_alt_end - .Lsmep_smap_alt), 0xcc
> +        .else
> +        // worst case: rex + opcode + modrm + 4-byte displacement
> +        .skip (1 + 1 + 1 + 4) - 2, 0xcc
> +        .endif
> +        .pushsection .altinstr_replacement, "ax"
> +.Lsmep_smap_alt:
> +        mov   VCPU_domain(%rbx),%rax
> +.Lsmep_smap_alt_end:
> +        .section .altinstructions, "a"
> +        altinstruction_entry .Lsmep_smap_orig, .Lsmep_smap_alt, \
> +                             X86_FEATURE_SMEP, \
> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt), \
> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt)
> +        altinstruction_entry .Lsmep_smap_orig, .Lsmep_smap_alt, \
> +                             X86_FEATURE_SMAP, \
> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt), \
> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt)
> +        .popsection
> +
> +        testb $3,UREGS_cs(%rsp)
> +        jz    0f
> +        cmpb  $0,DOMAIN_is_32bit_pv(%rax)
> +        je    0f
> +        call  cr4_smep_smap_restore
> +        /*
> +         * An NMI or #MC may occur between clearing CR4.SMEP and CR4.SMAP in

Do you mean "before" when you typed "between" above?

> +         * compat_restore_all_guest and it actually returning to guest
> +         * context, in which case the guest would run with the two features
> +         * enabled. The only bad that can happen from this is a kernel mode
> +         * #PF which the guest doesn't expect. Rather than trying to make the
> +         * NMI/#MC exit path honor the intended CR4 setting, simply check
> +         * whether the wrong CR4 was in use when the #PF occurred, and exit
> +         * back to the guest (which will in turn clear the two CR4 bits) to
> +         * re-execute the instruction. If we get back here, the CR4 bits
> +         * should then be found clear (unless another NMI/#MC occurred at
> +         * exactly the right time), and we'll continue processing the
> +         * exception as normal.
> +         */

As Andrew mentioned in another mail, this scenario is a little tricky. Could
you please give a more detailed description of how the NMI/#MC affects the
execution flow, maybe with some code in the explanation? Thanks a lot!

Thanks,
Feng


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

