
Re: [Xen-devel] [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV guest code



>>> On 07.03.16 at 17:59, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 04/03/16 11:27, Jan Beulich wrote:
>> --- a/xen/arch/x86/setup.c
>> +++ b/xen/arch/x86/setup.c
>> @@ -67,6 +67,8 @@ boolean_param("smep", opt_smep);
>>  static bool_t __initdata opt_smap = 1;
>>  boolean_param("smap", opt_smap);
>>  
>> +unsigned long __read_mostly cr4_smep_smap_mask;
> 
> Are we liable to gain any other cr4 features which would want to be
> included in this?  Might it be wise to choose a slightly more generic
> name such as cr4_pv32_mask?

Ah, that's a good name suggestion - I didn't like the "smep_smap"
thing from the beginning, but couldn't think of a better variant.
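
FWIW, with that rename the setup.c side would then look along these
lines (a sketch only, not the exact hunk - the cpu_has_*/opt_* handling
is only approximated here):

    /* Sketch: accumulate the CR4 bits to be suppressed for 32-bit PV. */
    if ( cpu_has_smep && opt_smep )
    {
        set_in_cr4(X86_CR4_SMEP);
        cr4_pv32_mask |= X86_CR4_SMEP;
    }
    if ( cpu_has_smap && opt_smap )
    {
        set_in_cr4(X86_CR4_SMAP);
        cr4_pv32_mask |= X86_CR4_SMAP;
    }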

>> @@ -174,10 +174,43 @@ compat_bad_hypercall:
>>  /* %rbx: struct vcpu, interrupts disabled */
>>  ENTRY(compat_restore_all_guest)
>>          ASSERT_INTERRUPTS_DISABLED
>> +.Lcr4_orig:
>> +        ASM_NOP3 /* mov   %cr4, %rax */
>> +        ASM_NOP6 /* and   $..., %rax */
>> +        ASM_NOP3 /* mov   %rax, %cr4 */
>> +        .pushsection .altinstr_replacement, "ax"
>> +.Lcr4_alt:
>> +        mov   %cr4, %rax
>> +        and   $~(X86_CR4_SMEP|X86_CR4_SMAP), %rax
>> +        mov   %rax, %cr4
>> +.Lcr4_alt_end:
>> +        .section .altinstructions, "a"
>> +        altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_SMEP, 12, \
>> +                             (.Lcr4_alt_end - .Lcr4_alt)
>> +        altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_SMAP, 12, \
>> +                             (.Lcr4_alt_end - .Lcr4_alt)
> 
> These 12's look as if they should be (.Lcr4_alt - .Lcr4_orig).

Well, the NOPs that get put there add up to 12 bytes (3 + 6 + 3), which
makes the literal a pretty obvious (shorter and hence more readable)
option. But yes, if you feel strongly that we should use the longer,
symbolic alternative, I can switch these around.
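
Note that (.Lcr4_alt - .Lcr4_orig) itself wouldn't do, I think, since
.Lcr4_alt ends up in a different section; the symbolic variant would
need an extra end label on the original sequence, along the lines of
(.Lcr4_orig_end below is hypothetical, i.e. not in the patch as posted):

    .Lcr4_orig:
            ASM_NOP3 /* mov   %cr4, %rax */
            ASM_NOP6 /* and   $..., %rax */
            ASM_NOP3 /* mov   %rax, %cr4 */
    .Lcr4_orig_end:
            ...
            altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_SMEP, \
                                 (.Lcr4_orig_end - .Lcr4_orig), \
                                 (.Lcr4_alt_end - .Lcr4_alt)
            /* and the same again for X86_FEATURE_SMAP */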

>> +/* This mustn't modify registers other than %rax. */
>> +ENTRY(cr4_smep_smap_restore)
>> +        mov   %cr4, %rax
>> +        test  $X86_CR4_SMEP|X86_CR4_SMAP,%eax
>> +        jnz   0f
>> +        or    cr4_smep_smap_mask(%rip), %rax
>> +        mov   %rax, %cr4
>> +        ret
>> +0:
>> +        and   cr4_smep_smap_mask(%rip), %eax
>> +        cmp   cr4_smep_smap_mask(%rip), %eax
>> +        je    1f
>> +        BUG
> 
> What is the purpose of this bugcheck? It looks like it is catching a
> mismatch of masked options, but I am not completely sure.

This aims at detecting the case where some of the CR4 bits which are
expected to be set actually aren't, excluding the case where all of the
bits of interest are clear (which the path above deals with by simply
restoring them).
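
To make sure we mean the same thing, in rough C (illustration only, not
actual code) the function amounts to:

    unsigned long cr4 = read_cr4();

    if ( !(cr4 & (X86_CR4_SMEP | X86_CR4_SMAP)) )
        /* None of the bits of interest are set - restore them. */
        write_cr4(cr4 | cr4_smep_smap_mask);
    else if ( (cr4 & cr4_smep_smap_mask) != cr4_smep_smap_mask )
        BUG(); /* only some of the expected bits are set */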

> For all other ASM-level BUGs, I put a short comment on the same line,
> to aid people who hit the bug.

Will do. Question: Should this check perhaps become !NDEBUG
dependent?

>> @@ -454,13 +455,64 @@ ENTRY(page_fault)
>>  GLOBAL(handle_exception)
>>          SAVE_ALL CLAC
>>  handle_exception_saved:
>> +        GET_CURRENT(%rbx)
>>          testb $X86_EFLAGS_IF>>8,UREGS_eflags+1(%rsp)
>>          jz    exception_with_ints_disabled
>> -        sti
>> +
>> +.Lsmep_smap_orig:
>> +        jmp   0f
>> +        .if 0 // GAS bug (affecting at least 2.22 ... 2.26)
>> +        .org .Lsmep_smap_orig + (.Lsmep_smap_alt_end - .Lsmep_smap_alt), 0xcc
>> +        .else
>> +        // worst case: rex + opcode + modrm + 4-byte displacement
>> +        .skip (1 + 1 + 1 + 4) - 2, 0xcc
>> +        .endif
> 
> Which bug is this?  How does it manifest?  More generally, what is this
> alternative trying to achieve?

The .org gets a warning (.Lsmep_smap_orig supposedly being
undefined, and hence getting assumed to be zero) followed by
an error (attempt to move the current location backwards). The
fix https://sourceware.org/ml/binutils/2016-03/msg00030.html
is pending approval.
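
As for what the alternative is trying to achieve: the patch site
initially holds the "jmp 0f" plus padding, and on SMEP/SMAP-capable
hardware it gets patched such that it effectively reads

            mov   VCPU_domain(%rbx), %rax    /* from .altinstr_replacement */

i.e. the branch past the whole block gets replaced by the load of the
domain pointer used further down, while on other hardware the jmp stays
and everything up to the "0:" label is skipped.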

>> +        .pushsection .altinstr_replacement, "ax"
>> +.Lsmep_smap_alt:
>> +        mov   VCPU_domain(%rbx),%rax
>> +.Lsmep_smap_alt_end:
>> +        .section .altinstructions, "a"
>> +        altinstruction_entry .Lsmep_smap_orig, .Lsmep_smap_alt, \
>> +                             X86_FEATURE_SMEP, \
>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt), \
>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt)
>> +        altinstruction_entry .Lsmep_smap_orig, .Lsmep_smap_alt, \
>> +                             X86_FEATURE_SMAP, \
>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt), \
>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt)
>> +        .popsection
>> +
>> +        testb $3,UREGS_cs(%rsp)
>> +        jz    0f
>> +        cmpb  $0,DOMAIN_is_32bit_pv(%rax)
> 
> This comparison is wrong on hardware lacking SMEP and SMAP, as the "mov
> VCPU_domain(%rbx),%rax" won't have happened.

That mov indeed won't have happened, but the original instruction
is a branch past all of this code, so the above is correct (and I did
test on older hardware).
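
In loose C terms the intent is (sketch only; helper/field names
approximate, and the #PF re-execution special case omitted):

    /* Sketch only - not the actual code. */
    if ( cpu_has_smep || cpu_has_smap )      /* alternative applied, i.e. */
    {                                        /* the jmp got patched away  */
        const struct domain *d = current->domain; /* the patched-in mov */

        if ( guest_mode(regs) && d->arch.is_32bit_pv )
            cr4_smep_smap_restore();
    }
    /* 0:  sti and continue with normal exception handling */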

>> +        je    0f
>> +        call  cr4_smep_smap_restore
>> +        /*
>> +         * An NMI or #MC may occur between clearing CR4.SMEP and CR4.SMAP in
>> +         * compat_restore_all_guest and it actually returning to guest
>> +         * context, in which case the guest would run with the two features
>> +         * enabled. The only bad that can happen from this is a kernel mode
>> +         * #PF which the guest doesn't expect. Rather than trying to make the
>> +         * NMI/#MC exit path honor the intended CR4 setting, simply check
>> +         * whether the wrong CR4 was in use when the #PF occurred, and exit
>> +         * back to the guest (which will in turn clear the two CR4 bits) to
>> +         * re-execute the instruction. If we get back here, the CR4 bits
>> +         * should then be found clear (unless another NMI/#MC occurred at
>> +         * exactly the right time), and we'll continue processing the
>> +         * exception as normal.
>> +         */
>> +        test  %rax,%rax
>> +        jnz   0f
>> +        mov   $PFEC_page_present,%al
>> +        cmpb  $TRAP_page_fault,UREGS_entry_vector(%rsp)
>> +        jne   0f
>> +        xor   UREGS_error_code(%rsp),%eax
>> +        test  $~(PFEC_write_access|PFEC_insn_fetch),%eax
>> +        jz    compat_test_all_events
>> +0:      sti
> 
> It's code like this which makes me even more certain that we have far too
> much code written in assembly which doesn't need to be.  Maybe not this
> specific sample, but it has taken me 15 minutes and a pad of paper to
> try and work out how this conditional works, and I am still not certain
> it's correct.  In particular, PFEC_prot_key looks like it could fool the
> test into believing a non-SMAP/SMEP fault was an SMAP/SMEP fault.

I'm not sure how you come to think of PFEC_prot_key here: that's a bit
which can be set only together with PFEC_user_mode, yet here we care
about kernel mode faults only.
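
For reference, the error code bits involved (mirroring the architectural
page fault error code layout; names as in our headers, quoted from
memory):

    #define PFEC_page_present  (1U << 0)  /* P    - page was present        */
    #define PFEC_write_access  (1U << 1)  /* W/R  - write access            */
    #define PFEC_user_mode     (1U << 2)  /* U/S  - user-mode access        */
    #define PFEC_reserved_bit  (1U << 3)  /* RSVD - reserved bit set in PTE */
    #define PFEC_insn_fetch    (1U << 4)  /* I/D  - instruction fetch       */
    #define PFEC_prot_key      (1U << 5)  /* PK   - protection key violation */

Note also that the check below tolerates only W/R and I/D on top of P,
so an error code with any other bit set (U/S, RSVD, or PK) won't take
the early exit in the first place.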

> Can you at least provide some C in a comment with the intended
> conditional, to aid clarity?

Sure, if you think that helps beyond the (I think) pretty extensive
comment:

+        test  %rax,%rax
+        jnz   0f
+        /*
+         * The below effectively is
+         * if ( regs->entry_vector == TRAP_page_fault &&
+         *      (regs->error_code & PFEC_page_present) &&
+         *      !(regs->error_code & ~(PFEC_write_access|PFEC_insn_fetch)) )
+         *     goto compat_test_all_events;
+         */
+        mov   $PFEC_page_present,%al
+        cmpb  $TRAP_page_fault,UREGS_entry_vector(%rsp)
+        jne   0f
+        xor   UREGS_error_code(%rsp),%eax
+        test  $~(PFEC_write_access|PFEC_insn_fetch),%eax
+        jz    compat_test_all_events
+0:
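
To spell out the trick: %eax is known to be zero at this point (we just
did the test/jnz), so after the mov it holds exactly PFEC_page_present;
the xor therefore yields the error code with the P bit flipped, and the
masked test then verifies that nothing besides W/R and I/D remains. In
rough C (illustration only):

    uint32_t val = PFEC_page_present;   /* %eax, known to be zero before */
    val ^= regs->error_code;            /* flips the P bit               */
    if ( !(val & ~(PFEC_write_access | PFEC_insn_fetch)) )
        goto compat_test_all_events;    /* P was set, nothing unexpected */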

Jan
