
Re: [Xen-devel] [PATCH v2 1/2] x86: Meltdown band-aid against malicious 64-bit PV guests



>>> On 15.01.18 at 19:23, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 15/01/18 11:06, Jan Beulich wrote:
>> --- a/xen/arch/x86/x86_64/compat/entry.S
>> +++ b/xen/arch/x86/x86_64/compat/entry.S
>> @@ -199,6 +199,17 @@ ENTRY(cstar_enter)
>>          pushq $0
>>          movl  $TRAP_syscall, 4(%rsp)
>>          SAVE_ALL
>> +
>> +        GET_STACK_END(bx)
>> +        mov   STACK_CPUINFO_FIELD(xen_cr3)(%rbx), %rcx
>> +        neg   %rcx
>> +UNLIKELY_START(nz, cstar_cr3)
>> +        mov   %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
>> +        neg   %rcx
>> +        write_cr3 rcx, rdi, rsi
>> +        movq  $0, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
>> +UNLIKELY_END(cstar_cr3)
> 
> These UNLIKELY()s aren't correct.  It will depend on hardware and
> command line setting as to whether we expect to update cr3.

Why are they not correct? What 64-bit kernels do you know that
use the CSTAR entry method in PV mode? AFAIK this existed only
in our forward port, and only for a range of kernel versions.
The INT80 path is perhaps indeed more questionable in this regard.

> Furthermore, they will complicate splitting the early entry code away
> from the main .text section for a full isolation implementation.
> 
> For now, I'd drop them and have a simple jz .Lskip.

I will replace them (on the entry paths; I think the one on the exit-
to-Xen path is valid) to please you and in the interest of forward
progress; maybe this will even slightly help backporting to really old
Xen versions, where the UNLIKELY* constructs don't exist yet.
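
IOW the cstar_enter addition would then shrink to something along
these lines (sketch only, not even compile tested, label name
picked arbitrarily):

        GET_STACK_END(bx)
        mov   STACK_CPUINFO_FIELD(xen_cr3)(%rbx), %rcx
        neg   %rcx
        jz    .Lcstar_cr3_okay
        /* Negated value: switch to the Xen page tables in progress. */
        mov   %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
        neg   %rcx
        write_cr3 rcx, rdi, rsi
        /* Zero: now running on the Xen page tables. */
        movq  $0, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
.Lcstar_cr3_okay: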

> Also, can we collect these together into macros, rather than
> opencoding?  We seem to have 3 distinct variations.

I had considered that (following the model you use in the SP2
series), but decided against it, not least because of the
dependent but placement-wise separated code additions needed to
restore the original values. Plus, again, this might be a
hindrance to backporting to really old Xen (which typically will
also be built with really old tool chains) - as you certainly
recall, old gas had quite a few issues with macro handling.
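
Just to make explicit what I mean: the entry side switch itself
could be wrapped easily enough, e.g. (hypothetical sketch only,
macro name invented here, taking the GET_STACK_END() register as
argument and clobbering %rcx, %rdi, and %rsi):

        .macro SWITCH_TO_XEN_CR3 stkreg:req
        mov   STACK_CPUINFO_FIELD(xen_cr3)(%\stkreg), %rcx
        neg   %rcx
        jz    .Lcr3_okay\@
        mov   %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%\stkreg)
        neg   %rcx
        write_cr3 rcx, rdi, rsi
        movq  $0, STACK_CPUINFO_FIELD(xen_cr3)(%\stkreg)
.Lcr3_okay\@:
        .endm

but that still leaves out the differently shaped interrupt /
exception variants and their separated restore parts, which is
what I'm referring to above.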

>> --- a/xen/arch/x86/x86_64/entry.S
>> +++ b/xen/arch/x86/x86_64/entry.S
>> @@ -37,6 +37,32 @@ ENTRY(switch_to_kernel)
>>  /* %rbx: struct vcpu, interrupts disabled */
>>  restore_all_guest:
>>          ASSERT_INTERRUPTS_DISABLED
>> +
>> +        /* Copy guest mappings and switch to per-CPU root page table. */
>> +        mov   %cr3, %r9
>> +        GET_STACK_END(dx)
>> +        mov   STACK_CPUINFO_FIELD(pv_cr3)(%rdx), %rdi
>> +        movabs $PADDR_MASK & PAGE_MASK, %rsi
>> +        movabs $DIRECTMAP_VIRT_START, %rcx
>> +        mov   %rdi, %rax
>> +        and   %rsi, %rdi
>> +        and   %r9, %rsi
>> +        add   %rcx, %rdi
>> +        add   %rcx, %rsi
>> +        mov   $ROOT_PAGETABLE_FIRST_XEN_SLOT, %ecx
>> +        mov   root_table_offset(SH_LINEAR_PT_VIRT_START)*8(%rsi), %r8
>> +        mov   %r8, root_table_offset(SH_LINEAR_PT_VIRT_START)*8(%rdi)
>> +        rep movsq
>> +        mov   $ROOT_PAGETABLE_ENTRIES - \
>> +               ROOT_PAGETABLE_LAST_XEN_SLOT - 1, %ecx
>> +        sub   $(ROOT_PAGETABLE_FIRST_XEN_SLOT - \
>> +                ROOT_PAGETABLE_LAST_XEN_SLOT - 1) * 8, %rsi
>> +        sub   $(ROOT_PAGETABLE_FIRST_XEN_SLOT - \
>> +                ROOT_PAGETABLE_LAST_XEN_SLOT - 1) * 8, %rdi
>> +        rep movsq
>> +        mov   %r9, STACK_CPUINFO_FIELD(xen_cr3)(%rdx)
>> +        write_cr3 rax, rdi, rsi
>> +
> 
> Can we possibly move this up into C?  For this simplistic algorithm it
> is ok in ASM, but if we want to do any optimisations to avoid the 4k
> memcpy (generation count hidden somewhere in page_info?), ASM is going
> quickly become unwieldy.

I'd prefer to move it into C when it really becomes necessary. Also
you don't properly qualify "this" - for example, I'd rather not move
the write_cr3 invocation into C, yet the placement of your comment
suggests doing so.
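
Just to be unambiguous about what would be a candidate for moving:
the copying amounts to roughly the following in C (illustrative
sketch only, function name made up here; the write_cr3 itself I'd
want to keep in assembly, as said):

    static void copy_perdomain_root(root_pgentry_t *dst,
                                    const root_pgentry_t *src)
    {
        unsigned int i;

        /* Guest-controlled slots below the Xen ones. */
        for ( i = 0; i < ROOT_PAGETABLE_FIRST_XEN_SLOT; ++i )
            dst[i] = src[i];

        /* The shadow linear slot also comes from the guest's table. */
        dst[root_table_offset(SH_LINEAR_PT_VIRT_START)] =
            src[root_table_offset(SH_LINEAR_PT_VIRT_START)];

        /* Guest-controlled slots above the Xen ones. */
        for ( i = ROOT_PAGETABLE_LAST_XEN_SLOT + 1;
              i < ROOT_PAGETABLE_ENTRIES; ++i )
            dst[i] = src[i];
    }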

> Another optimisation I found made a massive difference for the KAISER
> series was to have an MRU cache of 4 pagetables, so in-guest syscalls
> don't result in any copying as we pass in and out of Xen.

As said elsewhere - optimization can come later. Plus - is avoiding the
copying at _any_ time actually correct, considering possible racing
L4 entry updates on another vCPU?

>> @@ -71,6 +97,18 @@ iret_exit_to_guest:
>>          ALIGN
>>  /* No special register assumptions. */
>>  restore_all_xen:
>> +        /*
>> +         * Check whether we need to switch to the per-CPU page tables, in
>> +         * case we return to late PV exit code (from an NMI or #MC).
>> +         */
>> +        GET_STACK_END(ax)
>> +        mov   STACK_CPUINFO_FIELD(xen_cr3)(%rax), %rdx
>> +        mov   STACK_CPUINFO_FIELD(pv_cr3)(%rax), %rax
>> +        test  %rdx, %rdx
>> +UNLIKELY_START(g, exit_cr3)
> 
> cmp or ne ?

"ne" (or really "nz" when used with "test") is outright wrong - we
want to skip the restore when the value is zero _or_ negative.
What's wrong with "jg" and "test" in combination? There simply is
no "jnsz" (other than e.g. "jnbe"). "cmp" against zero could be
used here, but why would I use the larger instruction when "test"
does?
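
IOW in C terms what this path wants is simply (sketch, with a
placeholder name for the actual CR3 write)

    if ( (long)xen_cr3 > 0 )     /* strictly positive only */
        restore_cr3(pv_cr3);     /* zero and negative both skip */

and "test" followed by "jg" is the cheapest way to express that.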

>> @@ -585,6 +692,17 @@ ENTRY(double_fault)
>>          movl  $TRAP_double_fault,4(%rsp)
>>          /* Set AC to reduce chance of further SMAP faults */
>>          SAVE_ALL STAC
>> +
>> +        GET_STACK_END(bx)
>> +        mov   STACK_CPUINFO_FIELD(xen_cr3)(%rbx), %rbx
>> +        test  %rbx, %rbx
>> +        jz    .Ldblf_cr3_okay
>> +        jns   .Ldblf_cr3_load
>> +        neg   %rbx
>> +.Ldblf_cr3_load:
>> +        write_cr3 rbx, rdi, rsi
>> +.Ldblf_cr3_okay:
>> +
> 
> It is moderately common for there to be cascade faults in #DF.  This
> would be better if it were the general IST switch.

Based on the issues I had with #DF occurring while debugging this,
I've decided to keep the code here as simple as possible without
being incorrect: There's no point in looking at the incoming CR3.
There's no point in trying to avoid nested faults (including a
subsequent #DF) while restoring CR3. There's also no point in
retaining the value for restoring later, as we never return. In
fact, as mentioned elsewhere, we should imo indeed consider
unilaterally switching to idle_pg_table[] here.

Jan
