[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 6/6] xen/x86: use PCID feature for XPTI



>>> On 02.03.18 at 09:14, <jgross@xxxxxxxx> wrote:
> Avoid flushing the complete TLB when switching %cr3 for mitigation of
> Meltdown by using the PCID feature if available.
> 
> We are using 4 PCID values for a 64 bit pv domain subject to XPTI:
> 
> - hypervisor active and guest in kernel mode
> - guest active and in kernel mode
> - hypervisor active and guest in user mode
> - guest active and in user mode
> 
> The 2 hypervisor cases could possibly be merged, but for security
> reasons this is left for another patch.

I don't understand this part - one less PCID can't mean the code is
going to become more complicated, can it? And without PCID there's
no extra boundary there anyway. Furthermore with two different
PCIDs INVLPG may not do anymore what we expect it to do. This
(for the guest) similarly would affect e.g. MMUEXT_INVLPG_*.

> @@ -417,6 +418,8 @@ static bool prepare_set(void)
>       cr4 = read_cr4();
>       if (cr4 & X86_CR4_PGE)
>               write_cr4(cr4 & ~X86_CR4_PGE);
> +     else if ( cpu_has_invpcid )
> +             invpcid_flush_all();
>       else
>               asm volatile( "mov %0, %%cr3" : : "r" (read_cr3()) : "memory" );
>  
> @@ -440,6 +443,8 @@ static void post_set(bool pge)
>       /*  Reenable CR4.PGE (also flushes the TLB) */
>       if (pge)
>               write_cr4(read_cr4() | X86_CR4_PGE);
> +     else if ( cpu_has_invpcid )
> +             invpcid_flush_all();
>       else
>               asm volatile( "mov %0, %%cr3" : : "r" (read_cr3()) : "memory" );

I'm confused by the ordering of these - I take it you imply that with
PCID enabled we'll have CR4.PGE clear at all times (but see also
below on that aspect). But is it worth making the code above
depend on it? It would look more natural and less prone to become
a hidden trap if you checked cpu_has_invpcid first.

And what about the PCID-but-no-INVPCID case? A simple CR3
write doesn't flush "foreign" PCIDs aiui. Oh, I've just found further
down that you enable PCID only when both are available. The
description alone doesn't make that clear.

> --- a/xen/arch/x86/flushtlb.c
> +++ b/xen/arch/x86/flushtlb.c
> @@ -75,39 +75,46 @@ static void post_flush(u32 t)
>  static void do_flush_tlb(unsigned long cr3)
>  {
>      unsigned long cr4;
> +    u32 t;
> +
> +    t = pre_flush();
>  
>      cr4 = read_cr4();
> -    if ( cr4 & X86_CR4_PGE )
> +
> +    if ( cpu_has_invpcid )
>      {
> -        write_cr4(cr4 & ~X86_CR4_PGE);
>          if ( cr3 )
>              asm volatile ( "mov %0, %%cr3" : : "r" (cr3) : "memory" );
> -        else
> -            barrier();
> -        write_cr4(cr4);
> +        if ( !cr3 || (cr3 & X86_CR3_NOFLUSH) || (cr4 & X86_CR4_PGE) )
> +            invpcid_flush_all();

Andrew has already explained how this may not be correct in all cases.

As to the function's name - by the time we get here I'm no longer
sure what the best name for it would be.

> @@ -128,30 +135,33 @@ unsigned int flush_area_local(const void *va, unsigned 
> int flags)
>      {
>          if ( order == 0 )
>          {
> -            /*
> -             * We don't INVLPG multi-page regions because the 2M/4M/1G
> -             * region may not have been mapped with a superpage. Also there
> -             * are various errata surrounding INVLPG usage on superpages, and
> -             * a full flush is in any case not *that* expensive.
> -             */
> -            asm volatile ( "invlpg %0"
> -                           : : "m" (*(const char *)(va)) : "memory" );
> -        }
> -        else
> -        {
> -            u32 t = pre_flush();
> +            if ( read_cr3() & X86_CR3_PCIDMASK )

Wouldn't you better look at CR4.PCIDE? And then also flush PCID 0
below, just to be on the safe side?

> +            {
> +                unsigned long addr = (unsigned long)va;
>  
> -            if ( !cpu_has_invpcid )
> -                do_flush_tlb(0);
> +                /*
> +                 * Flush the addresses for all potential address spaces.
> +                 */

Single line comment.

> +                invpcid_flush_one(PCID_PV_PRIV, addr);
> +                invpcid_flush_one(PCID_PV_USER, addr);
> +                invpcid_flush_one(PCID_PV_PRIV | PCID_PV_XEN, addr);
> +                invpcid_flush_one(PCID_PV_USER | PCID_PV_XEN, addr);

Other than INVLPG this doesn't invalidate global mappings, yet I
can't identify any code to globally disable (or never enable)
CR4.PGE.

Out of curiosity, how do 4 INVPCIDs compare to a single INVLPG
performance wise?

> +            }
>              else
> +            {
>                  /*
> -                 * Using invpcid to flush all mappings works
> -                 * regardless of whether PCID is enabled or not.
> -                 * It is faster than read-modify-write CR4.
> +                 * We don't INVLPG multi-page regions because the 2M/4M/1G
> +                 * region may not have been mapped with a superpage. Also 
> there
> +                 * are various errata surrounding INVLPG usage on superpages,
> +                 * and a full flush is in any case not *that* expensive.
>                   */
> -                invpcid_flush_all();
> -
> -            post_flush(t);
> +                asm volatile ( "invlpg %0"
> +                               : : "m" (*(const char *)(va)) : "memory" );
> +            }
> +        }
> +        else
> +        {
> +            do_flush_tlb(0);
>          }

Pointless braces.

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -506,6 +506,8 @@ void free_shared_domheap_page(struct page_info *page)
>  void make_cr3(struct vcpu *v, mfn_t mfn)
>  {
>      v->arch.cr3 = mfn_x(mfn) << PAGE_SHIFT;
> +    if ( is_pv_vcpu(v) && v->domain->arch.pv_domain.pcid )
> +        v->arch.cr3 |= X86_CR3_NOFLUSH | get_pv_pcid(v, 1);

Is it a good idea to suppress the flush this way for every reader
of v->arch.cr3? For example, what about the use in
dbg_pv_va2mfn()? I think it should be the consumers of the field
to decide whether to OR in that flag (which isn't part of the
register value anyway).

> @@ -514,7 +516,15 @@ void write_ptbase(struct vcpu *v)
>      {
>          get_cpu_info()->root_pgt_changed = true;
>          get_cpu_info()->pv_cr3 = __pa(this_cpu(root_pgt));
> -        asm volatile ( "mov %0, %%cr3" : : "r" (v->arch.cr3) : "memory" );
> +        if ( v->domain->arch.pv_domain.pcid )
> +        {
> +            get_cpu_info()->pv_cr3 |= X86_CR3_NOFLUSH | get_pv_pcid(v, 0);
> +            write_cr3(v->arch.cr3);
> +        }
> +        else
> +        {
> +            asm volatile ( "mov %0, %%cr3" : : "r" (v->arch.cr3) : "memory" 
> );
> +        }

Pointless braces again.

> --- a/xen/arch/x86/pv/domain.c
> +++ b/xen/arch/x86/pv/domain.c
> @@ -96,8 +96,12 @@ void xpti_domain_init(struct domain *d)
>      }
>  
>      if ( d->arch.pv_domain.xpti )
> +    {
> +        d->arch.pv_domain.pcid = cpu_has_pcid && cpu_has_invpcid;

Perhaps again better to key off of CR4.PCIDE? For example I imagine
we'll want to have a command line option to suppress the use of PCID
in case of problems.

> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -260,8 +260,20 @@ struct pv_domain
>  
>      /* XPTI active? */
>      bool xpti;
> +
> +    /* Use PCID for the different address spaces? */
> +    bool pcid;
>  };
>  
> +/* PCID values for the address spaces: */
> +#define PCID_PV_PRIV      0x0001
> +#define PCID_PV_USER      0x0002
> +#define PCID_PV_XEN       0x0004    /* To be ORed to above values. */
> +
> +#define get_pv_pcid(v, xen)                                              \
> +    (((xen) ? PCID_PV_XEN : 0) |                                         \
> +     (((v)->arch.flags & TF_kernel_mode) ? PCID_PV_PRIV : PCID_PV_USER))

Inline function? Plus - the "xen" parameter is really of boolean kind,
which means the callers want to pass true/false instead of 1/0.

> @@ -615,18 +627,18 @@ void vcpu_show_registers(const struct vcpu *);
>  unsigned long pv_guest_cr4_fixup(const struct vcpu *, unsigned long 
> guest_cr4);
>  
>  /* Convert between guest-visible and real CR4 values. */
> -#define pv_guest_cr4_to_real_cr4(v)                         \
> -    (((v)->arch.pv_vcpu.ctrlreg[4]                          \
> -      | (mmu_cr4_features                                   \
> -         & (X86_CR4_PGE | X86_CR4_PSE | X86_CR4_SMEP |      \
> -            X86_CR4_SMAP | X86_CR4_OSXSAVE |                \
> -            X86_CR4_FSGSBASE))                              \
> -      | ((v)->domain->arch.vtsc ? X86_CR4_TSD : 0))         \
> -     & ~(X86_CR4_DE |                                       \
> +#define pv_guest_cr4_to_real_cr4(v)                             \
> +    (((v)->arch.pv_vcpu.ctrlreg[4]                              \
> +      | (mmu_cr4_features                                       \
> +         & (X86_CR4_PGE | X86_CR4_PSE | X86_CR4_SMEP |          \
> +            X86_CR4_SMAP | X86_CR4_OSXSAVE |                    \
> +            X86_CR4_FSGSBASE | X86_CR4_PCIDE))                  \
> +      | ((v)->domain->arch.vtsc ? X86_CR4_TSD : 0))             \
> +     & ~(X86_CR4_DE |                                           \
>           ((v)->domain->arch.pv_domain.xpti ? X86_CR4_PGE : 0)))

I don't see why you need to reposition the backslashes here, without
which this would by quite a bit easier to review.

> --- a/xen/include/asm-x86/x86-defns.h
> +++ b/xen/include/asm-x86/x86-defns.h
> @@ -46,6 +46,7 @@
>   * Intel CPU flags in CR3
>   */
>  #define X86_CR3_NOFLUSH (_AC(1, ULL) << 63)
> +#define X86_CR3_PCIDMASK _AC(0x0000000000000fff, ULL) /* Mask for PCID */

I'm not intending to count whether the number of zeros here is "right";
any number of them is okay actually. Why don't you omit them?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.