[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v5 5/7] xen/x86: disable global pages for domains with XPTI active



On 06/04/18 08:52, Juergen Gross wrote:
> Instead of flushing the TLB from global pages when switching address
> spaces with XPTI being active just disable global pages via %cr4
> completely when a domain subject to XPTI is active. This avoids the
> need for extra TLB flushes as loading %cr3 will remove all TLB
> entries.
>
> In order to avoid states with cr3/cr4 having inconsistent values
> (e.g. global pages being activated while cr3 already specifies a XPTI
> address space) move loading of the new cr4 value to write_ptbase()
> (actually to write_cr3_cr4() called by write_ptbase()).
>
> Signed-off-by: Juergen Gross <jgross@xxxxxxxx>
> Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>

This is contrary to the performance optimisations taken by Linux,
Windows and Apple, which borrowing Xen's 64bit PV optimisation of having
global user pages, because it really is a (small) performance improvement.

Are there performance numbers for this change alone, and/or are later
changes strictly dependent on this functionality?

On the Xen side of things, an argument could probably be made that the
extra cr4 rewrites due to the L4 shadowing might eat away the
performance we would otherwise gain, but I'd be hesitant to blindly
assume that this is the case.

A complicating factor is that Intel have said that the performance gains
from user global pages would be more noticeable on older hardware, due
to differences in the TLB architecture.

> ---
> V4:
> - don't use mmu_cr4_features for setting new cr4 value (Jan Beulich)
> - use simpler scheme for setting X86_CR4_PGE in
>   pv_guest_cr4_to_real_cr4() (Jan Beulich)
>
> V3:
> - move cr4 loading for all domains from *_ctxt_switch_to() to
>   write_cr3_cr4() called by write_ptbase() (Jan Beulich)
> - rebase
> ---
>  xen/arch/x86/domain.c          |  5 -----
>  xen/arch/x86/flushtlb.c        | 13 ++++++++-----
>  xen/arch/x86/mm.c              | 14 +++++++++++---
>  xen/arch/x86/x86_64/entry.S    | 10 ----------
>  xen/common/efi/runtime.c       |  4 ++--
>  xen/include/asm-x86/domain.h   |  3 ++-
>  xen/include/asm-x86/flushtlb.h |  2 +-
>  7 files changed, 24 insertions(+), 27 deletions(-)
>
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 9c229594f4..c2bb70c483 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -1522,17 +1522,12 @@ void paravirt_ctxt_switch_from(struct vcpu *v)
>  void paravirt_ctxt_switch_to(struct vcpu *v)
>  {
>      root_pgentry_t *root_pgt = this_cpu(root_pgt);
> -    unsigned long cr4;
>  
>      if ( root_pgt )
>          root_pgt[root_table_offset(PERDOMAIN_VIRT_START)] =
>              l4e_from_page(v->domain->arch.perdomain_l3_pg,
>                            __PAGE_HYPERVISOR_RW);
>  
> -    cr4 = pv_guest_cr4_to_real_cr4(v);
> -    if ( unlikely(cr4 != read_cr4()) )
> -        write_cr4(cr4);
> -
>      if ( unlikely(v->arch.debugreg[7] & DR7_ACTIVE_MASK) )
>          activate_debugregs(v);
>  
> diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
> index f793b70696..5dcd9a2bf6 100644
> --- a/xen/arch/x86/flushtlb.c
> +++ b/xen/arch/x86/flushtlb.c
> @@ -89,20 +89,23 @@ static void do_tlb_flush(void)
>      post_flush(t);
>  }
>  
> -void write_cr3(unsigned long cr3)
> +void write_cr3_cr4(unsigned long cr3, unsigned long cr4)
>  {
> -    unsigned long flags, cr4;
> +    unsigned long flags;
>      u32 t;
>  
>      /* This non-reentrant function is sometimes called in interrupt context. 
> */
>      local_irq_save(flags);
>  
>      t = pre_flush();
> -    cr4 = read_cr4();
>  
> -    write_cr4(cr4 & ~X86_CR4_PGE);
> +    if ( read_cr4() & X86_CR4_PGE )
> +        write_cr4(cr4 & ~X86_CR4_PGE);
> +
>      asm volatile ( "mov %0, %%cr3" : : "r" (cr3) : "memory" );
> -    write_cr4(cr4);
> +
> +    if ( read_cr4() != cr4 )

read_cr4(), despite being a cached read, isn't free because of the %rsp
manipulation required to access the variable.  I'd keep the locally
cached cr4, and use "if ( cr4 & X86_CR4_PGE )" here.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.