Re: [Xen-devel] [PATCH v2 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently

To: Nadav Amit <namit@xxxxxxxxxx>, Andy Lutomirski <luto@xxxxxxxxxx>, Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
From: Juergen Gross <jgross@xxxxxxxx>
Date: Wed, 3 Jul 2019 16:04:21 +0200
Cc: Sasha Levin <sashal@xxxxxxxxxx>, linux-hyperv@xxxxxxxxxxxxxxx, Stephen Hemminger <sthemmin@xxxxxxxxxxxxx>, kvm@xxxxxxxxxxxxxxx, Peter Zijlstra <peterz@xxxxxxxxxxxxx>, Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>, x86@xxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx, Ingo Molnar <mingo@xxxxxxxxxx>, Borislav Petkov <bp@xxxxxxxxx>, Paolo Bonzini <pbonzini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Thomas Gleixner <tglx@xxxxxxxxxxxxx>, "K. Y. Srinivasan" <kys@xxxxxxxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
Delivery-date: Wed, 03 Jul 2019 14:04:28 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 03.07.19 01:51, Nadav Amit wrote:

To improve TLB shootdown performance, flush the remote and local TLBs
concurrently. Introduce flush_tlb_multi() that does so. Introduce
paravirtual versions of flush_tlb_multi() for KVM, Xen and Hyper-V (Xen
and Hyper-V are only compile-tested).

While the updated smp infrastructure is capable of running a function on
a single local core, it is not optimized for this case. The multiple
function calls and the indirect branch introduce some overhead, and
might make local TLB flushes slower than they were before the recent
changes.

Before calling the SMP infrastructure, check if only a local TLB flush
is needed to restore the lost performance in this common case. This
requires checking mm_cpumask() one more time, but unless this mask is
updated very frequently, this should not impact performance negatively.

Cc: "K. Y. Srinivasan" <kys@xxxxxxxxxxxxx>
Cc: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
Cc: Stephen Hemminger <sthemmin@xxxxxxxxxxxxx>
Cc: Sasha Levin <sashal@xxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Borislav Petkov <bp@xxxxxxxxx>
Cc: x86@xxxxxxxxxx
Cc: Juergen Gross <jgross@xxxxxxxx>
Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: Andy Lutomirski <luto@xxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
Cc: linux-hyperv@xxxxxxxxxxxxxxx
Cc: linux-kernel@xxxxxxxxxxxxxxx
Cc: virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
Cc: kvm@xxxxxxxxxxxxxxx
Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx
Signed-off-by: Nadav Amit <namit@xxxxxxxxxx>
---
  arch/x86/hyperv/mmu.c                 | 13 +++---
  arch/x86/include/asm/paravirt.h       |  6 +--
  arch/x86/include/asm/paravirt_types.h |  4 +-
  arch/x86/include/asm/tlbflush.h       |  9 ++--
  arch/x86/include/asm/trace/hyperv.h   |  2 +-
  arch/x86/kernel/kvm.c                 | 11 +++--
  arch/x86/kernel/paravirt.c            |  2 +-
  arch/x86/mm/tlb.c                     | 65 ++++++++++++++++++++-------
  arch/x86/xen/mmu_pv.c                 | 20 ++++++---
  include/trace/events/xen.h            |  2 +-
  10 files changed, 91 insertions(+), 43 deletions(-)

...

diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index beb44e22afdf..19e481e6e904 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -1355,8 +1355,8 @@ static void xen_flush_tlb_one_user(unsigned long addr)
        preempt_enable();
  }

-static void xen_flush_tlb_others(const struct cpumask *cpus,
-                                const struct flush_tlb_info *info)
+static void xen_flush_tlb_multi(const struct cpumask *cpus,
+                               const struct flush_tlb_info *info)
  {
        struct {
                struct mmuext_op op;
@@ -1366,7 +1366,7 @@ static void xen_flush_tlb_others(const struct cpumask *cpus,
        const size_t mc_entry_size = sizeof(args->op) +
                sizeof(args->mask[0]) * BITS_TO_LONGS(num_possible_cpus());

-       trace_xen_mmu_flush_tlb_others(cpus, info->mm, info->start, info->end);
+       trace_xen_mmu_flush_tlb_multi(cpus, info->mm, info->start, info->end);

        if (cpumask_empty(cpus))
                return;         /* nothing to do */
@@ -1375,9 +1375,17 @@ static void xen_flush_tlb_others(const struct cpumask *cpus,
        args = mcs.args;
        args->op.arg2.vcpumask = to_cpumask(args->mask);

-       /* Remove us, and any offline CPUS. */
+       /* Flush locally if needed and remove us */
+       if (cpumask_test_cpu(smp_processor_id(), to_cpumask(args->mask))) {
+               local_irq_disable();
+               flush_tlb_func_local(info);


I think this isn't the correct function for PV guests.

In fact it should be much easier: just don't clear the current CPU from
the mask; that's all that's needed. The hypervisor is fine with having
the current CPU in the mask and will do the right thing.
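
To illustrate the difference between the two approaches, here is a toy
user-space model, not kernel code. All names in it (vcpu_mask_t,
model_hv_flush_multi(), flush_multi_split(), flush_multi_whole()) are
hypothetical, and it assumes at most 64 vCPUs and a flush hypercall that
simply flushes every vCPU whose bit is set, the caller included. It only
shows that both variants end up flushing the same set of vCPUs, while
the second one needs no separate local flush:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one bit per vCPU, at most 64 vCPUs. */
typedef uint64_t vcpu_mask_t;

/* Bookkeeping that stands in for real TLB state. */
static vcpu_mask_t flushed;      /* vCPUs whose TLB has been flushed */
static unsigned hypercalls;      /* number of modeled flush hypercalls */
static unsigned local_flushes;   /* number of modeled local flushes */

static void model_reset(void)
{
	flushed = 0;
	hypercalls = 0;
	local_flushes = 0;
}

/*
 * Stand-in for a multi-vCPU flush hypercall. Assumption: it flushes
 * every vCPU in the mask, including the calling one.
 */
static void model_hv_flush_multi(vcpu_mask_t mask)
{
	hypercalls++;
	flushed |= mask;
}

/* Stand-in for a flush performed directly on the calling vCPU. */
static void model_local_flush(unsigned self)
{
	assert(self < 64);
	local_flushes++;
	flushed |= (vcpu_mask_t)1 << self;
}

/*
 * Variant A: flush locally if the caller is in the mask, clear the
 * caller's bit, and pass only the remaining vCPUs to the hypervisor.
 */
static void flush_multi_split(vcpu_mask_t mask, unsigned self)
{
	vcpu_mask_t self_bit;

	assert(self < 64);
	self_bit = (vcpu_mask_t)1 << self;

	if (mask & self_bit) {
		model_local_flush(self);
		mask &= ~self_bit;
	}
	if (mask)
		model_hv_flush_multi(mask);
}

/*
 * Variant B: leave the caller's bit in the mask and let the
 * hypervisor handle all vCPUs in one call.
 */
static void flush_multi_whole(vcpu_mask_t mask)
{
	if (mask)
		model_hv_flush_multi(mask);
}
```

In this model, calling either variant with the same mask leaves the same
bits set in flushed; variant B just gets there without touching the mask
or doing a guest-side local flush.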


Juergen
