
Re: [Xen-devel] [PATCH v2 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
From: Nadav Amit <namit@xxxxxxxxxx>
Date: Wed, 3 Jul 2019 18:09:30 +0000
Cc: Juergen Gross <jgross@xxxxxxxx>, Sasha Levin <sashal@xxxxxxxxxx>, "linux-hyperv@xxxxxxxxxxxxxxx" <linux-hyperv@xxxxxxxxxxxxxxx>, Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>, Stephen Hemminger <sthemmin@xxxxxxxxxxxxx>, kvm list <kvm@xxxxxxxxxxxxxxx>, Peter Zijlstra <peterz@xxxxxxxxxxxxx>, Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>, the arch/x86 maintainers <x86@xxxxxxxxxx>, "linux-kernel@xxxxxxxxxxxxxxx" <linux-kernel@xxxxxxxxxxxxxxx>, "virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx" <virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx>, Ingo Molnar <mingo@xxxxxxxxxx>, Borislav Petkov <bp@xxxxxxxxx>, Andy Lutomirski <luto@xxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Paolo Bonzini <pbonzini@xxxxxxxxxx>, Thomas Gleixner <tglx@xxxxxxxxxxxxx>, "K. Y. Srinivasan" <kys@xxxxxxxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
Thread-topic: [Xen-devel] [PATCH v2 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently

> On Jul 3, 2019, at 10:43 AM, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
> 
> On 03/07/2019 18:02, Nadav Amit wrote:
>>> On Jul 3, 2019, at 7:04 AM, Juergen Gross <jgross@xxxxxxxx> wrote:
>>> 
>>> On 03.07.19 01:51, Nadav Amit wrote:
>>>> To improve TLB shootdown performance, flush the remote and local TLBs
>>>> concurrently. Introduce flush_tlb_multi() that does so. Introduce
>>>> paravirtual versions of flush_tlb_multi() for KVM, Xen and hyper-v (Xen
>>>> and hyper-v are only compile-tested).
>>>> While the updated smp infrastructure is capable of running a function on
>>>> a single local core, it is not optimized for this case. The multiple
>>>> function calls and the indirect branch introduce some overhead, and
>>>> might make local TLB flushes slower than they were before the recent
>>>> changes.
>>>> Before calling the SMP infrastructure, check if only a local TLB flush
>>>> is needed to restore the lost performance in this common case. This
>>>> requires checking mm_cpumask() one more time, but unless this mask is
>>>> updated very frequently, this should not impact performance negatively.
>>>> Cc: "K. Y. Srinivasan" <kys@xxxxxxxxxxxxx>
>>>> Cc: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
>>>> Cc: Stephen Hemminger <sthemmin@xxxxxxxxxxxxx>
>>>> Cc: Sasha Levin <sashal@xxxxxxxxxx>
>>>> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>>>> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
>>>> Cc: Borislav Petkov <bp@xxxxxxxxx>
>>>> Cc: x86@xxxxxxxxxx
>>>> Cc: Juergen Gross <jgross@xxxxxxxx>
>>>> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
>>>> Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
>>>> Cc: Andy Lutomirski <luto@xxxxxxxxxx>
>>>> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>>>> Cc: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
>>>> Cc: linux-hyperv@xxxxxxxxxxxxxxx
>>>> Cc: linux-kernel@xxxxxxxxxxxxxxx
>>>> Cc: virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
>>>> Cc: kvm@xxxxxxxxxxxxxxx
>>>> Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx
>>>> Signed-off-by: Nadav Amit <namit@xxxxxxxxxx>
>>>> ---
>>>> arch/x86/hyperv/mmu.c                 | 13 +++---
>>>> arch/x86/include/asm/paravirt.h       |  6 +--
>>>> arch/x86/include/asm/paravirt_types.h |  4 +-
>>>> arch/x86/include/asm/tlbflush.h       |  9 ++--
>>>> arch/x86/include/asm/trace/hyperv.h   |  2 +-
>>>> arch/x86/kernel/kvm.c                 | 11 +++--
>>>> arch/x86/kernel/paravirt.c            |  2 +-
>>>> arch/x86/mm/tlb.c                     | 65 ++++++++++++++++++++-------
>>>> arch/x86/xen/mmu_pv.c                 | 20 ++++++---
>>>> include/trace/events/xen.h            |  2 +-
>>>> 10 files changed, 91 insertions(+), 43 deletions(-)
>>> ...
>>> 
>>>> diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
>>>> index beb44e22afdf..19e481e6e904 100644
>>>> --- a/arch/x86/xen/mmu_pv.c
>>>> +++ b/arch/x86/xen/mmu_pv.c
>>>> @@ -1355,8 +1355,8 @@ static void xen_flush_tlb_one_user(unsigned long addr)
>>>>    preempt_enable();
>>>> }
>>>> -static void xen_flush_tlb_others(const struct cpumask *cpus,
>>>> -                           const struct flush_tlb_info *info)
>>>> +static void xen_flush_tlb_multi(const struct cpumask *cpus,
>>>> +                          const struct flush_tlb_info *info)
>>>> {
>>>>    struct {
>>>>            struct mmuext_op op;
>>>> @@ -1366,7 +1366,7 @@ static void xen_flush_tlb_others(const struct cpumask *cpus,
>>>>    const size_t mc_entry_size = sizeof(args->op) +
>>>>            sizeof(args->mask[0]) * BITS_TO_LONGS(num_possible_cpus());
>>>> -  trace_xen_mmu_flush_tlb_others(cpus, info->mm, info->start, info->end);
>>>> +  trace_xen_mmu_flush_tlb_multi(cpus, info->mm, info->start, info->end);
>>>>    if (cpumask_empty(cpus))
>>>>            return;         /* nothing to do */
>>>> @@ -1375,9 +1375,17 @@ static void xen_flush_tlb_others(const struct cpumask *cpus,
>>>>    args = mcs.args;
>>>>    args->op.arg2.vcpumask = to_cpumask(args->mask);
>>>> -  /* Remove us, and any offline CPUS. */
>>>> +  /* Flush locally if needed and remove us */
>>>> +  if (cpumask_test_cpu(smp_processor_id(), to_cpumask(args->mask))) {
>>>> +          local_irq_disable();
>>>> +          flush_tlb_func_local(info);
>>> I think this isn't the correct function for PV guests.
>>> 
>>> In fact it should be much easier: just don't clear the own cpu from the
>>> mask, that's all that's needed. The hypervisor is just fine having the
>>> current cpu in the mask and it will do the right thing.
>> Thanks. I will do so in v3. I don’t think Hyper-V people would want to do
>> the same, unfortunately, since it would induce VM-exit on TLB flushes.
> 
> Why do you believe the vmexit matters?  You're taking one anyway for
> the IPI.
> 
> Intel only have virtualised self-IPI, and while AMD do have working
> non-self IPIs, you still take a vmexit anyway if any destination vcpu
> isn't currently running in non-root mode (IIRC).
> 
> At that point, you might as well have the hypervisor do all the hard
> work via a multi-cpu shootdown/flush hypercall, rather than trying to
> arrange it locally.

I forgot that xen_flush_tlb_multi() should actually only be called when
there are some remote CPUs (as I optimized the case in which there is only a
single local CPU that needs to be flushed), so you are right.
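
To make the dispatch concrete, here is a minimal user-space sketch of the
decision described above. It is not the kernel code: toy_cpumask_t,
toy_flush_tlb() and the enum values are hypothetical names, and a 64-bit
integer stands in for struct cpumask. It only models which path is taken
and which mask the multi-CPU backend would receive, assuming the local CPU
is left in the mask as suggested for the PV case.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for a cpumask: bit N set means CPU N needs a flush. */
typedef uint64_t toy_cpumask_t;

/* Which path the (hypothetical) dispatcher picked. */
enum toy_flush_path { TOY_FLUSH_NONE, TOY_FLUSH_LOCAL_ONLY, TOY_FLUSH_MULTI };

struct toy_flush_result {
	enum toy_flush_path path;
	toy_cpumask_t mask_passed;	/* mask handed to the multi backend, else 0 */
};

/*
 * Model of the dispatch: a flush that targets only the local CPU never
 * reaches the multi-CPU backend; anything else is passed on with the
 * local CPU's bit left set, so the backend (for a PV guest, the
 * hypervisor) flushes the local CPU as well.
 */
static struct toy_flush_result toy_flush_tlb(toy_cpumask_t targets,
					     unsigned int self)
{
	struct toy_flush_result res = { TOY_FLUSH_NONE, 0 };
	toy_cpumask_t self_bit;

	assert(self < 64);
	self_bit = (toy_cpumask_t)1 << self;

	if (targets == 0)
		return res;			/* nothing to do */

	if (targets == self_bit) {
		res.path = TOY_FLUSH_LOCAL_ONLY;	/* local fast path */
		return res;
	}

	res.path = TOY_FLUSH_MULTI;
	res.mask_passed = targets;		/* local bit is not cleared */
	return res;
}
```

With this split, the multi-CPU path is reached only when at least one remote
CPU is in the mask, which is the case where a hypercall or IPI is needed
anyway.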
