|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v5 3/7] x86/hap: improve hypervisor assisted guest TLB flush
On Fri, Feb 28, 2020 at 05:44:57PM +0100, Jan Beulich wrote:
> On 28.02.2020 17:31, Roger Pau Monné wrote:
> > On Fri, Feb 28, 2020 at 02:58:42PM +0100, Jan Beulich wrote:
> >> On 19.02.2020 18:43, Roger Pau Monne wrote:
> >>> --- a/xen/arch/x86/mm/hap/hap.c
> >>> +++ b/xen/arch/x86/mm/hap/hap.c
> >>> @@ -669,32 +669,28 @@ static void hap_update_cr3(struct vcpu *v, int
> >>> do_locking, bool noflush)
> >>> hvm_update_guest_cr3(v, noflush);
> >>> }
> >>>
> >>> +/*
> >>> + * NB: doesn't actually perform any flush, used just to clear the CPU
> >>> from the
> >>> + * mask and hence signal that the guest TLB flush has been done.
> >>> + */
> >>
> >> "has been done" isn't correct, as the flush may happen only later
> >> on (upon next VM entry). I think wording here needs to be as
> >> precise as possible, however the comment may turn out unnecessary
> >> altogether:
> >
> > What about:
> >
> > /*
> > * NB: Dummy function exclusively used as a way to trigger a vmexit,
> > * and thus force an ASID/VPID update on vmentry (that will flush the
> > * cache).
> > */
>
> Once it's empty - yes, looks okay (with s/cache/TLB/).
Ack.
> >>> @@ -705,20 +701,22 @@ bool hap_flush_tlb(bool (*flush_vcpu)(void *ctxt,
> >>> struct vcpu *v),
> >>> if ( !flush_vcpu(ctxt, v) )
> >>> continue;
> >>>
> >>> - paging_update_cr3(v, false);
> >>> + hvm_asid_flush_vcpu(v);
> >>>
> >>> cpu = read_atomic(&v->dirty_cpu);
> >>> - if ( is_vcpu_dirty_cpu(cpu) )
> >>> + if ( cpu != this_cpu && is_vcpu_dirty_cpu(cpu) )
> >>> __cpumask_set_cpu(cpu, mask);
> >>> }
> >>>
> >>> - /* Flush TLBs on all CPUs with dirty vcpu state. */
> >>> - flush_tlb_mask(mask);
> >>> -
> >>> - /* Done. */
> >>> - for_each_vcpu ( d, v )
> >>> - if ( v != current && flush_vcpu(ctxt, v) )
> >>> - vcpu_unpause(v);
> >>> + /*
> >>> + * Trigger a vmexit on all pCPUs with dirty vCPU state in order to
> >>> force an
> >>> + * ASID/VPID change and hence accomplish a guest TLB flush. Note
> >>> that vCPUs
> >>> + * not currently running will already be flushed when scheduled
> >>> because of
> >>> + * the ASID tickle done in the loop above.
> >>> + */
> >>> + on_selected_cpus(mask, handle_flush, mask, 0);
> >>> + while ( !cpumask_empty(mask) )
> >>> + cpu_relax();
> >>
> >> Why do you pass 0 as last argument? If you passed 1, you wouldn't
> >> need the while() here, handle_flush() could be empty, and you'd
> >> save a perhaps large amount of CPUs to all try to modify two
> >> cache lines (the one used by on_selected_cpus() itself and the
> >> one here) instead of just one. Yes, latency from the last CPU
> >> clearing its bit to you being able to exit from here may be a
> >> little larger this way, but overall latency may shrink with the
> >> cut by half amount of atomic ops issued to the bus.
> >
> > In fact I think passing 0 as the last argument is fine, and
> > handle_flush can be empty in that case anyway. We don't really care
> > whether on_selected_cpus returns before all CPUs have executed the
> > dummy function, as long as all of them have taken a vmexit. Using 0
> > already guarantees that AFAICT.
>
> Isn't it that the caller ants to be guaranteed that the flush
> has actually occurred (or at least that no further insns can
> be executed prior to the flush happening, i.e. at least the VM
> exit having occurred)?
Yes, but that would happen with 0 as the last argument already, see
the code in smp_call_function_interrupt:
if ( call_data.wait )
{
(*func)(info);
smp_mb();
cpumask_clear_cpu(cpu, &call_data.selected);
}
else
{
smp_mb();
cpumask_clear_cpu(cpu, &call_data.selected);
(*func)(info);
}
The only difference is whether on_selected_cpus can return before all
the functions have been executed on all CPUs, or whether all CPUs have
taken a vmexit. The later is fine for our use-case.
> >> (Forcing an empty function to be called may seem odd, but sending
> >> just some IPI [like the event check one] wouldn't be enough, as
> >> you want to be sure the other side has actually received the
> >> request.)
> >
> > Exactly, that's the reason for the empty function.
>
> But the function isn't empty.
It will be now, since on_selected_cpus already does the needed
synchronization as you pointed out.
Thanks, Roger.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |