Re: [Xen-devel] [PATCH v5 7/7] x86/tlb: use Xen L0 assisted TLB flush when available
On Fri, Feb 28, 2020 at 06:00:44PM +0100, Jan Beulich wrote:
> On 19.02.2020 18:43, Roger Pau Monne wrote:
> > Use Xen's L0 HVMOP_flush_tlbs hypercall in order to perform flushes.
> > This greatly increases the performance of TLB flushes when running
> > with a high number of vCPUs as a Xen guest, and is especially
> > important when running in shim mode.
> >
> > The following figures are from a PV guest running `make -j32 xen` in
> > shim mode with 32 vCPUs and HAP.
> >
> > Using x2APIC and ALLBUT shorthand:
> > real    4m35.973s
> > user    4m35.110s
> > sys     36m24.117s
> >
> > Using L0 assisted flush:
> > real    1m2.596s
> > user    4m34.818s
> > sys     5m16.374s
> >
> > The implementation adds a new hook to hypervisor_ops so other
> > enlightenments can also implement such an assisted flush just by
> > filling in the hook. Note that the Xen implementation completely
> > ignores the dirty CPU mask and the linear address passed in, and
> > always performs a global TLB flush on all vCPUs.
>
> This isn't because of an implementation choice of yours, but because
> of how HVMOP_flush_tlbs works. I think the statement should somehow
> express this. I also think it wants clarifying that using the
> hypercall is indeed faster even in the case of single-page,
> single-CPU flush

Are you sure about this? I think taking a vmexit is going to be more
costly than executing a local invlpg?

> (which I suspect may not be the case especially as vCPU
> count grows). The stats above prove a positive overall effect, but
> they don't say whether the effect could be even bigger by being at
> least a little selective.

I assume that being able to provide a bitmap of the vCPUs on which the
TLB flush should be performed would give us some more performance, but
I haven't looked into it yet.

> > @@ -73,6 +74,15 @@ void __init hypervisor_e820_fixup(struct e820map *e820)
> >          ops.e820_fixup(e820);
> >  }
> >
> > +int hypervisor_flush_tlb(const cpumask_t *mask, const void *va,
> > +                         unsigned int order)
> > +{
> > +    if ( ops.flush_tlb )
> > +        return alternative_call(ops.flush_tlb, mask, va, order);
> > +
> > +    return -ENOSYS;
> > +}
>
> Please no new -ENOSYS anywhere (except in new ports' top level
> hypercall handlers).

Ack. Is EOPNOTSUPP OK?

> > @@ -256,6 +257,16 @@ void flush_area_mask(const cpumask_t *mask, const void *va, unsigned int flags)
> >      if ( (flags & ~FLUSH_ORDER_MASK) &&
> >           !cpumask_subset(mask, cpumask_of(cpu)) )
> >      {
> > +        if ( cpu_has_hypervisor &&
> > +             !(flags & ~(FLUSH_TLB | FLUSH_TLB_GLOBAL | FLUSH_VA_VALID |
> > +                         FLUSH_ORDER_MASK)) &&
> > +             !hypervisor_flush_tlb(mask, va, (flags - 1) & FLUSH_ORDER_MASK) )
> > +        {
> > +            if ( tlb_clk_enabled )
> > +                tlb_clk_enabled = false;
>
> Why does this need doing here? Couldn't Xen guest setup code
> clear the flag?

Because it's possible that the hypercall fails, and hence the TLB
clock should be kept enabled. There's no reason to disable it until
Xen knows the assisted flush is indeed available.

I don't mind moving it to Xen guest setup code, but I'm not sure I see
why it would be any better than doing it here. The only reason I can
think of is to avoid checking tlb_clk_enabled on every successful
assisted flush?

Thanks, Roger.
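For reference, a minimal sketch of what the wrapper would look like with the
error value changed as agreed above; apart from the -EOPNOTSUPP return it
simply restates the quoted hunk, and it assumes the surrounding
hypervisor_ops plumbing stays as in v5:

int hypervisor_flush_tlb(const cpumask_t *mask, const void *va,
                         unsigned int order)
{
    /* Dispatch to the enlightenment-specific hook when one is registered. */
    if ( ops.flush_tlb )
        return alternative_call(ops.flush_tlb, mask, va, order);

    /* No assisted flush available: report "unsupported" rather than -ENOSYS. */
    return -EOPNOTSUPP;
}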
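Similarly, a sketch of the caller side under discussion, illustrating why
tlb_clk_enabled is cleared at this point rather than from guest setup code:
the flag is only dropped once an assisted flush has actually succeeded, so a
missing or failing hypercall leaves the TLB clock and the pre-existing IPI
path untouched. The else branch is an assumption standing in for the
existing IPI-based code, which is elided here:

/* Inside flush_area_mask(), following the structure of the quoted hunk. */
if ( (flags & ~FLUSH_ORDER_MASK) &&
     !cpumask_subset(mask, cpumask_of(cpu)) )
{
    if ( cpu_has_hypervisor &&
         !(flags & ~(FLUSH_TLB | FLUSH_TLB_GLOBAL | FLUSH_VA_VALID |
                     FLUSH_ORDER_MASK)) &&
         !hypervisor_flush_tlb(mask, va, (flags - 1) & FLUSH_ORDER_MASK) )
    {
        /* Assisted flush succeeded: the TLB clock is no longer needed. */
        if ( tlb_clk_enabled )
            tlb_clk_enabled = false;
    }
    else
    {
        /*
         * Hypercall unavailable or failed: the pre-existing IPI-based
         * flush path runs here, with the TLB clock still enabled (elided).
         */
    }
}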