[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH 3/5] x86/pv: Optimise prefetching in svm_load_segs()
On 09.09.2020 11:59, Andrew Cooper wrote: > Split into two functions. Passing a load of zeros in results in somewhat poor > register scheduling in __context_switch(). I'm afraid I don't understand why this would be, no matter that I trust you having observed this being the case: The registers used for passing parameters are all call-clobbered anyway, so the compiler can't use them for anything across the call. And it would look pretty poor code generation wise if the XORs to clear them (which effectively have no latency at all) would be scheduled far ahead of the call, especially when there's better use for the registers. The observation wasn't possibly from before your recent dropping of two of the parameters, when they couldn't all be passed in registers (albeit even then it would be odd, as the change then should merely have lead to a slightly smaller stack frame of the function)? > Update the prefetching comment to note that the main point is the TLB fill. > > Reorder the writes in svm_load_segs() to access the VMCB fields in ascending > order, which gets better next-line prefetch behaviour out of hardware. Update > the prefetch instruction to match. > > The net delta is: > > add/remove: 1/0 grow/shrink: 0/2 up/down: 38/-39 (-1) > Function old new delta > svm_load_segs_prefetch - 38 +38 > __context_switch 967 951 -16 > svm_load_segs 291 268 -23 A net win of 1 byte ;-) > --- a/xen/arch/x86/hvm/svm/svm.c > +++ b/xen/arch/x86/hvm/svm/svm.c > @@ -1520,6 +1520,19 @@ static void svm_init_erratum_383(const struct > cpuinfo_x86 *c) > } > > #ifdef CONFIG_PV > +void svm_load_segs_prefetch(void) > +{ > + struct vmcb_struct *vmcb = this_cpu(host_vmcb_va); const? > + if ( vmcb ) > + /* > + * The main reason for this prefetch is for the TLB fill. Use the > + * opporunity to fetch the lowest address used, to get the best > + * behaviour out of hardwares next-line prefetcher. Nit: "opportunity" and "hardware's" ? I'm not opposed to the change, but before giving my R-b I'd like to understand the register scheduling background a little better. Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |