Re: [PATCH 3/5] x86/pv: Optimise prefetching in svm_load_segs()
On 10/09/2020 15:57, Jan Beulich wrote:
> On 09.09.2020 11:59, Andrew Cooper wrote:
>> Split into two functions.  Passing a load of zeros in results in somewhat
>> poor register scheduling in __context_switch().
>
> I'm afraid I don't understand why this would be, no matter that
> I trust you having observed this being the case: The registers
> used for passing parameters are all call-clobbered anyway, so
> the compiler can't use them for anything across the call. And
> it would look pretty poor code generation wise if the XORs to
> clear them (which effectively have no latency at all) would be
> scheduled far ahead of the call, especially when there's better
> use for the registers. The observation wasn't possibly from
> before your recent dropping of two of the parameters, when they
> couldn't all be passed in registers (albeit even then it would
> be odd, as the change then should merely have led to a slightly
> smaller stack frame of the function)?

Hmm yes.  I wrote this patch before I did the assertion fix, and the
comment didn't rebase very well.

Back then, one of the zeros was on the stack, which was definitely an
unwanted property.  Even though the XORs are mostly free, they're not
totally free, as they cost decode bandwidth and instruction cache space
(trivial amounts, but still...).

In general, LTO's inter-procedural analysis can figure out that
svm_load_segs_prefetch() doesn't use many registers, and the caller can
be optimised based on the fact that some registers aren't actually
clobbered.

(Then again, in this case with a sole caller, LTO really ought to be
able to inline and delete the function.)

How about "results in unnecessary caller setup code"?

~Andrew
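
For illustration only, here is a minimal C sketch of the trade-off discussed
above. It is not code from the patch. The names load_segs_combined(),
load_segs(), load_segs_prefetch() and struct seg_state are hypothetical, and
the "all-zero arguments mean prefetch only" convention is an assumption made
for the example. The counter merely stands in for the real prefetch work.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the state being loaded; not Xen's real layout. */
struct seg_state {
    unsigned int ldt_ents;
    uint64_t ldt_base, fs_base, gs_base, gs_shadow;
    unsigned int prefetches;   /* counts prefetch-only calls */
};

static struct seg_state state;

/* Split form, part 1: the prefetch takes no arguments at all. */
void load_segs_prefetch(void)
{
    state.prefetches++;        /* placeholder for the actual prefetch */
}

/* Split form, part 2: the real load always receives meaningful values. */
bool load_segs(unsigned int ldt_ents, uint64_t ldt_base,
               uint64_t fs_base, uint64_t gs_base, uint64_t gs_shadow)
{
    state.ldt_ents  = ldt_ents;
    state.ldt_base  = ldt_base;
    state.fs_base   = fs_base;
    state.gs_base   = gs_base;
    state.gs_shadow = gs_shadow;
    return true;
}

/*
 * Combined form (assumed convention): all-zero arguments request a
 * prefetch.  A caller that only wants the prefetch must still zero five
 * argument registers before the call.
 */
bool load_segs_combined(unsigned int ldt_ents, uint64_t ldt_base,
                        uint64_t fs_base, uint64_t gs_base,
                        uint64_t gs_shadow)
{
    if (!ldt_ents && !ldt_base && !fs_base && !gs_base && !gs_shadow) {
        load_segs_prefetch();
        return true;
    }
    return load_segs(ldt_ents, ldt_base, fs_base, gs_base, gs_shadow);
}
```

With the combined form, a prefetch-only call site reads
load_segs_combined(0, 0, 0, 0, 0), so the compiler has to emit one
register-clearing instruction per argument. With the split form the call
site is just load_segs_prefetch(), which needs no caller setup code.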