Re: [Xen-devel] [PATCH 4/4] x86: use POPCNT for hweight<N>() when available



>>> On 03.06.19 at 10:13, <JBeulich@xxxxxxxx> wrote:
>>>> On 31.05.19 at 22:43, <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 31/05/2019 02:54, Jan Beulich wrote:
>>> This is faster than using the software implementation, and the insn is
>>> available on all half-way recent hardware. Therefore convert
>>> generic_hweight<N>() to out-of-line functions (without affecting Arm)
>>> and use alternatives patching to replace the function calls.
>>>
>>> Suggested-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>> 
>> So, I trust you weren't expecting to just ack this and let it go in?
>> 
>> The principle of the patch (use popcnt when available) is an improvement
>> which I'm entirely in agreement with, but everything else is a problem.
>> 
>> The long and the short of it is that I'm not going to accept any version
>> of this which isn't the Linux version.
> 
> You're kidding. We want to move away from assembly wherever we
> can, and you demand new assembly code?
> 
>> From a microarchitectural standpoint, the tradeoff between fractional
>> register scheduling flexibility (which in practice is largely bound
>> anyway by real function calls in surrounding code) and increased icache
>> pressure/coldness (from the redundant function copies) falls largely in
>> favour of the Linux way of doing it: a cold icache line is
>> disproportionately more expensive than requiring the compiler to order
>> its registers differently (especially as all non-obsolete processors
>> these days have zero-cost register renaming internally, for the purpose
>> of superscalar execution).
> 
> I'm afraid I'm struggling to understand what you're trying to tell
> me here: Where's the difference (in this regard) between the
> change here and the way Linux does it? Both emit a CALL
> insn with registers set up suitably for it, and both patch it with a
> POPCNT insn using the registers as demanded by the CALL.

Having thought about this some more, in an attempt to understand
(a) what you mean and (b) how you want things to be done "your
way", I'm afraid I've become more confused: Your reply strongly
reminds me of the discussion we had on the BMI2 patching series
I had done (and have since dropped). There you complained about
me _not_ using fixed registers, and hence potentially causing
frequent accesses to i-cache-cold lines. While my original plan
was to use a similar approach here, I specifically went the
opposite way to avoid similar complaints of yours, only to find
that you use (apparently) the same argument again. As a result I
can only conclude that I'm now pretty unclear on what model you
would actually approve of.
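
To make the object of the discussion concrete, here is a minimal sketch
(not the code from the patch) of what a software hweight32() computes,
next to the compiler builtin that a POPCNT-enabled build can turn into a
single instruction. The names sw_hweight32(), hw_hweight32() and
hweight_selfcheck() are made up for illustration, and
__builtin_popcount() assumes GCC or Clang. The alternatives patching of
the CALL itself is not shown.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Software population count using the usual bit-parallel reduction.
 * Illustrative only; this is not Xen's generic_hweight32().
 */
static unsigned int sw_hweight32(uint32_t w)
{
    w -= (w >> 1) & 0x55555555u;                      /* 2-bit sums */
    w = (w & 0x33333333u) + ((w >> 2) & 0x33333333u); /* 4-bit sums */
    w = (w + (w >> 4)) & 0x0f0f0f0fu;                 /* 8-bit sums */
    return (w * 0x01010101u) >> 24;                   /* add the four bytes */
}

/*
 * Compiler builtin (GCC/Clang assumed). When POPCNT is enabled at build
 * time this can become one instruction; otherwise the compiler falls
 * back to a software sequence or a library call.
 */
static unsigned int hw_hweight32(uint32_t w)
{
    return (unsigned int)__builtin_popcount(w);
}

/* Check that both variants agree on a few sample values. */
static void hweight_selfcheck(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 0x80000000u, 0x0f0f0f0fu, 0xdeadbeefu, 0xffffffffu
    };
    unsigned int i;

    for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
        assert(sw_hweight32(samples[i]) == hw_hweight32(samples[i]));
}
```

Either variant returns the number of set bits in its argument; what is
being debated in this thread is only how the call site gets from the
first form to the POPCNT insn at runtime.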

Jan


