Re: [PATCH 5/5] x86/ioapic: Drop function pointers from __ioapic_{read,write}_entry()
On 18.11.2021 01:32, Andrew Cooper wrote:
> On 12/11/2021 10:43, Jan Beulich wrote:
>> On 11.11.2021 18:57, Andrew Cooper wrote:
>>> Function pointers are expensive, and the raw parameter is a constant from
>>> all callers, meaning that it predicts very well with local branch history.
>> The code change is fine, but I'm having trouble with "all" here: Neither
>> function is even static, so while callers in io_apic.c may
>> benefit (perhaps with the exception of ioapic_{read,write}_entry(),
>> depending on whether the compiler views inlining them as warranted),
>> I'm in no way convinced this extends to the callers in VT-d code.
>>
>> Further ISTR clang being quite a bit less aggressive about inlining,
>> so the effects might not be quite as good there even for the call
>> sites in io_apic.c.
>>
>> Can you clarify this for me please?
>
> The way the compiler lays out the code is unrelated to why this form is
> an improvement.
>
> Branch history is a function of "the $N most recently taken branches".
> This is because "how you got here" is typically relevant to "where you
> should go next".
>
> Trivial schemes maintain a shift register of taken / not-taken results.
> Less trivial schemes maintain a rolling hash of (src addr, dst addr)
> tuples of all taken branches (direct and indirect). In both cases, the
> instantaneous branch history is an input into the final prediction, and
> is commonly used to select which saturating counter (or bank of
> counters) is used.
>
> Consider something like
>
> while ( cond )
> {
>     memcpy(dst1, src1, 64);
>     memcpy(dst2, src2, 7);
> }
>
> Here, the conditional jump inside memcpy() coping with the tail of the
> copy flips result 50% of the time, which is fiendish to predict for.
>
> However, because the branch history differs (by memcpy()'s return
> address which was accumulated by the call instruction), the predictor
> can actually use two different taken/not-taken counters for the two
> different "instances" of the tail jump. After a few iterations to warm
> up, the predictor will get every jump perfect despite the fact that
> memcpy() is a library call and the branches would otherwise alias.
>
>
> Bringing it back to the code in question. The "raw" parameter is an
> explicit true or false at the top of all call paths leading into these
> functions. Therefore, an individual branch history has a high
> correlation with said true or false, irrespective of the absolute code
> layout. As a consequence, the correct result of the prediction is
> highly correlated with the branch history, and it will predict
> perfectly[1] after a few times the path has been used.
Thanks a lot for the explanation. May I suggest making this less
ambiguous in the description, e.g. by saying "the raw parameter is a
constant at the root of all call trees"?
Jan