
Re: [PATCH 5/5] x86/ioapic: Drop function pointers from __ioapic_{read,write}_entry()



On 12/11/2021 10:43, Jan Beulich wrote:
> On 11.11.2021 18:57, Andrew Cooper wrote:
>> Function pointers are expensive, and the raw parameter is a constant from all
>> callers, meaning that it predicts very well with local branch history.
> The code change is fine, but I'm having trouble with "all" here: Both
> functions aren't even static, so while callers in io_apic.c may
> benefit (perhaps with the exception of ioapic_{read,write}_entry(),
> depending on whether the compiler views inlining them as warranted),
> I'm in no way convinced this extends to the callers in VT-d code.
>
> Further ISTR clang being quite a bit less aggressive about inlining,
> so the effects might not be quite as good there even for the call
> sites in io_apic.c.
>
> Can you clarify this for me please?

The way the compiler lays out the code is unrelated to why this form is an improvement.

Branch history is a function of "the $N most recently taken branches".  This is because "how you got here" is typically relevant to "where you should go next".

Trivial schemes maintain a shift register of taken / not-taken results.  Less trivial schemes maintain a rolling hash of (src addr, dst addr) tuples of all taken branches (direct and indirect).  In both cases, the instantaneous branch history is an input into the final prediction, and is commonly used to select which saturating counter (or bank of counters) is used.

Consider something like

while ( cond )
{
    memcpy(dst1, src1, 64);
    memcpy(dst2, src2, 7);
}

Here, the conditional jump inside memcpy() which copes with the tail of the copy is needed by only one of the two calls, so its result flips 50% of the time, which is fiendish to predict.

However, because the branch history differs (by memcpy()'s return address which was accumulated by the call instruction), the predictor can actually use two different taken/not-taken counters for the two different "instances" of the tail jump.  After a few iterations to warm up, the predictor will get every jump perfect despite the fact that memcpy() is a library call and the branches would otherwise alias.
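
To make that concrete, here is a toy model of the loop above.  It is only a sketch under stated assumptions, not a description of any real predictor: the addresses, the 4-bit fold of each (src addr, dst addr) tuple, the 16-bit history and the 1024-entry table of 2-bit counters are all invented for illustration, and only the tail jump is predicted (calls, returns and the loop back-edge merely feed the history).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TABLE_BITS 10u
#define TABLE_SIZE (1u << TABLE_BITS)

/* Hypothetical addresses; the two call sites fold to different nibbles. */
enum {
    LOOP_TOP = 0x100, CALL1 = 0x101, CALL2 = 0x112, LOOP_END = 0x120,
    MEMCPY = 0x400, TAIL_JCC = 0x420, TAIL_DST = 0x440, MEMCPY_RET = 0x450,
};

struct predictor {
    uint8_t  counter[TABLE_SIZE]; /* 2-bit saturating counters, 0..3 */
    uint16_t history;             /* folds of the 4 most recent taken branches */
    bool     use_history;
};

/* Shift a 4-bit fold of (src, dst) into the history; old entries fall off. */
static void record_taken(struct predictor *p, uint32_t src, uint32_t dst)
{
    p->history = (uint16_t)((p->history << 4) | ((src ^ dst) & 0xf));
}

/* Predict the branch at pc, train the selected counter, return correctness. */
static bool predict_and_train(struct predictor *p, uint32_t pc, bool taken)
{
    uint32_t hist = p->use_history ? p->history : 0u;
    uint32_t idx = (pc ^ hist) & (TABLE_SIZE - 1);
    bool predicted = p->counter[idx] >= 2;

    if ( taken && p->counter[idx] < 3 )
        p->counter[idx]++;
    else if ( !taken && p->counter[idx] > 0 )
        p->counter[idx]--;

    return predicted == taken;
}

/* One memcpy() call: call, tail jump (taken iff there is a tail), return. */
static void call_memcpy(struct predictor *p, uint32_t call_site,
                        bool has_tail, unsigned int *mispredicts)
{
    record_taken(p, call_site, MEMCPY);
    if ( !predict_and_train(p, TAIL_JCC, has_tail) )
        ++*mispredicts;
    if ( has_tail )
        record_taken(p, TAIL_JCC, TAIL_DST);
    record_taken(p, MEMCPY_RET, call_site + 1);
}

/* Returns tail-jump mispredictions counted from iteration 'warmup' onwards. */
unsigned int simulate(bool use_history, unsigned int iterations,
                      unsigned int warmup)
{
    struct predictor p;
    unsigned int mispredicts = 0, i;

    assert(warmup <= iterations);
    memset(p.counter, 1, sizeof(p.counter)); /* start weakly not-taken */
    p.history = 0;
    p.use_history = use_history;

    for ( i = 0; i < iterations; i++ )
    {
        if ( i == warmup )
            mispredicts = 0;
        call_memcpy(&p, CALL1, false, &mispredicts); /* 64 bytes: no tail */
        call_memcpy(&p, CALL2, true, &mispredicts);  /* 7 bytes: tail */
        record_taken(&p, LOOP_END, LOOP_TOP);        /* loop back-edge */
    }

    return mispredicts;
}
```

In this model, simulate(false, 100, 10) keeps mispredicting the tail jump once per iteration because both calls share one counter, whereas simulate(true, 100, 10) reports no mispredictions after warm-up, because the call site recorded in the history selects a separate counter for each "instance".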


Bringing it back to the code in question.  The "raw" parameter is an explicit true or false at the top of all call paths leading into these functions.  Therefore, an individual branch history has a high correlation with said true or false, irrespective of the absolute code layout.  As a consequence, the correct result of the prediction is highly correlated with the branch history, and it will predict perfectly[1] after a few times the path has been used.
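
For illustration only, the general shape of such a change looks something like the following.  This is a hypothetical sketch and not the code from the patch: fake_regs, reg_read_raw(), reg_read_cooked() and both read_entry_*() functions are made-up stand-ins, and the register layout is invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_REGS 4u

/* Hypothetical register file standing in for real hardware accesses. */
static const uint32_t fake_regs[NR_REGS] = { 0x11, 0x22, 0x33, 0x44 };

/* Two hypothetical accessors which a 'raw' flag chooses between. */
static uint32_t reg_read_raw(unsigned int reg)
{
    assert(reg < NR_REGS);
    return fake_regs[reg];
}

static uint32_t reg_read_cooked(unsigned int reg)
{
    assert(reg < NR_REGS);
    return fake_regs[reg];
}

/* Function pointer form: 'raw' selects a pointer, every read is indirect. */
uint64_t read_entry_fnptr(unsigned int pin, bool raw)
{
    uint32_t (*read)(unsigned int) = raw ? reg_read_raw : reg_read_cooked;
    uint64_t lo = read(2 * pin);
    uint64_t hi = read(2 * pin + 1);

    return lo | (hi << 32);
}

/* Branch form: 'raw' feeds one conditional jump, and the calls are direct. */
uint64_t read_entry_branch(unsigned int pin, bool raw)
{
    uint64_t lo, hi;

    if ( raw )
    {
        lo = reg_read_raw(2 * pin);
        hi = reg_read_raw(2 * pin + 1);
    }
    else
    {
        lo = reg_read_cooked(2 * pin);
        hi = reg_read_cooked(2 * pin + 1);
    }

    return lo | (hi << 32);
}
```

Both forms return the same value for the same arguments.  The difference is that the second leaves the predictor with a single conditional jump on "raw" to get right, which is exactly the kind of branch that correlates with how the caller got there.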

~Andrew

[1] Obviously, it's not actually perfect outside of a synthetic example.  Aliasing in the predictor is a necessary property of keeping the logic small enough to provide an answer fast, but the less accidental aliasing there is, the faster the CPU performance in benchmarks, so incentives are in our favour here.



 

