
Re: [PATCH 5/5] x86/ioapic: Drop function pointers from __ioapic_{read,write}_entry()



On 18/11/2021 09:07, Jan Beulich wrote:
> On 18.11.2021 10:06, Jan Beulich wrote:
>> On 18.11.2021 01:32, Andrew Cooper wrote:
>>> On 12/11/2021 10:43, Jan Beulich wrote:
>>>> On 11.11.2021 18:57, Andrew Cooper wrote:
>>>>> Function pointers are expensive, and the raw parameter is a constant from
>>>>> all callers, meaning that it predicts very well with local branch history.
>>>> The code change is fine, but I'm having trouble with "all" here: Both
>>>> functions aren't even static, so while callers in io_apic.c may
>>>> benefit (perhaps with the exception of ioapic_{read,write}_entry(),
>>>> depending on whether the compiler views inlining them as warranted),
>>>> I'm in no way convinced this extends to the callers in VT-d code.
>>>>
>>>> Further ISTR clang being quite a bit less aggressive about inlining,
>>>> so the effects might not be quite as good there even for the call
>>>> sites in io_apic.c.
>>>>
>>>> Can you clarify this for me please?
>>> The way the compiler lays out the code is unrelated to why this form is 
>>> an improvement.
>>>
>>> Branch history is a function of "the $N most recently taken branches".  
>>> This is because "how you got here" is typically relevant to "where you 
>>> should go next".
>>>
>>> Trivial schemes maintain a shift register of taken / not-taken results.  
>>> Less trivial schemes maintain a rolling hash of (src addr, dst addr) 
>>> tuples of all taken branches (direct and indirect).  In both cases, the 
>>> instantaneous branch history is an input into the final prediction, and 
>>> is commonly used to select which saturating counter (or bank of 
>>> counters) is used.
>>>
>>> Consider something like
>>>
>>> while ( cond )
>>> {
>>>      memcpy(dst1, src1, 64);
>>>      memcpy(dst2, src2, 7);
>>> }
>>>
>>> Here, the conditional jump inside memcpy() coping with the tail of the 
>>> copy flips result 50% of the time, which is fiendish to predict for.
>>>
>>> However, because the branch history differs (by memcpy()'s return 
>>> address which was accumulated by the call instruction), the predictor 
>>> can actually use two different taken/not-taken counters for the two 
>>> different "instances" of the tail jump.  After a few iterations to warm 
>>> up, the predictor will get every jump perfect despite the fact that 
>>> memcpy() is a library call and the branches would otherwise alias.
>>>
>>>
>>> Bringing it back to the code in question.  The "raw" parameter is an 
>>> explicit true or false at the top of all call paths leading into these 
>>> functions.  Therefore, an individual branch history has a high 
>>> correlation with said true or false, irrespective of the absolute code 
>>> layout.  As a consequence, the correct result of the prediction is 
>>> highly correlated with the branch history, and it will predict 
>>> perfectly[1] after a few times the path has been used.
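
For illustration, the memcpy() example above can be modelled with a toy
predictor.  This is only a minimal sketch and not how any real CPU is
built: the function names are made up for this mail, the "branch history"
is reduced to nothing more than the index of the call site, and the
counters are plain 2-bit saturating counters starting at "weakly
not-taken".

```c
#include <stdbool.h>

/* 2-bit saturating counter: 0..1 predict not-taken, 2..3 predict taken. */
static bool counter_predict(unsigned char c)
{
    return c >= 2;
}

static unsigned char counter_update(unsigned char c, bool taken)
{
    if ( taken )
        return c < 3 ? c + 1 : 3;

    return c > 0 ? c - 1 : 0;
}

/*
 * Model the tail branch inside memcpy() for the loop
 *
 *     memcpy(dst1, src1, 64);   // call site 0: no tail, branch not taken
 *     memcpy(dst2, src2, 7);    // call site 1: tail present, branch taken
 *
 * With use_history == false, both calls share a single counter (the
 * branches alias).  With use_history == true, the call site selects the
 * counter, standing in for a history which includes the return address.
 *
 * Returns the number of mispredictions over 'iters' loop iterations.
 */
unsigned int simulate_tail_branch(unsigned int iters, bool use_history)
{
    unsigned char counters[2] = { 1, 1 }; /* Assumed start: weakly not-taken. */
    unsigned int mispredicts = 0;

    for ( unsigned int i = 0; i < iters; i++ )
    {
        for ( unsigned int site = 0; site < 2; site++ )
        {
            bool taken = (site == 1);
            unsigned int idx = use_history ? site : 0;

            if ( counter_predict(counters[idx]) != taken )
                mispredicts++;

            counters[idx] = counter_update(counters[idx], taken);
        }
    }

    return mispredicts;
}
```

In this model, simulate_tail_branch(100, false) reports 100 mispredictions
out of 200 branches, while simulate_tail_branch(100, true) reports 1 (the
warm-up miss for the 7-byte copy).  The "raw" case is analogous under the
same assumptions: the history leading up to the branch stands in for the
call site here.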
>> Thanks a lot for the explanation. May I suggest to make this less
>> ambiguous in the description, e.g. by saying "the raw parameter is a
>> constant at the root of all call trees"?

Done.

> Oh, forgot to say that then:
> Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>

Thanks.

~Andrew



 

