Xen project Mailing List

Re: [PATCH 5/5] x86/ioapic: Drop function pointers from __ioapic_{read,write}_entry()

To: Jan Beulich <jbeulich@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

From: Andrew Cooper <amc96@xxxxxxxx>

Date: Thu, 18 Nov 2021 17:33:54 +0000

Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Thu, 18 Nov 2021 17:34:09 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 18/11/2021 09:07, Jan Beulich wrote:
> On 18.11.2021 10:06, Jan Beulich wrote:
>> On 18.11.2021 01:32, Andrew Cooper wrote:
>>> On 12/11/2021 10:43, Jan Beulich wrote:
>>>> On 11.11.2021 18:57, Andrew Cooper wrote:
>>>>> Function pointers are expensive, and the raw parameter is a constant
>>>>> from all callers, meaning that it predicts very well with local
>>>>> branch history.
>>>> The code change is fine, but I'm having trouble with "all" here: Both
>>>> functions aren't even static, so while callers in io_apic.c may
>>>> benefit (perhaps with the exception of ioapic_{read,write}_entry(),
>>>> depending on whether the compiler views inlining them as warranted),
>>>> I'm in no way convinced this extends to the callers in VT-d code.
>>>>
>>>> Further ISTR clang being quite a bit less aggressive about inlining,
>>>> so the effects might not be quite as good there even for the call
>>>> sites in io_apic.c.
>>>>
>>>> Can you clarify this for me please?
>>> The way the compiler lays out the code is unrelated to why this form is
>>> an improvement.
>>>
>>> Branch history is a function of "the $N most recently taken branches".
>>> This is because "how you got here" is typically relevant to "where you
>>> should go next".
>>>
>>> Trivial schemes maintain a shift register of taken / not-taken results.
>>> Less trivial schemes maintain a rolling hash of (src addr, dst addr)
>>> tuples of all taken branches (direct and indirect).  In both cases, the
>>> instantaneous branch history is an input into the final prediction, and
>>> is commonly used to select which saturating counter (or bank of
>>> counters) is used.
>>>
>>> Consider something like
>>>
>>>     while ( cond )
>>>     {
>>>         memcpy(dst1, src1, 64);
>>>         memcpy(dst2, src2, 7);
>>>     }
>>>
>>> Here, the conditional jump inside memcpy() coping with the tail of the
>>> copy flips result 50% of the time, which is fiendish to predict for.
>>>
>>> However, because the branch history differs (by memcpy()'s return
>>> address which was accumulated by the call instruction), the predictor
>>> can actually use two different taken/not-taken counters for the two
>>> different "instances" of the tail jump.  After a few iterations to warm
>>> up, the predictor will get every jump perfect despite the fact that
>>> memcpy() is a library call and the branches would otherwise alias.
>>>
>>>
>>> Bringing it back to the code in question.  The "raw" parameter is an
>>> explicit true or false at the top of all call paths leading into these
>>> functions.  Therefore, an individual branch history has a high
>>> correlation with said true or false, irrespective of the absolute code
>>> layout.  As a consequence, the correct result of the prediction is
>>> highly correlated with the branch history, and it will predict
>>> perfectly[1] after a few times the path has been used.
>> Thanks a lot for the explanation. May I suggest to make this less
>> ambiguous in the description, e.g. by saying "the raw parameter is a
>> constant at the root of all call trees"?

Done.

> Oh, forgot to say that then:
> Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>

Thanks.

~Andrew
