
Re: [PATCH 5/5] x86/ioapic: Drop function pointers from __ioapic_{read,write}_entry()


  • To: Andrew Cooper <amc96@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 18 Nov 2021 10:06:22 +0100
  • Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 18 Nov 2021 09:06:41 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 18.11.2021 01:32, Andrew Cooper wrote:
> On 12/11/2021 10:43, Jan Beulich wrote:
>> On 11.11.2021 18:57, Andrew Cooper wrote:
>>> Function pointers are expensive, and the raw parameter is a constant from all
>>> callers, meaning that it predicts very well with local branch history.
>> The code change is fine, but I'm having trouble with "all" here: Both
>> functions aren't even static, so while callers in io_apic.c may
>> benefit (perhaps with the exception of ioapic_{read,write}_entry(),
>> depending on whether the compiler views inlining them as warranted),
>> I'm in no way convinced this extends to the callers in VT-d code.
>>
>> Further ISTR clang being quite a bit less aggressive about inlining,
>> so the effects might not be quite as good there even for the call
>> sites in io_apic.c.
>>
>> Can you clarify this for me please?
> 
> The way the compiler lays out the code is unrelated to why this form is 
> an improvement.
> 
> Branch history is a function of "the $N most recently taken branches".  
> This is because "how you got here" is typically relevant to "where you 
> should go next".
> 
> Trivial schemes maintain a shift register of taken / not-taken results.  
> Less trivial schemes maintain a rolling hash of (src addr, dst addr) 
> tuples of all taken branches (direct and indirect).  In both cases, the 
> instantaneous branch history is an input into the final prediction, and 
> is commonly used to select which saturating counter (or bank of 
> counters) is used.
> 
> Consider something like
> 
> while ( cond )
> {
>      memcpy(dst1, src1, 64);
>      memcpy(dst2, src2, 7);
> }
> 
> Here, the conditional jump inside memcpy() coping with the tail of the 
> copy flips result 50% of the time, which is fiendish to predict for.
> 
> However, because the branch history differs (by memcpy()'s return 
> address which was accumulated by the call instruction), the predictor 
> can actually use two different taken/not-taken counters for the two 
> different "instances" of the tail jump.  After a few iterations to warm 
> up, the predictor will get every jump perfect despite the fact that 
> memcpy() is a library call and the branches would otherwise alias.
> 
> 
> Bringing it back to the code in question.  The "raw" parameter is an 
> explicit true or false at the top of all call paths leading into these 
> functions.  Therefore, an individual branch history has a high 
> correlation with said true or false, irrespective of the absolute code 
> layout.  As a consequence, the correct result of the prediction is 
> highly correlated with the branch history, and it will predict 
> perfectly[1] after a few times the path has been used.
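
For illustration, the memcpy() example quoted above can be modelled with a
toy predictor. This is only a sketch under stated assumptions, not a model
of any real CPU: the "branch history" is reduced to the identity of the most
recent call site (a stand-in for the rolling hash of taken branches), each
history value selects one 2-bit saturating counter, and the names
count_mispredicts(), SITE_A and SITE_B are made up for the example. Site A
stands for the 64-byte copy (tail jump not taken), site B for the 7-byte copy
(tail jump taken).

```c
#include <assert.h>
#include <stdbool.h>

/* 2-bit saturating counter: values 0 and 1 predict not-taken, 2 and 3 taken. */
static bool counter_predict(unsigned char c)
{
    return c >= 2;
}

static unsigned char counter_update(unsigned char c, bool taken)
{
    if (taken)
        return c < 3 ? c + 1 : 3;
    return c > 0 ? c - 1 : 0;
}

/* Hypothetical call sites of the same library routine. */
enum { SITE_A = 0, SITE_B = 1, NUM_SITES = 2 };

/*
 * Run `iterations` loop passes, each calling the routine once from SITE_A
 * and once from SITE_B, and count mispredictions of the shared tail jump
 * after the first `warmup` passes.
 *
 * use_history == false: one counter serves both "instances" of the jump.
 * use_history == true:  the call site (our stand-in for branch history)
 *                       selects which counter is used.
 */
unsigned int count_mispredicts(bool use_history, unsigned int iterations,
                               unsigned int warmup)
{
    unsigned char counters[NUM_SITES] = { 0, 0 };
    unsigned int mispredicts = 0;

    assert(warmup <= iterations);

    for (unsigned int i = 0; i < iterations; i++) {
        for (unsigned int site = SITE_A; site <= SITE_B; site++) {
            bool taken = (site == SITE_B);      /* only the 7-byte copy has a tail */
            unsigned int idx = use_history ? site : 0;
            bool predicted = counter_predict(counters[idx]);

            if (i >= warmup && predicted != taken)
                mispredicts++;

            counters[idx] = counter_update(counters[idx], taken);
        }
    }

    return mispredicts;
}
```

With 100 passes and a warm-up of 4, the single-counter variant in this model
gets every site B jump wrong (half of all the jumps), while the
history-selected variant mispredicts none after warm-up.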

Thanks a lot for the explanation. May I suggest making this less
ambiguous in the description, e.g. by saying "the raw parameter is a
constant at the root of all call trees"?
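
To make "constant at the root of all call trees" concrete, here is a rough
sketch of the two shapes being compared. It is not the code from the patch:
read_raw(), read_remapped(), read_entry_fnptr(), read_entry_branch() and the
two callers are hypothetical stand-ins, and the register arithmetic is only
illustrative.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two underlying register accessors. */
static unsigned int read_raw(unsigned int reg)
{
    return reg;
}

static unsigned int read_remapped(unsigned int reg)
{
    return reg + 0x100;
}

/* Old shape: pick an accessor via a function pointer, then call indirectly. */
unsigned int read_entry_fnptr(unsigned int pin, bool raw)
{
    unsigned int (*read)(unsigned int) = raw ? read_raw : read_remapped;

    return read(0x10 + 2 * pin);
}

/* New shape: a conditional branch followed by a direct call. */
unsigned int read_entry_branch(unsigned int pin, bool raw)
{
    unsigned int reg = 0x10 + 2 * pin;

    if (raw)
        return read_raw(reg);

    return read_remapped(reg);
}

/*
 * Roots of two call trees: each passes a literal true or false, so the
 * branch history leading into read_entry_branch() correlates with "raw"
 * even if the compiler does not inline anything.
 */
unsigned int caller_raw(unsigned int pin)
{
    return read_entry_branch(pin, true);
}

unsigned int caller_remapped(unsigned int pin)
{
    return read_entry_branch(pin, false);
}
```

Both shapes return the same values; the difference is only in whether the
selection is expressed as an indirect call or as a conditional jump whose
outcome follows from which root the call came from.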

Jan




 

