Xen Project Mailing List

Re: [PATCH 5/5] x86/ioapic: Drop function pointers from __ioapic_{read,write}_entry()

To: Andrew Cooper <amc96@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Thu, 18 Nov 2021 10:07:33 +0100

Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 18.11.2021 10:06, Jan Beulich wrote:
> On 18.11.2021 01:32, Andrew Cooper wrote:
>> On 12/11/2021 10:43, Jan Beulich wrote:
>>> On 11.11.2021 18:57, Andrew Cooper wrote:
>>>> Function pointers are expensive, and the raw parameter is a constant from
>>>> all callers, meaning that it predicts very well with local branch history.
>>> The code change is fine, but I'm having trouble with "all" here: Neither
>>> function is even static, so while callers in io_apic.c may
>>> benefit (perhaps with the exception of ioapic_{read,write}_entry(),
>>> depending on whether the compiler views inlining them as warranted),
>>> I'm in no way convinced this extends to the callers in VT-d code.
>>>
>>> Further ISTR clang being quite a bit less aggressive about inlining,
>>> so the effects might not be quite as good there even for the call
>>> sites in io_apic.c.
>>>
>>> Can you clarify this for me please?
>>
>> The way the compiler lays out the code is unrelated to why this form is
>> an improvement.
>>
>> Branch history is a function of "the $N most recently taken branches".
>> This is because "how you got here" is typically relevant to "where you
>> should go next".
>>
>> Trivial schemes maintain a shift register of taken / not-taken results.
>> Less trivial schemes maintain a rolling hash of (src addr, dst addr)
>> tuples of all taken branches (direct and indirect). In both cases, the
>> instantaneous branch history is an input into the final prediction, and
>> is commonly used to select which saturating counter (or bank of
>> counters) is used.
>>
>> Consider something like
>>
>>     while ( cond )
>>     {
>>         memcpy(dst1, src1, 64);
>>         memcpy(dst2, src2, 7);
>>     }
>>
>> Here, the conditional jump inside memcpy() coping with the tail of the
>> copy flips its result 50% of the time, which is fiendish to predict.
>>
>> However, because the branch history differs (by memcpy()'s return
>> address which was accumulated by the call instruction), the predictor
>> can actually use two different taken/not-taken counters for the two
>> different "instances" of the tail jump. After a few iterations to warm
>> up, the predictor will predict every jump perfectly despite the fact that
>> memcpy() is a library call and the branches would otherwise alias.
>>
>> To bring it back to the code in question: the "raw" parameter is an
>> explicit true or false at the top of all call paths leading into these
>> functions. Therefore, an individual branch history has a high
>> correlation with said true or false, irrespective of the absolute code
>> layout. As a consequence, the correct result of the prediction is
>> highly correlated with the branch history, and it will predict
>> perfectly[1] once the path has been used a few times.
>
> Thanks a lot for the explanation. May I suggest making this less
> ambiguous in the description, e.g. by saying "the raw parameter is a
> constant at the root of all call trees"?

Oh, and I forgot to say so then:
Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>

Jan

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.