Xen project Mailing List

Re: [PATCH v2] x86/i8259: do not assume interrupts always target CPU0

From: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Tue, 24 Oct 2023 12:14:00 +0200

Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none

Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=B4pDAbM2NuNwAOp5cZ4BqnSbLnPSw7QgU4Gks1MLb/o=; b=igsk1tySomFBjtD2KSUg6GnGA+vPZQF1lTgtOOXoS6vJP4hricbQNQfdTJ0CG4w1jbNeC9skOJlpumPqnu4aLzXO6nxMIXOeqQN+4t3fNE1AHRxlTuCVtV1hckPaZcXtb2b0k4g1o0+gqwZh/5dB8+dZpCEdDD8uz2XkgchihY/qZJmyzzI/9SEK1+/CeoXlf8AXaumBxCHPzG9j5dJabrsoq3vCiWDxmA/v8u4KFwFj/yDUy9bZtXSFbe6slYLLU95b1muBh9MwQ4rVBHXNez8+Z7YGKyWGJ08svBO1KVAqGRhBt2TtNRjQDe6xDx9VeGfTUElJiwZI2xc+Ft227w==

Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=fYMkjT4lT6oZFQakKXtaPgEqy3PQeSbgI/jg9zKFHKGMoHP7FkvvIOUSWixImX0QlxGFcyxbRYie5lP5SgFaeNApYx54NPXVRw9PixBfpKQ+4931JswWNXqGTrn5IRD4kCqhi9ARWf8rUxzuf+TIqodR/tWrg1QQrJfuqRr6TigI7L88xQFSxVyoV1lZ35noVXbEB9iGwNbn5H+kDZ5WFXW0wwC0R6gxIVQdOSdwoSzNsMlxgqpEQ0h6+1u6rsmlHt1mYe+x3v4gIgT/8PSaxovsnP3/V0/3efMPwUmE/ZLWlz8iSDcWhr8xg3qVhOI1iFhdEgHFCbifjXPSoDA9Aw==

Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;

Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx

Delivery-date: Tue, 24 Oct 2023 10:14:18 +0000

Ironport-data: A9a23:b2/Oq6OEaiN45W/vrR2ClsFynXyQoLVcMsEvi/4bfWQNrUp01jQPx 2JMWG/TPPbcY2Wjfo13YImyoEsBusPSnN41SAto+SlhQUwRpJueD7x1DKtS0wC6dZSfER09v 63yTvGacajYm1eF/k/F3oDJ9CQ6jefQAOOkVIYoAwgpLSd8UiAtlBl/rOAwh49skLCRDhiE/ Nj/uKUzAnf8s9JPGjxSs/nrRC9H5qyo42pA5ABmPJingXeF/5UrJMNHTU2OByOQrrl8RoaSW +vFxbelyWLVlz9F5gSNy+uTnuUiG9Y+DCDW4pZkc/HKbitq/0Te5p0TJvsEAXq7vh3S9zxHJ HehgrTrIeshFvWkdO3wyHC0GQkmVUFN0OevzXRSLaV/ZqAJGpfh66wGMa04AWEX0sZQOWZK7 OUIEz1OdR2nttKRnuLnFuY506zPLOGzVG8ekldJ6GmDSNwAGNXESaiM4sJE1jAtgMwIBezZe 8cSdTtoalLHfgFLPVAUTpk5mY9EhFGmK2Ee9A3T+PNxvzC7IA9ZidABNPLPfdOHX4NNl1uwr WPa5WXpRBodMbRzzBLcqCz937aWzXuTtIQ6T76qp6R2kn6q9HVDOS9RUlGYjaa9lRvrMz5YA wlOksY0loAw/kG2Stj2XzWjvWWJ+BUbXrJ4DOkS+AyLjK3O7G6xFmUCCzJMdtEinMs3XiAxk E+EmcvzAj5iu6HTTmiSnop4thu3MCkRaGUEPCkNSFNf58G5+N1ryBXSUtxkDai5yMXvHi39y CyLqy54gKgPickM1OOw+lWvby+Qm6UlhzUdvm3/Nl9JJCsgDGJ5T+REMWTm0Ms=

Ironport-hdrordr: A9a23:JLNmu6gapOQxlxiaCtWH7/9D4nBQXh4ji2hC6mlwRA09TyX5ra 2TdZUgpHrJYVMqMk3I9uruBEDtex3hHP1OkOss1NWZPDUO0VHARO1fBOPZqAEIcBeOldK1u5 0AT0B/YueAd2STj6zBkXSF+wBL+qj6zEiq792usEuEVWtRGsVdB58SMHfiLqVxLjM2YqYRJd 6nyedsgSGvQngTZtTTPAh/YwCSz+e78q4PeHQ9dmca1DU=

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, Oct 24, 2023 at 11:33:21AM +0200, Jan Beulich wrote: > On 23.10.2023 14:46, Roger Pau Monne wrote: > > Sporadically we have seen the following during AP bringup on AMD platforms > > only: > > > > microcode: CPU59 updated from revision 0x830107a to 0x830107a, date = > > 2023-05-17 > > microcode: CPU60 updated from revision 0x830104d to 0x830107a, date = > > 2023-05-17 > > CPU60: No irq handler for vector 27 (IRQ -2147483648) > > microcode: CPU61 updated from revision 0x830107a to 0x830107a, date = > > 2023-05-17 > > > > This is similar to the issue raised on Linux commit 36e9e1eab777e, where > > they > > observed i8259 (active) vectors getting delivered to CPUs different than 0. > > > > On AMD or Hygon platforms adjust the target CPU mask of i8259 interrupt > > descriptors to contain all possible CPUs, so that APs will reserve the > > vector > > at startup if any legacy IRQ is still delivered through the i8259. Note > > that > > if the IO-APIC takes over those interrupt descriptors the CPU mask will be > > reset. > > > > Spurious i8259 interrupt vectors however (IRQ7 and IRQ15) can be injected > > even > > when all i8259 pins are masked, and hence would need to be handled on all > > CPUs. > > > > Do not reserve the PIC spurious vectors on all CPUs, but do check for such > > spurious interrupts on all CPUs if the vendor is AMD or Hygon. > > The first half of this sentence reads as if it was describing a change, > but aiui there's no change to existing behavior in this regard anymore. > Maybe use something like "continue to reserve PIC vectors on CPU0 only"? > > > Note that once > > the vectors get used by devices detecting PIC spurious interrupts will no > > longer be possible, however the device should be able to cope with spurious > > interrupt. Such PIC spurious interrupts occurring when the vector is in > > use by > > a local APIC routed source will lead to an extra EOI, which might > > unintentionally clear a different vector from ISR. Note this is already the > > current behavior, so assume it's infrequent enough to not cause real issues. > > > > Finally, adjust the printed message to display the CPU where the spurious > > interrupt has been received, so it looks like: > > > > microcode: CPU1 updated from revision 0x830107a to 0x830107a, date = > > 2023-05-17 > > cpu1: spurious 8259A interrupt: IRQ7 > > microcode: CPU2 updated from revision 0x830104d to 0x830107a, date = > > 2023-05-17 > > > > Fixes: 3fba06ba9f8b ('x86/IRQ: re-use legacy vector ranges on APs') > > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx> > > --- > > Changes since v1: > > - Do not reserved spurious PIC vectors on APs, but still check for spurious > > PIC interrupts. > > - Reword commit message. > > --- > > Not sure if the Fixes tag is the most appropriate here, since AFAICT this > > is a > > hardware glitch, but it makes it easier to see to which versions the fix > > should > > be backported, because Xen previous behavior was to reserve all legacy > > vectors > > on all CPUs. > > I'm inclined to suggest to (informally) invent an Amends: tag. > > > --- a/xen/arch/x86/i8259.c > > +++ b/xen/arch/x86/i8259.c > > @@ -37,6 +37,15 @@ static bool _mask_and_ack_8259A_irq(unsigned int irq); > > > > bool bogus_8259A_irq(unsigned int irq) > > { > > + if ( smp_processor_id() && > > + !(boot_cpu_data.x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) > > ) > > + /* > > + * For AMD/Hygon do spurious PIC interrupt detection on all CPUs, > > as it > > + * has been observed that during unknown circumstances spurious PIC > > + * interrupts have been delivered to CPUs different than the BSP. > > + */ > > + return false; > > + > > return !_mask_and_ack_8259A_irq(irq); > > } > > I don't think this function should be changed; imo the adjustment belongs > at the call site. It makes the call site much more complex to follow, in fact I was considering pulling the PIC vector range checks into bogus_8259A_irq(). Would that convince you into placing the check here rather than in the caller context? > > @@ -349,7 +359,22 @@ void __init init_IRQ(void) > > continue; > > desc->handler = &i8259A_irq_type; > > per_cpu(vector_irq, cpu)[LEGACY_VECTOR(irq)] = irq; > > - cpumask_copy(desc->arch.cpu_mask, cpumask_of(cpu)); > > + > > + /* > > + * The interrupt affinity logic never targets interrupts to offline > > + * CPUs, hence it's safe to use cpumask_all here. > > + * > > + * Legacy PIC interrupts are only targeted to CPU0, but depending > > on > > + * the platform they can be distributed to any online CPU in > > hardware. > > + * Note this behavior has only been observed on AMD hardware. In > > order > > + * to cope install all active legacy vectors on all CPUs. > > + * > > + * IO-APIC will change the destination mask if/when taking > > ownership of > > + * the interrupt. > > + */ > > + cpumask_copy(desc->arch.cpu_mask, boot_cpu_data.x86_vendor & > > + (X86_VENDOR_AMD | > > X86_VENDOR_HYGON) ? > > + &cpumask_all : cpumask_of(cpu)); > > Nit: Imo this would better be wrapped as > > cpumask_copy(desc->arch.cpu_mask, > boot_cpu_data.x86_vendor & > (X86_VENDOR_AMD | X86_VENDOR_HYGON) ? > &cpumask_all : cpumask_of(cpu)); > > or (considering how we often prefer to wrap conditional operators) > > cpumask_copy(desc->arch.cpu_mask, > boot_cpu_data.x86_vendor & > (X86_VENDOR_AMD | X86_VENDOR_HYGON) > ? &cpumask_all : cpumask_of(cpu)); > > or > > cpumask_copy(desc->arch.cpu_mask, > boot_cpu_data.x86_vendor & > (X86_VENDOR_AMD | X86_VENDOR_HYGON) ? &cpumask_all > : cpumask_of(cpu)); > > , possibly with the argument spanning three lines additionally > parenthesized as a whole. > > (Of course an amd_like() construct like we have in the emulator would > further help readability here, but the way that works it cannot be > easily generalized for use outside of the emulator.) I think the last one could be my preferred indentation, will adjust to that. Thanks, Roger.

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.