|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2] x86/i8259: do not assume interrupts always target CPU0
On Tue, Oct 24, 2023 at 11:33:21AM +0200, Jan Beulich wrote:
> On 23.10.2023 14:46, Roger Pau Monne wrote:
> > Sporadically we have seen the following during AP bringup on AMD platforms
> > only:
> >
> > microcode: CPU59 updated from revision 0x830107a to 0x830107a, date =
> > 2023-05-17
> > microcode: CPU60 updated from revision 0x830104d to 0x830107a, date =
> > 2023-05-17
> > CPU60: No irq handler for vector 27 (IRQ -2147483648)
> > microcode: CPU61 updated from revision 0x830107a to 0x830107a, date =
> > 2023-05-17
> >
> > This is similar to the issue raised on Linux commit 36e9e1eab777e, where
> > they
> > observed i8259 (active) vectors getting delivered to CPUs different than 0.
> >
> > On AMD or Hygon platforms adjust the target CPU mask of i8259 interrupt
> > descriptors to contain all possible CPUs, so that APs will reserve the
> > vector
> > at startup if any legacy IRQ is still delivered through the i8259. Note
> > that
> > if the IO-APIC takes over those interrupt descriptors the CPU mask will be
> > reset.
> >
> > Spurious i8259 interrupt vectors however (IRQ7 and IRQ15) can be injected
> > even
> > when all i8259 pins are masked, and hence would need to be handled on all
> > CPUs.
> >
> > Do not reserve the PIC spurious vectors on all CPUs, but do check for such
> > spurious interrupts on all CPUs if the vendor is AMD or Hygon.
>
> The first half of this sentence reads as if it was describing a change,
> but aiui there's no change to existing behavior in this regard anymore.
> Maybe use something like "continue to reserve PIC vectors on CPU0 only"?
>
> > Note that once
> > the vectors get used by devices detecting PIC spurious interrupts will no
> > longer be possible, however the device should be able to cope with spurious
> > interrupt. Such PIC spurious interrupts occurring when the vector is in
> > use by
> > a local APIC routed source will lead to an extra EOI, which might
> > unintentionally clear a different vector from ISR. Note this is already the
> > current behavior, so assume it's infrequent enough to not cause real issues.
> >
> > Finally, adjust the printed message to display the CPU where the spurious
> > interrupt has been received, so it looks like:
> >
> > microcode: CPU1 updated from revision 0x830107a to 0x830107a, date =
> > 2023-05-17
> > cpu1: spurious 8259A interrupt: IRQ7
> > microcode: CPU2 updated from revision 0x830104d to 0x830107a, date =
> > 2023-05-17
> >
> > Fixes: 3fba06ba9f8b ('x86/IRQ: re-use legacy vector ranges on APs')
> > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > ---
> > Changes since v1:
> > - Do not reserved spurious PIC vectors on APs, but still check for spurious
> > PIC interrupts.
> > - Reword commit message.
> > ---
> > Not sure if the Fixes tag is the most appropriate here, since AFAICT this
> > is a
> > hardware glitch, but it makes it easier to see to which versions the fix
> > should
> > be backported, because Xen previous behavior was to reserve all legacy
> > vectors
> > on all CPUs.
>
> I'm inclined to suggest to (informally) invent an Amends: tag.
>
> > --- a/xen/arch/x86/i8259.c
> > +++ b/xen/arch/x86/i8259.c
> > @@ -37,6 +37,15 @@ static bool _mask_and_ack_8259A_irq(unsigned int irq);
> >
> > bool bogus_8259A_irq(unsigned int irq)
> > {
> > + if ( smp_processor_id() &&
> > + !(boot_cpu_data.x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON))
> > )
> > + /*
> > + * For AMD/Hygon do spurious PIC interrupt detection on all CPUs,
> > as it
> > + * has been observed that during unknown circumstances spurious PIC
> > + * interrupts have been delivered to CPUs different than the BSP.
> > + */
> > + return false;
> > +
> > return !_mask_and_ack_8259A_irq(irq);
> > }
>
> I don't think this function should be changed; imo the adjustment belongs
> at the call site.
It makes the call site much more complex to follow, in fact I was
considering pulling the PIC vector range checks into
bogus_8259A_irq(). Would that convince you into placing the check here
rather than in the caller context?
> > @@ -349,7 +359,22 @@ void __init init_IRQ(void)
> > continue;
> > desc->handler = &i8259A_irq_type;
> > per_cpu(vector_irq, cpu)[LEGACY_VECTOR(irq)] = irq;
> > - cpumask_copy(desc->arch.cpu_mask, cpumask_of(cpu));
> > +
> > + /*
> > + * The interrupt affinity logic never targets interrupts to offline
> > + * CPUs, hence it's safe to use cpumask_all here.
> > + *
> > + * Legacy PIC interrupts are only targeted to CPU0, but depending
> > on
> > + * the platform they can be distributed to any online CPU in
> > hardware.
> > + * Note this behavior has only been observed on AMD hardware. In
> > order
> > + * to cope install all active legacy vectors on all CPUs.
> > + *
> > + * IO-APIC will change the destination mask if/when taking
> > ownership of
> > + * the interrupt.
> > + */
> > + cpumask_copy(desc->arch.cpu_mask, boot_cpu_data.x86_vendor &
> > + (X86_VENDOR_AMD |
> > X86_VENDOR_HYGON) ?
> > + &cpumask_all : cpumask_of(cpu));
>
> Nit: Imo this would better be wrapped as
>
> cpumask_copy(desc->arch.cpu_mask,
> boot_cpu_data.x86_vendor &
> (X86_VENDOR_AMD | X86_VENDOR_HYGON) ?
> &cpumask_all : cpumask_of(cpu));
>
> or (considering how we often prefer to wrap conditional operators)
>
> cpumask_copy(desc->arch.cpu_mask,
> boot_cpu_data.x86_vendor &
> (X86_VENDOR_AMD | X86_VENDOR_HYGON)
> ? &cpumask_all : cpumask_of(cpu));
>
> or
>
> cpumask_copy(desc->arch.cpu_mask,
> boot_cpu_data.x86_vendor &
> (X86_VENDOR_AMD | X86_VENDOR_HYGON) ? &cpumask_all
> : cpumask_of(cpu));
>
> , possibly with the argument spanning three lines additionally
> parenthesized as a whole.
>
> (Of course an amd_like() construct like we have in the emulator would
> further help readability here, but the way that works it cannot be
> easily generalized for use outside of the emulator.)
I think the last one could be my preferred indentation, will adjust to
that.
Thanks, Roger.
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |