
Re: [Xen-devel] [PATCH] x86: adjust handling of interrupts coming in via legacy vectors

>>> On 14.05.12 at 16:38, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
> On 14/05/12 15:28, Jan Beulich wrote:
>>>>> On 14.05.12 at 15:33, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>>> On 14/05/12 13:55, Jan Beulich wrote:
>>>>>>> On 14.05.12 at 14:39, "Jan Beulich" <JBeulich@xxxxxxxx> wrote:
>>>>> The debugging code added in c/s 24707:96987c324a4f was hit a (small)
>>>>> number of times (one report being
>>>>> http://lists.xen.org/archives/html/xen-devel/2012-05/msg00332.html),
>>>>> apparently always with a vector within the legacy range. Obviously,
>>>>> besides legacy vectors not normally expected to be in use on systems
>>>>> with IO-APIC(s), they should never make it to the IRQ migration logic.
>>>>> This wasn't being prevented so far: Since we don't have a one-to-one
>>>>> mapping between vectors and IRQs - legacy IRQs may have two vectors
>>>>> associated with them (one used in either 8259A, the other used in one
>>>>> of the IO-APICs) -, vector-to-IRQ translations for legacy vectors (as
>>>>> used in do_IRQ()) would yield a valid IRQ number despite the IRQ
>>>>> really being handled via an IO-APIC.
>>>>> This gets changed here - disable_8259A_irq() zaps the legacy vector-to-
>>>>> IRQ mapping, and enable_8259A_irq(), should it ever be called for a
>>>>> particular interrupt, restores it.
>>>>> Additionally, the spurious interrupt logic in do_IRQ() gets adjusted
>>>>> too: Interrupts coming in via legacy vectors obviously didn't get
>>>>> reported through the IO-APIC/LAPIC pair (as we never program these
>>>>> vectors into any RTE), and hence shouldn't get ack_APIC_irq() called on
>>>>> them. Instead, a new function (pointer) bogus_8259A_irq() gets used to
>>>>> have the 8259A driver take care of the bogus interrupt (as outside of
>>>>> automatic EOI mode it may need an EOI to be issued for it to prevent
>>>>> other interrupts that may legitimately go through the 8259As from
>>>>> getting masked out).
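For illustration, the mapping change described above can be modeled roughly as follows. This is a standalone sketch, not the patch itself: the flat vector_to_irq[] table, the sketch_* function names, lookup_irq(), and the constant values are all assumptions made for the example, and the actual masking of the PIC line is left out.

```c
#include <assert.h>

/* Illustrative values only; they are assumptions, not copied from the patch. */
#define NR_VECTORS          256
#define FIRST_LEGACY_VECTOR 0xe0
#define NR_LEGACY_IRQS      16
#define NO_IRQ              (-1)

/* Hypothetical flat vector -> IRQ table standing in for the real mapping. */
static int vector_to_irq[NR_VECTORS];

/* Start out with every legacy vector mapped 1:1 onto a legacy IRQ. */
static void init_legacy_mapping(void)
{
    for (int v = 0; v < NR_VECTORS; v++)
        vector_to_irq[v] = NO_IRQ;
    for (int irq = 0; irq < NR_LEGACY_IRQS; irq++)
        vector_to_irq[FIRST_LEGACY_VECTOR + irq] = irq;
}

/* Model of disable_8259A_irq(): besides masking the line (omitted here),
 * zap the legacy vector-to-IRQ translation so that lookups on this
 * vector no longer yield an IRQ that is really handled via an IO-APIC. */
static void sketch_disable_8259A_irq(int irq)
{
    assert(irq >= 0 && irq < NR_LEGACY_IRQS);
    vector_to_irq[FIRST_LEGACY_VECTOR + irq] = NO_IRQ;
}

/* Model of enable_8259A_irq(): restore the translation again. */
static void sketch_enable_8259A_irq(int irq)
{
    assert(irq >= 0 && irq < NR_LEGACY_IRQS);
    vector_to_irq[FIRST_LEGACY_VECTOR + irq] = irq;
}

/* The kind of lookup do_IRQ() would perform on an incoming vector. */
static int lookup_irq(unsigned int vector)
{
    assert(vector < NR_VECTORS);
    return vector_to_irq[vector];
}
```

With the translation zapped, an interrupt arriving on a legacy vector falls through to the spurious interrupt path instead of reaching the IRQ migration logic.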
>>>> Note that this patch does not make any attempt at dealing with the
>>>> underlying issue that causes the bogus interrupt(s) to show up. If
>>>> my analysis is right, we shouldn't see crashes anymore, but instead
>>>> observe instances of spurious interrupts on legacy vectors. It would
>>>> certainly be nice to have an actual proof of this (albeit I realize that
>>>> this isn't readily reproducible), in order to then - if indeed behaving
>>>> as expected - add debugging code to identify whether such interrupts
>>>> in fact get raised by one of the 8259A-s (particularly printing the
>>>> cached and physical mask register values), or whether they get
>>>> introduced into the system by yet another obscure mechanism.
>>>> One particular thing I'm suspicious about are the numerous aliases
>>>> to the two (each) 8259A I/O ports that various chipsets have: What
>>>> if some component in Dom0 accesses one of the alias ports in order
>>>> to do something specific to a non-standard platform (say, probe for
>>>> some special hardware interface), not realizing that it actually plays
>>>> with PIC state? Linux under the same conditions wouldn't severely
>>>> suffer - as it has a 1:1 vector <-> IRQ translation, it likely would
>>>> merely observe an extra interrupt.
>>> On the whole, the patch looks sensible, but what happens if the spurious
>>> interrupt is coming in through the Local APIC? If this is the case,
>>> then we still need to ACK it, even if it is a bogus PIC interrupt.
>>> Perhaps in irq.c, the changes should check whether the observed vector
>>> has been raised in the LAPIC and ack it, and then decide whether it is
>>> bogus or not.
>> Should that really turn out to be the case, we're in much bigger trouble,
>> as then we need an explanation how an interrupt at that vector could
>> have got raised in the first place. I'd therefore like to keep the current
>> change dealing only with things that we know can happen.
> We would be in huge trouble.  As it currently stands, I am not certain
> that we can be sure that this is not happening.
> As a concession, perhaps a test of the LAPIC ISR, and an obvious error
> to the console?  It would be more useful than having Xen crash/hang
> due to no longer always ack'ing the LAPIC.

Okay, let's do both then (check LAPIC and 8259A). I'll send an updated
patch soon.
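
To make "do both" concrete, here is a rough standalone model of the intended ordering for a vector with no IRQ mapping: check the LAPIC first and ack only if the vector is really in service there, then let the 8259A side deal with a bogus legacy interrupt. Everything here is an assumption for illustration: the hardware state is simulated with plain variables, the sketch_* helpers and handle_unmapped_vector() are hypothetical names, and having the bogus-IRQ helper return whether the PIC actually had the line in service is my guess at a convenient interface, not a description of the updated patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Illustrative values only; assumptions for this sketch. */
#define NR_VECTORS          256
#define FIRST_LEGACY_VECTOR 0xe0
#define LAST_LEGACY_VECTOR  0xef

/* Simulated hardware state (hypothetical, for the example only). */
static bool lapic_in_service[NR_VECTORS]; /* models the LAPIC in-service bits */
static unsigned int pic_in_service;       /* one bit per legacy IRQ */
static unsigned int lapic_acks, pic_eois; /* counters to observe behaviour */

static bool sketch_apic_isr_read(unsigned int vector)
{
    return lapic_in_service[vector];
}

static void sketch_ack_APIC_irq(unsigned int vector)
{
    lapic_in_service[vector] = false;
    lapic_acks++;
}

/* Returns true if the PIC had this IRQ in service and an EOI was issued. */
static bool sketch_bogus_8259A_irq(unsigned int irq)
{
    if (!(pic_in_service & (1u << irq)))
        return false;
    pic_in_service &= ~(1u << irq);
    pic_eois++;
    return true;
}

/* Called when the vector-to-IRQ lookup found no IRQ for this vector. */
static void handle_unmapped_vector(unsigned int vector)
{
    assert(vector < NR_VECTORS);

    bool legacy = vector >= FIRST_LEGACY_VECTOR &&
                  vector <= LAST_LEGACY_VECTOR;

    /* Ack the LAPIC only if it really has this vector in service. */
    bool from_lapic = sketch_apic_isr_read(vector);
    if (from_lapic)
        sketch_ack_APIC_irq(vector);

    /* Let the 8259A side EOI a bogus interrupt on a legacy vector. */
    bool from_pic = legacy && sketch_bogus_8259A_irq(vector - FIRST_LEGACY_VECTOR);

    if (from_lapic && legacy)
        fprintf(stderr, "error: legacy vector %#x in service in LAPIC\n", vector);
    else if (!from_pic)
        fprintf(stderr, "no handler for vector %#x\n", vector);
}
```

The point of the ordering is that a legacy vector showing up in the LAPIC still gets acked (and loudly reported) rather than leaving the LAPIC stuck, while the ordinary bogus-PIC case never touches the LAPIC at all.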


Xen-devel mailing list


