Re: [Xen-devel] [PATCH] x86: adjust handling of interrupts coming in via legacy vectors
>>> On 14.05.12 at 16:38, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
> On 14/05/12 15:28, Jan Beulich wrote:
>>>>> On 14.05.12 at 15:33, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>>> On 14/05/12 13:55, Jan Beulich wrote:
>>>>>>> On 14.05.12 at 14:39, "Jan Beulich" <JBeulich@xxxxxxxx> wrote:
>>>>> The debugging code added in c/s 24707:96987c324a4f was hit a (small)
>>>>> number of times (one report being
>>>>> http://lists.xen.org/archives/html/xen-devel/2012-05/msg00332.html),
>>>>> apparently always with a vector within the legacy range. Obviously,
>>>>> besides legacy vectors not normally expected to be in use on systems
>>>>> with IO-APIC(s), they should never make it to the IRQ migration logic.
>>>>>
>>>>> This wasn't being prevented so far: Since we don't have a one-to-one
>>>>> mapping between vectors and IRQs - legacy IRQs may have two vectors
>>>>> associated with them (one used in either 8259A, the other used in one
>>>>> of the IO-APICs) -, vector-to-IRQ translations for legacy vectors (as
>>>>> used in do_IRQ()) would yield a valid IRQ number despite the IRQ
>>>>> really being handled via an IO-APIC.
>>>>>
>>>>> This gets changed here - disable_8259A_irq() zaps the legacy
>>>>> vector-to-IRQ mapping, and enable_8259A_irq(), should it ever be
>>>>> called for a particular interrupt, restores it.
>>>>>
>>>>> Additionally, the spurious interrupt logic in do_IRQ() gets adjusted
>>>>> too: Interrupts coming in via legacy vectors obviously didn't get
>>>>> reported through the IO-APIC/LAPIC pair (as we never program these
>>>>> vectors into any RTE), and hence shouldn't get ack_APIC_irq() called on
>>>>> them. Instead, a new function (pointer) bogus_8259A_irq() gets used to
>>>>> have the 8259A driver take care of the bogus interrupt (as outside of
>>>>> automatic EOI mode it may need an EOI to be issued for it to prevent
>>>>> other interrupts that may legitimately go through the 8259As from
>>>>> getting masked out).
>>>> Note that this patch does not make any attempt at dealing with the
>>>> underlying issue that causes the bogus interrupt(s) to show up. If
>>>> my analysis is right, we shouldn't see crashes anymore, but instead
>>>> observe instances of spurious interrupts on legacy vectors. It would
>>>> certainly be nice to have an actual proof of this (albeit I realize that
>>>> this isn't readily reproducible), in order to then - if indeed behaving
>>>> as expected - add debugging code to identify whether such interrupts
>>>> in fact get raised by one of the 8259A-s (particularly printing the
>>>> cached and physical mask register values), or whether they get
>>>> introduced into the system by yet another obscure mechanism.
>>>>
>>>> One particular thing I'm suspicious about are the numerous aliases
>>>> to the two (each) 8259A I/O ports that various chipsets have: What
>>>> if some component in Dom0 accesses one of the alias ports in order
>>>> to do something specific to a non-standard platform (say, probe for
>>>> some special hardware interface), not realizing that it actually plays
>>>> with PIC state? Linux under the same conditions wouldn't severely
>>>> suffer - as it has a 1:1 vector <-> IRQ translation, it likely would
>>>> merely observe an extra interrupt.
>>> On the whole, the patch looks sensible, but what happens if the spurious
>>> interrupt is coming in through the Local APIC? If this is the case,
>>> then we still need to ACK it, even if it is a bogus PIC interrupt.
>>>
>>> Perhaps in irq.c, the changes should check whether the observed vector
>>> has been raised in the LAPIC and ack it, and then decide whether it is
>>> bogus or not.
>> Should that really turn out to be the case, we're in much bigger trouble,
>> as then we need an explanation how an interrupt at that vector could
>> have got raised in the first place. I'd therefore like to keep the current
>> change deal only with things that we know can happen.
>
> We would be in huge trouble. As it currently stands, I am not certain
> that we can be sure that this is not happening.
>
> As a concession, perhaps a test of the LAPIC IIR, and an obvious error
> to the console? It would be more useful than having Xen crash/hang
> due to no longer always ack'ing the LAPIC.

Okay, let's do both then (check LAPIC and 8259A). I'll send an updated
patch soon.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
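
For readers following the thread, the control flow being agreed on ("check LAPIC and 8259A") can be modelled roughly as below. This is only a sketch under stated assumptions, not the patch itself: the updated patch is not part of this message. The tables, counters, enum, the `_sketch` function names, `lapic_raise()` and the 0xe0 legacy vector base are all made up for illustration, and the "LAPIC IIR" mentioned above is assumed here to mean the in-service register. Real code would read the LAPIC and issue port I/O to the 8259As instead of touching simulated state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Everything here is a hypothetical model, not Xen's actual code. */
#define NR_VECTORS          256u
#define NR_LEGACY_IRQS      16u
#define FIRST_LEGACY_VECTOR 0xe0u  /* assumed base of the legacy range */
#define NO_IRQ              (-1)

enum irq_outcome {
    IRQ_HANDLED,        /* vector translates to an IRQ: normal path */
    IRQ_SPURIOUS_LAPIC, /* unmapped, but in service in the LAPIC: ack there */
    IRQ_BOGUS_8259A,    /* unmapped legacy vector: 8259A driver handles it */
    IRQ_UNEXPLAINED     /* unmapped, not legacy, not in service */
};

static int vector_to_irq[NR_VECTORS];
static uint32_t lapic_isr[NR_VECTORS / 32]; /* simulated in-service bits */
static unsigned int lapic_acks, pic_eois;   /* simulated side effects */

static bool is_legacy_vector(unsigned int vector)
{
    return vector >= FIRST_LEGACY_VECTOR &&
           vector < FIRST_LEGACY_VECTOR + NR_LEGACY_IRQS;
}

/* Start with every legacy vector mapped 1:1 to IRQ 0..15, nothing else. */
static void init_vectors(void)
{
    unsigned int i;

    for (i = 0; i < NR_VECTORS; i++)
        vector_to_irq[i] = NO_IRQ;
    for (i = 0; i < NR_LEGACY_IRQS; i++)
        vector_to_irq[FIRST_LEGACY_VECTOR + i] = (int)i;
    for (i = 0; i < NR_VECTORS / 32; i++)
        lapic_isr[i] = 0;
    lapic_acks = pic_eois = 0;
}

/* Masking an IRQ at the 8259A also zaps its legacy vector-to-IRQ mapping. */
static void disable_8259A_irq_sketch(unsigned int irq)
{
    assert(irq < NR_LEGACY_IRQS);
    vector_to_irq[FIRST_LEGACY_VECTOR + irq] = NO_IRQ;
}

/* Unmasking it restores the mapping. */
static void enable_8259A_irq_sketch(unsigned int irq)
{
    assert(irq < NR_LEGACY_IRQS);
    vector_to_irq[FIRST_LEGACY_VECTOR + irq] = (int)irq;
}

/* Test helper: pretend the LAPIC delivered this vector. */
static void lapic_raise(unsigned int vector)
{
    assert(vector < NR_VECTORS);
    lapic_isr[vector / 32] |= UINT32_C(1) << (vector % 32);
}

static bool lapic_in_service(unsigned int vector)
{
    return (lapic_isr[vector / 32] >> (vector % 32)) & 1u;
}

/* Dispatch: only ack the LAPIC when it really has the vector in service. */
static enum irq_outcome do_irq_sketch(unsigned int vector)
{
    assert(vector < NR_VECTORS);

    if (vector_to_irq[vector] != NO_IRQ)
        return IRQ_HANDLED;

    if (lapic_in_service(vector)) {
        /* Unexpected for a legacy vector; real code would log loudly. */
        lapic_isr[vector / 32] &= ~(UINT32_C(1) << (vector % 32));
        lapic_acks++;
        return IRQ_SPURIOUS_LAPIC;
    }

    if (is_legacy_vector(vector)) {
        pic_eois++; /* stands in for the bogus_8259A_irq() hook */
        return IRQ_BOGUS_8259A;
    }

    return IRQ_UNEXPLAINED;
}
```

In this model, once `disable_8259A_irq_sketch(7)` has run, a stray interrupt on the vector for IRQ 7 no longer translates to a valid IRQ. It is routed to the 8259A path unless the simulated LAPIC reports the vector in service, in which case the LAPIC is acked instead.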