[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH] x86: adjust handling of interrupts coming in via legacy vectors
On 14/05/12 15:28, Jan Beulich wrote: >>>> On 14.05.12 at 15:33, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote: >> On 14/05/12 13:55, Jan Beulich wrote: >>>>>> On 14.05.12 at 14:39, "Jan Beulich" <JBeulich@xxxxxxxx> wrote: >>>> The debugging code added in c/s 24707:96987c324a4f was hit a (small) >>>> number of times (one report being >>>> http://lists.xen.org/archives/html/xen-devel/2012-05/msg00332.html), >>>> apparently always with a vector within the legacy range. Obviously, >>>> besides legacy vectors not normally expected to be in use on systems >>>> with IO-APIC(s), they should never make it to the IRQ migration logic. >>>> >>>> This wasn't being prevented so far: Since we don't have a one-to-one >>>> mapping between vectors and IRQs - legacy IRQs may have two vectors >>>> associated with them (one used in either 8259A, the other used in one >>>> of the IO-APICs) -, vector-to-IRQ translations for legacy vectors (as >>>> used in do_IRQ()) would yield a valid IRQ number despite the IRQ >>>> really being handled via an IO-APIC. >>>> >>>> This gets changed here - disable_8259A_irq() zaps the legacy vector-to- >>>> IRQ mapping, and enable_8259A_irq(), should it ever be called for a >>>> particular interrupts, restores it. >>>> >>>> Additionally, the spurious interrupt logic in do_IRQ() gets adjusted >>>> too: Interrupts coming in via legacy vectors obviously didn't get >>>> reported through the IO-APIC/LAPIC pair (as we never program these >>>> vectors into any RTE), and hence shouldn't get ack_APIC_irq() called on >>>> them. Instead, a new function (pointer) bogus_8259A_irq() gets used to >>>> have the 8259A driver take care of the bogus interrupt (as outside of >>>> automatice EOI mode it may need an EOI to be issued for it to prevent >>>> other interrupts that may legitimately go through the 8259As from >>>> getting masked out). >>> Note that this patch does not make any attempt at dealing with the >>> underlying issue that causes the bogus interrupt(s) to show up. If >>> my analysis is right, we shouldn't see crashes anymore, but instead >>> observe instances of spurious interrupts on legacy vectors. It would >>> certainly be nice to have an actual proof of this (albeit I realize that >>> this isn't readily reproducible), in order to then - if indeed behaving >>> as expected - add debugging code to identify whether such interrupts >>> in fact get raised by one of the 8259A-s (particularly printing the >>> cached and physical mask register values), or whether they get >>> introduced into the system by yet another obscure mechanism. >>> >>> One particular thing I'm suspicious about are the numerous aliases >>> to the two (each) 8259A I/O ports that various chipsets have: What >>> if some component in Dom0 accesses one of the alias ports in order >>> to do something specific to a non-standard platform (say, probe for >>> some special hardware interface), not realizing that it actually plays >>> with PIC state? Linux under the same conditions wouldn't severely >>> suffer - as it has a 1:1 vector <-> IRQ translation, it likely would >>> merely observe an extra interrupt. >> On the whole, the patch looks sensible, but what happens if the spurious >> interrupt is coming in through the Local APIC ? If this is the case, >> then we still need to ACK it, even if it is a bogus PIC interrupt. >> >> Perhaps in irq.c, the changes should check whether the observed vector >> has been raised in the LAPIC and ack it, and then decide whether it is >> bogus or not. > Should that really turn out to be the case, we're in much bigger trouble, > as then we need an explanation how an interrupt at that vector could > have got raised in the first place. I'd therefore like to keep the current > change deal only with things that we know can happen. We would be in huge trouble. As it currently stands, I am not certain that we can be sure that this is not happening. As a concession, perhaps a test of the LAPIC IIR, and an obvious error to the console? It would be be more useful than having Xen crash/hang due to no longer always ack'ing the LAPIC. > >> Might it also be sensible to remove dom0's permissions to use the PIC >> ports, in case it is some weird issue like that? > That's already being done iirc. The problem is that it's non-trivial (and > perhaps non-reliable) to determine the aliases, and hence we can't > blindly remove more than the two real ports from Dom0's permitted > set. > > Jan Ah yes - in which case its not feasible. -- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |