[Xen-devel] Re: [PATCH][FIX] Possible fix for spurious interrupts
>>> On Thu, Apr 20, 2006 at 3:35 AM, in message
<41b22a1a50ce8da0b406f088e78d14f1@xxxxxxxxxxxx>, Keir Fraser
> I am certain that we have been hitting *at least* this issue. Maybe
> there are others, but I'll be surprised if you continue to hit
> interrupt problems with the workaround in place.
> Here is my understanding of this boot interrupt mode and why it
> 1. PCI-X bridge interrupt inputs are not configurable directly as
> inputs to the 8259 PICs (I/O hubs usually allow that for normal PCI
> interrupt inputs)
> 2. So any OS not aware of APICs cannot get at those interrupt lines
> unless a hardware hack is introduced
True. But it's worse than that. Even before PCI-X, some systems routed
some interrupts only to specific GSIs > 15. These devices were never
available to a legacy OS at all, only to an OS running in full symmetric
IO mode. These systems are rare, however, and devices that fall into
this category should never be set up by the manufacturer as boot
devices, so it's not an issue unless you're trying to run a legacy OS.
> 3. Intel introduce a 'boot interrupt' line by wire-ORing together
> the interrupt inputs to an IO-APIC that services PCI-X interrupts,
> forwarding the resulting 'boot interrupt' line to some INTx line or
> on the 8259
TRUE, but my understanding was that it's not necessarily just one line.
>(note: I'm unsure where this boot interrupt actually gets
> wired to).
It's manufacturer dependent and you don't have to care. See PCI config
space discussion below.
> 4. I guess that if you have more than one PCI-X IO-APIC, they may
> forward their boot interrupts to different INTx lines and so appear
> different interrupts to the OS.
TRUE. There is a lot of IRQ sharing until you switch to symmetric IO
mode. On some systems the event that causes the firmware to reroute
interrupts is the OS unmasking or masking an IO APIC RTE. (Sorry I
couldn't remember that earlier; it would have saved you some grief.)
> What I'm not sure about:
> 1. How are legacy OSes supposed to detect these devices and know
> the interrupt lines are routed to a particular 8259 pin?
For PCI, the BIOS or utility modifying the interrupt route is required
to write the legacy IRQ information into the "Interrupt Line" register
of the device's Configuration Space Header. Values of 0x00 through 0x0F
are valid. 0xFF is also valid and means that there is no interrupt
route or that the device does not use interrupts.
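A minimal sketch of how a legacy OS might interpret that register,
assuming the byte has already been read from offset 0x3C of the
configuration header (the config-space access itself is omitted). The
enum and function names are hypothetical and only illustrate the value
ranges described above.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum legacy_irq_route {
    LEGACY_IRQ_ROUTED,     /* 0x00-0x0F: routed to that 8259 IRQ */
    LEGACY_IRQ_NONE,       /* 0xFF: no route, or no interrupt in use */
    LEGACY_IRQ_UNEXPECTED  /* anything else: not a legacy IRQ number */
};

/*
 * Classify the PCI "Interrupt Line" register value.
 * On LEGACY_IRQ_ROUTED, *irq_out receives the 8259 IRQ number;
 * otherwise *irq_out is left untouched.
 */
enum legacy_irq_route classify_interrupt_line(uint8_t interrupt_line,
                                              uint8_t *irq_out)
{
    if (interrupt_line <= 0x0F) {
        *irq_out = interrupt_line;
        return LEGACY_IRQ_ROUTED;
    }
    if (interrupt_line == 0xFF)
        return LEGACY_IRQ_NONE;
    return LEGACY_IRQ_UNEXPECTED;
}
```

Note that the OS never needs to know which physical pin the firmware
chose; it only consumes the IRQ number the firmware wrote.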
> 2. Why does this alias with an IOAPIC0 input?
In my opinion, once you've switched to full symmetric mode it shouldn't.
But according to ACPI, systems that support both APIC and dual 8259
interrupt models must map global system interrupts 0-15 to the 8259 IRQs
0-15, except where Interrupt Source Overrides are provided, as is almost
always the case with the 8254 timer (IRQ0), which almost always goes to
GSI2 (IOAPIC0 INTIN2).
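The mapping rule can be sketched roughly as follows. This assumes the
Interrupt Source Overrides have already been parsed out of the ACPI
tables into a simple array; the struct is a hypothetical, simplified
view (polarity and trigger flags are left out), not the real table
layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one Interrupt Source Override. */
struct irq_override {
    uint8_t  isa_irq;  /* 8259 IRQ being overridden (0-15) */
    uint32_t gsi;      /* GSI the source is actually wired to */
};

/*
 * ISA IRQs 0-15 map 1:1 onto GSIs 0-15 unless an override exists
 * for that IRQ, in which case the override wins.
 */
uint32_t isa_irq_to_gsi(uint8_t isa_irq,
                        const struct irq_override *ovr, size_t n_ovr)
{
    for (size_t i = 0; i < n_ovr; i++) {
        if (ovr[i].isa_irq == isa_irq)
            return ovr[i].gsi;
    }
    return isa_irq;
}
```

With the usual timer override { 0, 2 } in the array, IRQ0 resolves to
GSI2 while every other ISA IRQ keeps its identity mapping.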
This aliasing issue can get rather messy. Since you don't know if dual
routes exist once interrupts are fanned out to other IOAPIC lines (other
than GSI 0-15), you have to assume the duplicate routes can and do exist.
My experience tells me that they almost always exist.
If duplicate routes don't exist, then the BIOS can leave some interrupts
on the legacy IRQ and still pull others off to a separate IO APIC line.
Otherwise they all have to stay on the legacy GSI or they all have to
move somewhere else. This is why we have to trust MPS or ACPI to be
correct. It's usually when a PCI interrupt entry in ACPI or MPS is not
properly described that you run into this problem.
> Perhaps the boot
> interrupt fires on an INTx line that is then routed to both 8259 and
TRUE in most cases.
> so you get it on whichever happens to be enabled,
> and it
> ends up looking like an interrupt from whichever device is bound to
> that INTx line.
Yes, and the machine may lock up at that point due to a pending
interrupt that never gets serviced and keeps firing over and over.
> Or perhaps the 8259 output is wired into IOAPIC0. I'm
> not sure.
The output of the 8259 is typically routed to INTIN0 of IOAPIC0 or
directly to the processor local APIC LINTIN0. MPS was kind enough to
allow the BIOS to describe the available routes. ACPI was less helpful
in describing virtual wire mode routes.
So with ACPI, if you need an 8259 virtual wire mode route, you choose
one and then look for the 8254 timer interrupt coming through. Or you
can refer to MPS tables. The 8254 timer is typically routed to INTIN2 of
IOAPIC0, but be aware that some early systems failed to route the 8254
timer directly to IOAPIC0 INTIN2, so mixed mode operation (IOAPIC/PIC)
would be required. There is a whole set of issues related to mixed
mode. Don't go there unless you have to.
> The workaround shouldn't hurt performance, assuming interrupts are
> usually delivered to the CPU on which they are serviced.
It will hurt performance. How much? I don't know. Leaving an
interrupt "in service" until the right domain is scheduled to service
it will cause all interrupts of an equal or lower priority class on
that processor to wait until EOI. Further interrupts of the same class
will also be held off. So never put IO interrupts in the same APIC
interrupt class as the timer and IPIs.
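To make the class rule concrete, here is a minimal sketch. It relies on
the local APIC using the upper four bits of the vector as the priority
class, and it deliberately ignores the task priority register; the
function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* The priority class of a vector is its upper nibble (vector[7:4]). */
unsigned int apic_priority_class(uint8_t vector)
{
    return vector >> 4;
}

/*
 * True if 'pending' has to wait while 'in_service' has not yet been
 * EOI'd: only a strictly higher class can interrupt it.
 * Simplification: the task priority register is not considered.
 */
bool held_off_by_in_service(uint8_t pending, uint8_t in_service)
{
    return apic_priority_class(pending) <= apic_priority_class(in_service);
}
```

For example, with a device vector of 0x31 left in service, a second
device at 0x39 (same class) waits, while a timer at a vector such as
0xEF is still delivered. Those vector numbers are made up for the
example.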
I would recommend that, for performance reasons, we might want to apply
this fix only for systems which have GSIs > 15, meaning that for
systems with only one IO APIC with only 16 lines we may be okay without
it.
There may be other reasons for not masking IO APICs besides this
interrupt routing issue; I just can't remember what they are. I couldn't
remember this whole duplicate route issue until you figured it out based
on my suggestion to Jan Beulich that I was pretty sure it was masking
related. The other issue I do remember is that sometimes we saw
level-triggered interrupt assertions get lost as a result of masking.
Not sure why. Hence I really don't like this idea of touching the IO
APIC at all once it's set up.
If spurious interrupt problems persist, you might try just flipping
bit 16 instead of both 15 and 16 when you mask/unmask GSIs 0-15. But I
am not sure this is a correct thing to do for newer chipsets. Flipping
both 15 and 16 may be correct.
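For reference, the two variants can be sketched on the low 32 bits of a
redirection table entry, where bit 15 is the trigger mode (1 = level)
and bit 16 is the mask. Reading "both 15 and 16" as "mask and switch to
edge, then unmask and switch back to level" is an assumption here, and
the helper names are hypothetical. The actual register write through
the IO APIC index/data window is omitted.

```c
#include <stdint.h>

#define RTE_TRIGGER_LEVEL (1u << 15)  /* bit 15: 1 = level, 0 = edge */
#define RTE_MASKED        (1u << 16)  /* bit 16: 1 = masked */

/* Variant A: touch only the mask bit. */
uint32_t rte_mask_only(uint32_t rte_lo)
{
    return rte_lo | RTE_MASKED;
}

uint32_t rte_unmask_only(uint32_t rte_lo)
{
    return rte_lo & ~RTE_MASKED;
}

/* Variant B: mask and switch to edge (set bit 16, clear bit 15). */
uint32_t rte_mask_and_edge(uint32_t rte_lo)
{
    return (rte_lo | RTE_MASKED) & ~RTE_TRIGGER_LEVEL;
}

/* Variant B: unmask and switch back to level (clear 16, set 15). */
uint32_t rte_unmask_and_level(uint32_t rte_lo)
{
    return (rte_lo & ~RTE_MASKED) | RTE_TRIGGER_LEVEL;
}
```

Keeping the bit manipulation separate from the register access makes it
easy to switch between the two variants per GSI range when testing.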
Be aware that an interrupt could be in flight even though masked.
Never mask an EXTINT entry if you have any. EXTINT is treated like an
edge triggered interrupt. Mask the inputs to the 8259 instead.
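A guard for the EXTINT rule might look like the sketch below. It
assumes the delivery mode sits in bits 8-10 of the low RTE word, with
111b meaning ExtINT; the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define RTE_DELIVERY_MODE(rte_lo) (((rte_lo) >> 8) & 0x7u)
#define RTE_DM_EXTINT             0x7u        /* 111b = ExtINT */
#define RTE_MASKED                (1u << 16)  /* bit 16: 1 = masked */

bool rte_is_extint(uint32_t rte_lo)
{
    return RTE_DELIVERY_MODE(rte_lo) == RTE_DM_EXTINT;
}

/*
 * Returns the value to write back for a "mask" request.
 * ExtINT entries are returned unchanged; the caller is expected to
 * mask the relevant 8259 input instead.
 */
uint32_t rte_mask_unless_extint(uint32_t rte_lo)
{
    return rte_is_extint(rte_lo) ? rte_lo : (rte_lo | RTE_MASKED);
}
```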
> confident of its correctness, assuming driver domains service their
> interrupts in a reasonably timely fashion (a shame to have to make
> assumption, but once we have MSI support we can recommend people use
> that if they want better driver-domain isolation).
Agreed. But there are lots of reasons for spurious interrupts besides
duplicate routes. The main thing is that we don't get too many; Linux
shuts them off.
Hope at least some of this helps.
Clyde R. Griffin
Virtualization Platform Team
> -- Keir
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel