[Xen-devel] Re: [PATCH][FIX] Possible fix for spurious interrupts
On 15 Apr 2006, at 19:57, Arun Sharma wrote:
> I think you guys are running into the same problem that FreeBSD ran
> into on some Intel motherboards more than a year ago.
> This explanation seems to make the most sense.
> Because the problem happened on FreeBSD (which masks ioapic RTEs to
> implement interrupt threads) and not on Linux, it was hard to get
> attention from the hardware guys back then. I had suspected Xen
> would run into it sooner or later.
Thanks Arun, this is very informative although unfortunately not very
helpful. Matt Dillon's suggested alternatives to masking do not really
work as they all cause spurious interrupts. Do you know if they ever
found a good fix, or do they live with the problem?
I'd not heard of boot interrupt mode before, but it sounds like many
chipsets cannot disable it and, even when it can be disabled, the
method is chipset specific. The Intel legacy INTx model is so
unbelievably crap. At least source-vectored interrupts are becoming
Anyway, this is the current status of my workaround for Xen:
1. I added a new ioapic ack method which delays EOI until after ISR
processing in the driver domain. This mode is enabled by default but
can be disabled with 'ioapic_ack=old' as a Xen boot parameter.
2. The code to safely manage deferred EOI is quite complicated and has
  * Must EOI on the CPU that received the interrupt
  * Must EOI in 'reverse' order when interrupts have nested
  * Un-EOIed interrupts block other guest-bound interrupts which
    happen to have lower priority
  * Right now, disable_irq() in a driver domain may potentially lock
    up all interrupt sources as it may prevent EOI ever happening (until
    enable_irq() or the interrupt is unbound from the domain)
  * All Xen-bound interrupts have strictly higher priority than any
    guest-bound IO-APIC interrupt. This should avoid deadlock issues.
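
To make the first two constraints concrete, here is a minimal sketch of a per-CPU stack of pending EOIs. This is not the code from the patch: the names (eoi_push, eoi_mark_ready, eoi_flush, NR_CPUS, MAX_NESTED) are all hypothetical, and the actual LAPIC EOI write is replaced by recording the vector in an output array. It only illustrates why a stack is a natural fit: entries are pushed on the CPU that took the interrupt, and an EOI is only issued for the top entry once the guest has finished with it, which gives the 'reverse' order for nested interrupts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS    4   /* hypothetical: small fixed CPU count for the sketch */
#define MAX_NESTED 16  /* hypothetical: maximum nesting depth per CPU */

struct pending_eoi {
    int  vector;  /* interrupt vector whose EOI is being deferred */
    bool ready;   /* set once the guest has finished its ISR */
};

struct eoi_stack {
    struct pending_eoi ent[MAX_NESTED];
    size_t top;   /* number of entries currently on the stack */
};

/* One stack per CPU, so an EOI is only ever issued on the CPU that
 * received the interrupt. */
static struct eoi_stack eoi_stacks[NR_CPUS];

/* Record that 'vector' was delivered on 'cpu' and its EOI is deferred. */
static bool eoi_push(int cpu, int vector)
{
    struct eoi_stack *s = &eoi_stacks[cpu];

    if (s->top == MAX_NESTED)
        return false;
    s->ent[s->top].vector = vector;
    s->ent[s->top].ready = false;
    s->top++;
    return true;
}

/* The guest has finished handling 'vector' on 'cpu': mark it ready. */
static bool eoi_mark_ready(int cpu, int vector)
{
    struct eoi_stack *s = &eoi_stacks[cpu];

    for (size_t i = s->top; i-- > 0; ) {
        if (s->ent[i].vector == vector && !s->ent[i].ready) {
            s->ent[i].ready = true;
            return true;
        }
    }
    return false;
}

/* Pop every ready entry from the top of the stack, stopping at the first
 * entry that is not ready. The popped vectors are written to 'out' in the
 * order they would be EOIed; the return value is how many were popped. */
static size_t eoi_flush(int cpu, int *out, size_t out_len)
{
    struct eoi_stack *s = &eoi_stacks[cpu];
    size_t n = 0;

    while (s->top > 0 && s->ent[s->top - 1].ready && n < out_len)
        out[n++] = s->ent[--s->top].vector;
    return n;
}
```

Note how this also shows the third constraint: if an outer interrupt is marked ready while an inner one is still outstanding, eoi_flush() does nothing until the inner one completes, so the outer EOI stays blocked behind it.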
Really it's a messy solution. I think having both old and new ack
methods makes sense, but I'm not sure how we will end up picking which
to use automatically. Maybe defaulting to the old method is best, letting
users pick the new one if they see spurious interrupt problems. Or maybe the
problems with the new method are mostly theoretical and we should use
that by default. Or maybe we should have a DMI table to pick between
them. I'm not sure.
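
For illustration only, one way the selection could be structured is sketched below, assuming an explicit 'ioapic_ack=' value always wins and a per-board quirk flag (standing in for a DMI table match) decides otherwise. Only the value 'old' comes from the description above; the value 'new', the enum, the function name and the quirk flag are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum ioapic_ack_mode { ACK_OLD, ACK_NEW };

/* Pick the ack method.
 *   opt:             value given to 'ioapic_ack=' on the command line,
 *                    or NULL if the option was not given.
 *   board_has_quirk: hypothetical stand-in for a DMI table match on a
 *                    board known to generate spurious interrupts. */
static enum ioapic_ack_mode pick_ack_mode(const char *opt,
                                          bool board_has_quirk)
{
    /* An explicit user choice overrides any automatic selection. */
    if (opt != NULL) {
        if (strcmp(opt, "old") == 0)
            return ACK_OLD;
        if (strcmp(opt, "new") == 0)   /* assumed value name */
            return ACK_NEW;
        /* Unrecognised value: fall through to automatic selection. */
    }
    return board_has_quirk ? ACK_NEW : ACK_OLD;
}
```

Whether the fallback should be old or new on unlisted boards is exactly the open question; swapping the two results in the last line flips that default.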
Another question is whether to put this in 3.0.2. I think it definitely
needs more testing before that, but it might not make sense to do so at
all as the patch is quite invasive.