

[Xen-devel] Re: [PATCH][FIX] Possible fix for spurious interrupts

To: Arun Sharma <arun@xxxxxxxxxxxxxxx>
Subject: [Xen-devel] Re: [PATCH][FIX] Possible fix for spurious interrupts
From: Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>
Date: Sun, 16 Apr 2006 10:10:09 +0100
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, Jan Beulich <JBeulich@xxxxxxxxxx>
In-reply-to: <20060415185746.GA78500@xxxxxxxxxxxxxxx>
References: <d5bbd21c73db9bb960c3b63a90d24ec2@xxxxxxxxxxxx> <20060415185746.GA78500@xxxxxxxxxxxxxxx>

On 15 Apr 2006, at 19:57, Arun Sharma wrote:

> I think you guys are running into the same problem that FreeBSD ran
> into on some Intel motherboards more than a year ago.
>
> This explanation seems to make the most sense.
>
> Because the problem happened on FreeBSD (which masks ioapic RTEs to
> implement interrupt threads) and not on Linux, it was hard to get
> attention from the hardware guys back then. I had suspected Xen
> would run into it sooner or later.

Thanks Arun, this is very informative although unfortunately not very helpful. Matt Dillon's suggested alternatives to masking do not really work as they all cause spurious interrupts. Do you know if they ever found a good fix, or do they live with the problem?

I'd not heard of boot interrupt mode before, but it sounds like many chipsets cannot disable it and, even when it can be disabled, the method is chipset-specific. The Intel legacy INTx model is so unbelievably crap. At least source-vectored interrupts are becoming more common.

Anyway, this is the current status of my workaround for Xen:

 1. I added a new ioapic ack method which delays EOI until after ISR processing in the driver domain. This mode is enabled by default but can be disabled with 'ioapic_ack=old' as a Xen boot parameter.
 2. The code to safely manage deferred EOI is quite complicated and has some weaknesses:
     * Must EOI on the CPU that received the interrupt
     * Must EOI in 'reverse' order when interrupts have nested
     * Un-EOIed interrupts block other guest-bound interrupts which happen to have lower priority
     * Right now, disable_irq() in a driver domain may potentially lock up all interrupt sources as it may prevent EOI ever happening (until enable_irq() or the interrupt is unbound from the domain)
     * All Xen-bound interrupts have strictly higher priority than any guest-bound IO-APIC interrupt. This should avoid deadlock issues.
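To illustrate the 'reverse' order constraint, here is a minimal sketch of a per-CPU stack of pending EOIs. It is not the code from the patch: the struct, the function names and MAX_PENDING are all hypothetical, and the actual EOI write is replaced by copying the vector into an output array. The idea is only that an entry can be EOIed once the guest has finished with it and everything that nested on top of it has already been EOIed.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PENDING 16  /* hypothetical bound on nesting depth */

/* Hypothetical per-CPU record of guest-bound interrupts awaiting EOI.
 * One instance per CPU, since EOI must happen on the receiving CPU. */
struct pending_eoi_stack {
    unsigned int vector[MAX_PENDING];
    bool ready[MAX_PENDING];  /* guest has finished its ISR for this entry */
    int depth;
};

/* Record a newly received interrupt. Returns false if the stack is full. */
static bool push_pending(struct pending_eoi_stack *s, unsigned int vector)
{
    if (s->depth >= MAX_PENDING)
        return false;
    s->vector[s->depth] = vector;
    s->ready[s->depth] = false;
    s->depth++;
    return true;
}

/* Mark the most recent not-yet-ready entry for this vector as handled
 * by the guest. Returns false if no such entry exists. */
static bool mark_ready(struct pending_eoi_stack *s, unsigned int vector)
{
    for (int i = s->depth - 1; i >= 0; i--) {
        if (s->vector[i] == vector && !s->ready[i]) {
            s->ready[i] = true;
            return true;
        }
    }
    return false;
}

/* Pop entries from the top of the stack for as long as they are ready.
 * A real implementation would issue the EOI here; this sketch just
 * writes the vectors to 'out' in EOI order and returns the count. */
static int flush_ready(struct pending_eoi_stack *s,
                       unsigned int *out, int out_len)
{
    int n = 0;
    while (s->depth > 0 && s->ready[s->depth - 1] && n < out_len)
        out[n++] = s->vector[--s->depth];
    return n;
}
```

With vector 0x30 received first and 0x40 nested on top of it, marking 0x30 ready flushes nothing, because 0x40 is still on top. Once 0x40 is marked ready as well, a flush yields 0x40 followed by 0x30. This also shows the blocking weakness: one slow guest ISR at the top of the stack holds up every EOI beneath it.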

Really it's a messy solution. I think having both old and new ack methods makes sense, but I'm not sure how we will end up picking which to use automatically. Maybe using the old method is best, and let users pick the new one if they see spurious interrupt problems. Or maybe the problems with the new method are mostly theoretical and we should use that by default. Or maybe we should have a DMI table to pick between them. I'm not sure.
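For concreteness, one of those options (old method by default, a per-board quirk to switch to the new one, and the boot parameter overriding both) could be sketched as below. This is only an illustration of the precedence, not a decision: the enum, the function names and the board_needs_new flag (standing in for a DMI table lookup) are hypothetical, and accepting "new" as a parameter value is an assumption, since only 'ioapic_ack=old' exists so far.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum ioapic_ack_mode { ACK_OLD, ACK_NEW };

/* Parse the value given to ioapic_ack=. A missing (NULL) or
 * unrecognised value leaves the supplied default in place.
 * The "new" value is an assumption made for this sketch. */
static enum ioapic_ack_mode parse_ioapic_ack(const char *value,
                                             enum ioapic_ack_mode dflt)
{
    if (value == NULL)
        return dflt;
    if (strcmp(value, "old") == 0)
        return ACK_OLD;
    if (strcmp(value, "new") == 0)
        return ACK_NEW;
    return dflt;
}

/* Precedence: explicit boot parameter, then board quirk, then the
 * built-in default (old). board_needs_new stands in for the result
 * of a hypothetical DMI table match. */
static enum ioapic_ack_mode choose_ack_mode(const char *param,
                                            bool board_needs_new)
{
    enum ioapic_ack_mode dflt = board_needs_new ? ACK_NEW : ACK_OLD;
    return parse_ioapic_ack(param, dflt);
}
```

Making the new method the default instead would only mean flipping the fallback in choose_ack_mode(); the user override stays the same either way.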

Another question is whether to put this in 3.0.2. I think it definitely needs more testing before that, but it might not make sense to do so at all as the patch is quite invasive.

 -- Keir
