This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


[Xen-devel] Re: IOMMU faults

To: Tim Deegan <Tim.Deegan@xxxxxxxxxx>
Subject: [Xen-devel] Re: IOMMU faults
From: Wei Wang2 <wei.wang2@xxxxxxx>
Date: Thu, 16 Jun 2011 16:30:14 +0200
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Allen Kay <allen.m.kay@xxxxxxxxx>, Jean Guyader <Jean.Guyader@xxxxxxxxxx>
Delivery-date: Thu, 16 Jun 2011 07:29:49 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20110616092509.GH17634@xxxxxxxxxxxxxxxxxxxxxxx>
References: <20110616092509.GH17634@xxxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: KMail/1.9.6 (enterprise 20070904.708012)
On Thursday 16 June 2011 11:25:09 Tim Deegan wrote:
> Hi, IOMMU maintainers,
> What should Xen do when an IOMMU fault happens?  As far as I can
> see both the AMD and Intel code clears the error in the IOMMU and
> carries on, but I suspect some more vigorous action is appropriate.
> I've seen traces from an Intel machine that seemed to be livelocked on
> IOMMU faults from a passed-through VGA card, until it was killed by the
> watchdog.  I think I can see two things that contribute to that:
>  - The Intel IOMMU fault handler prints quite a lot of info in interrupt
>    context, making it easier to livelock.  Still I think the general
>    problem applies on AMD too.

This info could still be useful for debugging, but we should only enable it
in debug builds.
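
As an alternative to compiling the messages out entirely, the fault handler
could rate-limit them. The following is only a minimal sketch of that idea,
not code from Xen: the struct, the function name, and the notion of a "tick"
are all hypothetical, and the caller is assumed to supply a monotonic time
value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limiter: allow at most `burst` messages per `window` ticks. */
struct fault_log_limiter {
    uint64_t window_start;  /* tick at which the current window began */
    uint64_t window;        /* window length in ticks */
    unsigned int burst;     /* messages allowed per window */
    unsigned int printed;   /* messages allowed so far in this window */
    unsigned int dropped;   /* messages suppressed so far in this window */
};

/* Return true if the caller may print a fault message at time `now`. */
static bool fault_log_allowed(struct fault_log_limiter *l, uint64_t now)
{
    if (now - l->window_start >= l->window) {
        /* A new window begins: reset both counters. */
        l->window_start = now;
        l->printed = 0;
        l->dropped = 0;
    }
    if (l->printed < l->burst) {
        l->printed++;
        return true;
    }
    l->dropped++;
    return false;
}
```

The handler would then print details only when fault_log_allowed() returns
true, which bounds the time spent printing in interrupt context no matter
how fast the device faults.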

>  - Domain destruction re-assigns passed through cards to dom0, but the
>    cards don't seem to get reset.  So there's nothing to stop a card
>    battering away at DMA in the meantime.  That seems like a problem
>    independent of livelock, actually.

There should be some FLR code in the tools (both xm and xl), but it might not
work well with some devices...

> In any case, it seems like it would be a good idea to stop a
> broken/malicious/deassigned card from flooding Xen with IOMMU faults.

Yes, I agree. Actually, I see a lot that could be improved in the fault handler.
When IOMMU faults come from DMA errors, we should either stop the device from
doing DMA or inject errors into the guest if the guest driver is able to handle
I/O page faults.

> I was considering just writing 0 to the faulting card's PCI command
> register, but I'm told that's not always enough to properly deactivate
> a card, and it might be a little over-zealous to do it on the first
> offence.
> Ideas?
It seems difficult to find a generic approach to stop a device without knowing
more device-specific details...
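
One generic, if incomplete, option is to tolerate a few faults and only then
clear the Bus Master Enable bit in the device's PCI command register, rather
than writing 0 to the whole register on the first offence. The sketch below
shows just the bookkeeping. The threshold value, the struct, and the function
names are hypothetical; the register offset and bit position are the standard
PCI ones. As noted above, this may still not be enough to quiesce every
device.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_COMMAND         0x04    /* config-space offset of the command register */
#define PCI_COMMAND_MASTER  0x0004  /* Bus Master Enable bit */

#define FAULT_THRESHOLD     8       /* hypothetical: faults tolerated before acting */

/* Hypothetical per-device fault state. */
struct faulting_dev {
    unsigned int fault_count;  /* IOMMU faults attributed to this device */
    bool quiesced;             /* true once bus mastering has been disabled */
};

/* Compute a command register value with bus mastering disabled. */
static uint16_t command_without_bus_master(uint16_t cmd)
{
    return cmd & (uint16_t)~PCI_COMMAND_MASTER;
}

/*
 * Record one fault. Returns true exactly once, when the threshold is
 * reached, to tell the caller to write the new value to PCI_COMMAND.
 */
static bool note_iommu_fault(struct faulting_dev *d)
{
    if (d->quiesced)
        return false;
    if (++d->fault_count < FAULT_THRESHOLD)
        return false;
    d->quiesced = true;
    return true;
}
```

Clearing only the bus-master bit leaves I/O and memory decoding enabled, so
the device stays visible to its driver while being stopped from issuing
further DMA.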

> Tim.
