[Xen-devel] IOMMU faults

Hi, IOMMU maintainers,

What should Xen do when an IOMMU fault happens?  As far as I can
see, both the AMD and Intel code just clear the error in the IOMMU and
carry on, but I suspect some more vigorous action is appropriate.
I've seen traces from an Intel machine that seemed to be livelocked on
IOMMU faults from a passed-through VGA card, until it was killed by the
watchdog.  I think I can see two things that contribute to that:

 - The Intel IOMMU fault handler prints quite a lot of info in interrupt
   context, making it easier to livelock.  Still, I think the general
   problem applies on AMD too.
 - Domain destruction re-assigns passed-through cards to dom0, but the
   cards don't seem to get reset.  So there's nothing to stop a card
   battering away at DMA in the meantime.  That seems like a problem
   independent of livelock, actually.

In any case, it seems like it would be a good idea to stop a
broken/malicious/deassigned card from flooding Xen with IOMMU faults.
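
To make the idea concrete, here is a rough sketch of per-device fault
throttling.  It is only an illustration: the struct, the function and
both constants are made up for this mail and do not exist in Xen, and
the limit and window values are arbitrary assumptions.  The fault
handler would call it once per fault and quiesce the device (by
whatever means turns out to be right) when it returns true.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device fault accounting; not Xen's real structures. */
#define FAULT_LIMIT   8     /* assumed: faults tolerated per window */
#define FAULT_WINDOW  1000  /* assumed: window length, arbitrary time units */

struct fault_state {
    uint64_t window_start;  /* time at which the current window began */
    unsigned int count;     /* faults seen in the current window */
    bool quiesced;          /* set once we have decided to stop the device */
};

/*
 * Record one IOMMU fault for a device at time 'now'.
 * Returns true exactly once, when the device exceeds FAULT_LIMIT
 * faults within a single window and should be quiesced by the caller.
 */
static bool note_iommu_fault(struct fault_state *fs, uint64_t now)
{
    if (fs->quiesced)
        return false;               /* already dealt with */

    if (now - fs->window_start >= FAULT_WINDOW) {
        fs->window_start = now;     /* start a fresh window */
        fs->count = 0;
    }

    if (++fs->count > FAULT_LIMIT) {
        fs->quiesced = true;
        return true;                /* caller should stop the device now */
    }
    return false;
}
```

Keeping only a counter and a timestamp means the handler does a fixed,
small amount of work per fault, which matters if it runs in interrupt
context.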

I was considering just writing 0 to the faulting card's PCI command
register, but I'm told that's not always enough to properly deactivate
a card, and it might be a little over-zealous to do it on the first



