[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 2/2 V2] iommu/amd: Workaround for erratum 787



On 6/10/2013 8:59 AM, Jan Beulich wrote:
On 10.06.13 at 15:53, Suravee Suthikulanit <suravee.suthikulpanit@xxxxxxx>
wrote:
On 6/10/2013 5:53 AM, Jan Beulich wrote:
On 10.06.13 at 12:40, Tim Deegan <tim@xxxxxxx> wrote:
At 10:47 +0100 on 10 Jun (1370861263), Jan Beulich wrote:
On 10.06.13 at 11:35, Tim Deegan <tim@xxxxxxx> wrote:
At 00:05 -0500 on 10 Jun (1370822751), suravee.suthikulpanit@xxxxxxx wrote:
From: Suravee Suthikulpanit <suravee.suthikulpanit@xxxxxxx>

The IOMMU interrupt handling in bottom half must clear the PPR log
interrupt
and event log interrupt bits to re-enable the interrupt. This is done by
writing 1 to the memory mapped register to clear the bit. Due to hardware
bug,
if the driver tries to clear this bit while the IOMMU hardware also setting
this bit, the conflict will result with the bit being set. If the interrupt
handling code does not make sure to clear this bit, subsequent changes in
the
event/PPR logs will no longer generating interrupts, and would result if
buffer overflow. After clearing the bits, the driver must read back
the register to verify.
Is there a risk of livelock here?  That is, if some device is causing a
lot of IOMMU faults, a CPU could get stuck in this loop re-enabling
interrupts as fast as they are raised.

The solution suggested in the erratum seems better: if the bit is set
after clearing, process the interrupts again (i.e. run/schedule the
top-half handler).  That way the bottom-half handler will definitely
terminate and the system can make some progress.
That's what's being done really: The actual interrupt handler disables
the interrupt sources, and the tasklet re-enables them (or at least is
supposed to do so - patch 1 isn't really correct in the respect).
Oh I see, the idea is that we use the control register to mask
interrupts instead of relying on the status register?  That seems
better.  But doesn't this IOMMU already mask further interrupts when the
pending bit in the status register is set?  I can't see any wording
about that in the IOMMU doc but the erratum implies it.  Suravee, do you
know if this is the case?
Actually, the documentation has a subtle but perhaps important
difference int the wording here: For EventLogInt and ComWaitInt
is read "An interrupt is generated when <status bit> = 1b and MMIO
Offset 0018h[<control bit>] = 1b", where as for PPRInt and GAInt
it says "An interrupt is generated when <status bit> changes from 0b
to 1b and MMIO Offset 0018h[<control bit>] = 1b".

So I'd like to be one the safe side and assume further interrupts can
be generated in all cases - see also the emulation behavior in
iommu_guest.c which - afaict - always raises an interrupt, not just on
a 0 -> 1 transition.
The behavior should be that the interrupt will be generated if the "xxxInt"
bit is 0.  Once generated, it will be set to "1" by the hardware.  If this
bit is 1, events will be added to the log but interrupt will not be
generated.
"Should be" isn't enough here, even more so given the mentioned
wording differences in your documentation. We need to know how
actual (past, current, and future) hardware behaves.

Jan

I am sure this is what the behavior of the hardware. Besides, only the hardware can set this bit. I have also tested by not clearing the bit, and basically I did not see additional interrupts.

Suravee


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.