[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 2/2 V2] iommu/amd: Workaround for erratum 787



At 11:53 +0100 on 10 Jun (1370865209), Jan Beulich wrote:
> > Oh I see, the idea is that we use the control register to mask
> > interrupts instead of relying on the status register?  That seems
> > better.  But doesn't this IOMMU already mask further interrupts when the
> > pending bit in the status register is set?  I can't see any wording
> > about that in the IOMMU doc but the erratum implies it.  Suravee, do you
> > know if this is the case?
> 
> Actually, the documentation has a subtle but perhaps important
> difference int the wording here: For EventLogInt and ComWaitInt
> is read "An interrupt is generated when <status bit> = 1b and MMIO
> Offset 0018h[<control bit>] = 1b", where as for PPRInt and GAInt
> it says "An interrupt is generated when <status bit> changes from 0b
> to 1b and MMIO Offset 0018h[<control bit>] = 1b".
> 
> So I'd like to be one the safe side and assume further interrupts can
> be generated in all cases.

Fair enough.

 - see also the emulation behavior in
> iommu_guest.c which - afaict - always raises an interrupt, not just on
> a 0 -> 1 transition.

Well, if it were a documented beahviour, we ought to change that. :)

> > In either case, the while () loop worries me; I think it would be better
> > to schedule the tasklet again if we see the bit is set; a 'while
> > (pending is set) { clear pending bit; }' loop might never exit the
> > tasklet at all.
> 
> That could only be due to a hardware bug worse than the one we're
> working around here, and I don't think is worth dealing with.

Well, there's runaway guest hardware.  If we reschedule the tasklet then
pci_check_disable_device() will eventually catch and suppress the
misbehaviour; if we spin here it won't get a chance.  I guess the
argument is that it will eventually overflow the log buffer and stop
setting the log-interrupt bit -- that needs at least a comment.

I was also a bit worried about the erratum-787 event getting delayed
(since there's no interrupt to cause us to actually process it), but I
just realised that's a more general problem: we ought to be resetting
the 'pending' bits _before_ scanning the log, or any new entries that
arrive between the log scan and the RW1C write won't be seen until the
_next_ log entry causes an interrupt.

How about:
 write-to-clear status.pending
 process the log
 if (status.pending)
    reschedule the tasklet
 else
    unmask the interrupt.

Since we have to do the extra read anyway for erratum 787, we might as
well save ourselves the bother of taking an interrupt in the other case.

Tim.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.