
Re: [Xen-devel] [PATCH] x86/watchdog: Use real timestamps for watchdog timeout

On 24/05/13 10:37, Tim Deegan wrote:
> At 21:32 +0100 on 23 May (1369344726), Andrew Cooper wrote:
>> Do not assume that we will only receive interrupts at a rate of nmi_hz.  On a
>> test system being debugged, I observed a PCI SERR being continuously asserted
>> without the SERR bit being set.  The result was Xen "exceeding" a 300 second
>> timeout within 1 second.
> Sounds like the CPU is indeed stuck, and the watchdog has just optimized
> away the 5 minutes of back-to-back NMIs. :)
> Handling this case is nice, but I wonder whether this patch ought to
> detect and report ludicrous NMI rates rather than silently ignoring
> them.  I guess that's hard to do in an NMI handler, other than by
> adjusting the printk when we crash.
> Tim.

Actually I suspect the system was livelocked with PCI SERRs being issued
from a PCIe switch.  I only have second granularity on the serial
console, but can confirm that cpu0 was perfectly alive and well within
the same second as the watchdog supposedly expiring.

I was considering trying to work around a ludicrous rate of interrupts,
but decided to go for the easier patch first.
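
To illustrate the difference between counting NMIs and using real
timestamps, here is a minimal sketch. It is not the code from the patch:
the struct, the function names and the nanosecond time source are all
hypothetical, chosen only to show why a count-based check can "expire" a
300 second timeout within 1 second when NMIs arrive faster than nmi_hz.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU watchdog state, for illustration only. */
struct wd_state {
    uint64_t last_progress_ns; /* time at which forward progress was last seen */
    unsigned int nmi_count;    /* NMIs received since that point */
};

/*
 * Count-based check: assumes NMIs arrive at exactly nmi_hz, so any extra
 * NMI source (e.g. a storm of SERRs) makes the timeout appear to pass early.
 */
static bool wd_expired_by_count(const struct wd_state *s,
                                unsigned int nmi_hz, unsigned int timeout_s)
{
    assert(nmi_hz > 0);
    return s->nmi_count > (uint64_t)timeout_s * nmi_hz;
}

/*
 * Timestamp-based check: compares elapsed real time against the timeout,
 * so the result does not depend on how often the handler is entered.
 */
static bool wd_expired_by_time(const struct wd_state *s,
                               uint64_t now_ns, unsigned int timeout_s)
{
    return now_ns - s->last_progress_ns > (uint64_t)timeout_s * 1000000000ULL;
}
```

With nmi_hz = 1 and a 300 second timeout, 301 NMIs delivered within one
second trip the count-based check, while the timestamp-based check only
fires once 300 seconds of real time have elapsed.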

