Re: [Xen-devel] [PATCH] x86/watchdog: Use real timestamps for watchdog timeout

On 24/05/13 11:13, Tim Deegan wrote:
> At 10:57 +0100 on 24 May (1369393060), Andrew Cooper wrote:
>> On 24/05/13 08:09, Jan Beulich wrote:
>>> You can't use NOW() here - while the time updating code is safe
>>> against normal interrupts, it's not atomic wrt NMIs.
>> But NMIs are latched at the hardware level.  If we get a nested NMI then
>> Xen will be toast on the exit path anyway.
> The problem is that an NMI can arrive while local_time_calibration() is
> writing its results, so calling NOW() in the NMI handler might return
> garbage. 

Aah - I see.  Sorry - I misunderstood the original point.

Yes - that is an issue.

Two solutions come to mind.

1) Along with the local_irq_disable()/enable() pairs in
local_time_calibration(), have an atomic_t indicating "time data update
in progress", allowing the NMI handler to decide to bail early.

2) Modify local_time_calibration() to fill in a shadow cpu_time set, and
add an atomic_t to indicate which set is currently consistent.  This would
allow the NMI handler to always use one consistent set of timing
information.
>>> Handling this case is nice, but I wonder whether this patch ought to
>>> detect and report ludicrous NMI rates rather than silently ignoring
>>> them.  I guess that's hard to do in an NMI handler, other than by
>>> adjusting the printk when we crash.
> Actually on second thoughts it's easier: as well as having this patch
> (or near equivalent) to avoid premature watchdog expiry, we can detect
> the NMI rate in, say, the timer softirq and report if it's gone mad.
> Cheers,
> Tim.

I was thinking along that line, but had not yet worked out where to put
it.  That looks like the best place.
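
A rough standalone sketch of that rate check, in C11.  Everything here is
an assumption for illustration: nmi_rate_monitor, nmi_rate_note_nmi() and
nmi_rate_check() are hypothetical names, the timestamp is passed in by the
caller, and the threshold is arbitrary.  The only work done in the NMI
handler is a counter increment; the arithmetic and any reporting would
happen from the periodic (softirq) context.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU state for NMI rate monitoring. */
struct nmi_rate_monitor {
    atomic_uint_fast64_t nmi_count;  /* bumped from the NMI handler */
    uint64_t last_count;             /* count seen at the previous check */
    uint64_t last_check_ns;          /* time of the previous check */
};

/* Called from the NMI handler: a single atomic increment and nothing else. */
static void nmi_rate_note_nmi(struct nmi_rate_monitor *m)
{
    atomic_fetch_add_explicit(&m->nmi_count, 1, memory_order_relaxed);
}

/*
 * Called periodically outside NMI context.  Stores the NMIs-per-second
 * rate since the previous check in *rate_out and returns true if it
 * exceeds max_per_sec, so the caller can report it.
 */
static bool nmi_rate_check(struct nmi_rate_monitor *m, uint64_t now_ns,
                           uint64_t max_per_sec, uint64_t *rate_out)
{
    uint64_t count =
        atomic_load_explicit(&m->nmi_count, memory_order_relaxed);
    uint64_t delta = count - m->last_count;
    uint64_t elapsed_ns = now_ns - m->last_check_ns;

    m->last_count = count;
    m->last_check_ns = now_ns;

    if (elapsed_ns == 0) {
        *rate_out = 0;
        return false;
    }

    *rate_out = delta * UINT64_C(1000000000) / elapsed_ns;
    return *rate_out > max_per_sec;
}
```

Because the check runs outside NMI context it can use the normal time
source and printk freely, which is what makes this placement attractive.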


