Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

On Wed, 7 Feb 2018 13:01:08 +0000
Igor Druzhinin <igor.druzhinin@xxxxxxxxxx> wrote:
>So far the issue confirmed:
>Dell PowerEdge R740, Huawei systems based on Xeon Gold 6152 (the one
>that it was tested on), Intel S2600XX, etc.
>Also see:
>Well, no-watchdog is what we currently recommend in that case but we
>hoped there is a general solution here from Xen side. You have your
>point that they should fix this on their side because it's their fault
>indeed. But the user experience is also important for us I think.


It would be nice to measure the actual SMI handling time on affected
systems (e.g. via rdtsc before/after inb(0x61), perhaps averaged over
multiple reads) to check whether it really is 10+ ms.
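
The measurement suggested above could be sketched roughly as follows. This
is a minimal userspace sketch for Linux on x86 with GCC or Clang, not code
from Xen. The helper names (enable_port_61, read_port_61, avg_cycles,
cycles_to_ms) are made up for illustration, and the TSC frequency has to be
supplied by the caller. It assumes the TSC keeps counting while the CPU is
in SMM, so the cycle delta includes the SMI handler time.

```c
#include <stdint.h>
#include <sys/io.h>      /* ioperm(), inb(): Linux on x86 only */
#include <x86intrin.h>   /* __rdtsc(), _mm_lfence() */

/* Hypothetical helper type: the operation whose latency is measured. */
typedef void (*probe_fn)(void);

/* Ask the kernel for access to port 0x61 (needs root / CAP_SYS_RAWIO).
 * Returns 0 on success, -1 on failure, like ioperm() itself. */
int enable_port_61(void)
{
    return ioperm(0x61, 1, 1);
}

/* One byte-sized read of port 0x61, the access suspected to trap to SMM. */
void read_port_61(void)
{
    (void)inb(0x61);
}

/* Average TSC cycles spent in probe() over `iterations` calls.
 * The fences keep rdtsc from being reordered around the probe. */
uint64_t avg_cycles(probe_fn probe, unsigned int iterations)
{
    uint64_t total = 0;

    if (iterations == 0)
        return 0;

    for (unsigned int i = 0; i < iterations; i++) {
        _mm_lfence();
        uint64_t start = __rdtsc();
        probe();
        _mm_lfence();
        uint64_t end = __rdtsc();
        total += end - start;
    }
    return total / iterations;
}

/* Convert a cycle count to milliseconds for a given TSC frequency. */
double cycles_to_ms(uint64_t cycles, uint64_t tsc_hz)
{
    return (double)cycles * 1000.0 / (double)tsc_hz;
}
```

A caller would first check that enable_port_61() returns 0, then call
avg_cycles(read_port_61, 1000) and pass the result to cycles_to_ms() with
the machine's TSC frequency. An average near 10 ms would support the long
SMI handler theory; a much smaller value would point elsewhere.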

There is a chance that the perf counter frequency is calculated
incorrectly on some systems, so that the real cause is a very high rate
of NMI watchdog ticks rather than a long SMI handler execution
time. >10ms just looks... too
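
To illustrate the miscalculation hypothesis with plain arithmetic: a
cycle-counting perf counter is armed with a reload value derived from an
assumed clock frequency, and if the counter really advances faster than
assumed, the watchdog ticks proportionally more often. The sketch below is
generic and not Xen's actual code; the function names and the example
frequencies are assumptions.

```c
#include <stdint.h>

/* Reload value needed for `ticks_per_sec` overflows per second if the
 * counter advanced at `assumed_hz` (hypothetical helper). */
uint64_t reload_for(uint64_t assumed_hz, unsigned int ticks_per_sec)
{
    return assumed_hz / ticks_per_sec;
}

/* Tick rate actually observed when the counter really advances at
 * `actual_hz` but was armed with `reload` (hypothetical helper). */
double actual_tick_hz(uint64_t reload, uint64_t actual_hz)
{
    return (double)actual_hz / (double)reload;
}
```

For example, a counter armed for 1 tick per second under an assumed
100 MHz clock but really counting at 2.1 GHz would fire 21 times per
second.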

Huawei Server 2488 V5 BIOS -- a similar SMI I/O trap handler for port
61h was found. There are some differences from the Gigabyte H270 system,
though:

- no "allocated" I/O traps anymore, but one additional SMI I/O trap
  encountered: port 900h, dword size. Possibly related to PCIe PM

- the port 61h SMI handler now has multiple calls to debug/assert stub
  functions -- there is a chance that some of the impacted systems
  shipped with a debug build enabled, so that those stubs expand to real
  debugging code with a negative impact on SMI handling speed.

A few additional observations:

- port 61h I/O Trap SMI handler checks accessed I/O address/size to be
  equal to 61h/1byte. There might be some difference when reading port
  61h via inw(0x60)/inl(0x60)/etc

- it looks like there exists an alternative way to read NMI status
  without triggering an SMI -- via ports 63h/65h/67h, but this depends
  on an undocumented bit in the Generic Control and Status register
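
For the wider-access idea in the first observation, the port 61h byte
would have to be extracted from the value read at port 60h. Since x86 I/O
is little-endian, port 61h lands in bits 15:8 of a word or dword read from
60h. The sketch below only does that extraction and is not tested against
any trap handler. The helper names are hypothetical, and the two status
bit positions follow the NMI Status and Control register layout in Intel
chipset datasheets, so treat them as an assumption on other platforms.
Note also that a wide read touches port 60h, the keyboard controller data
port, which may have side effects.

```c
#include <stdint.h>

/* NMI source status bits in the port 61h register (assumed Intel
 * chipset layout: bit 7 = SERR# NMI, bit 6 = IOCHK# NMI). */
#define NMI_SC_SERR_NMI_STS  0x80u
#define NMI_SC_IOCHK_NMI_STS 0x40u

/* Port 61h byte from a 16-bit read at port 60h, e.g. inw(0x60). */
uint8_t port61_from_word(uint16_t value_at_60)
{
    return (uint8_t)(value_at_60 >> 8);
}

/* Port 61h byte from a 32-bit read at port 60h, e.g. inl(0x60). */
uint8_t port61_from_dword(uint32_t value_at_60)
{
    return (uint8_t)(value_at_60 >> 8);
}

/* Non-zero if either NMI source status bit is set. */
int nmi_source_pending(uint8_t nmi_sc)
{
    return (nmi_sc & (NMI_SC_SERR_NMI_STS | NMI_SC_IOCHK_NMI_STS)) != 0;
}
```

Whether such a read actually bypasses the I/O trap depends on how the
chipset matches the trapped address and size, which would need to be
checked on real hardware.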
