Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

On 05/02/18 21:18, Igor Druzhinin wrote:
> We're noticing a reproducible system boot hang on certain
> post-Skylake platforms where the BIOS is configured in

It's just a plain Skylake Server, from what I can see.

> legacy boot mode with x2APIC disabled. The system stalls
> immediately after writing the first SMP initialization
> sequence into APIC ICR.
> The cause of the problem is watchdog NMI handler execution -
> somewhere near the end of NMI handling (after it's already
> rescheduled the next NMI) it tries to access IO port 0x61
> to get the actual NMI reason on CPU0. Unfortunately, this
> port is emulated by the BIOS using SMIs, and this emulation
> apparently might take longer than we expect under certain
> conditions. As a result, the system is constantly bouncing
> between the NMI and SMI handlers and not making any progress.
> Just lower the initial frequency for now as we lower it later
> even more anyway.
> Signed-off-by: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>

Acked-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

I can independently confirm these findings, and that the fix works.  The
NMI watchdog setup is rather crazy and complicated, but let's not get
into that rat's nest here.

> ---
>  xen/arch/x86/nmi.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c
> index d7fce28..1eb2a32 100644
> --- a/xen/arch/x86/nmi.c
> +++ b/xen/arch/x86/nmi.c
> @@ -34,7 +34,8 @@
>  #include <asm/apic.h>
>  unsigned int nmi_watchdog = NMI_NONE;
> -static unsigned int nmi_hz = HZ;
> +/* initial watchdog frequency - shouldn't be too high to avoid boot hangs */
> +static unsigned int nmi_hz = HZ / 10;
>  static unsigned int nmi_perfctr_msr; /* the MSR to reset in NMI handler */
>  static unsigned int nmi_p4_cccr_val;
>  static DEFINE_PER_CPU(struct timer, nmi_timer);
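
For illustration only, the reasoning in the commit message can be reduced to
a toy timing model: the handler re-arms the next NMI before it reads port
0x61, so if the SMI-emulated read takes at least one full watchdog period,
another NMI is already pending when the handler returns. The sketch below is
not Xen code. EXAMPLE_HZ, nmi_period_us() and makes_progress() are
hypothetical names, and the tick rate is a placeholder rather than a value
taken from the Xen tree.

```c
#include <stdbool.h>

/* Placeholder tick rate for the example; NOT taken from the Xen tree. */
#define EXAMPLE_HZ 100u

/*
 * Time between two watchdog NMIs, in microseconds, for a given rate.
 * The caller must pass a nonzero nmi_hz.
 */
static unsigned int nmi_period_us(unsigned int nmi_hz)
{
    return 1000000u / nmi_hz;
}

/*
 * Toy model of the hang: the interrupted code only gets to run if the
 * (hypothetical) latency of the SMI-emulated port 0x61 read is shorter
 * than the watchdog period. Otherwise the CPU goes straight from one
 * NMI into the next.
 */
static bool makes_progress(unsigned int nmi_hz, unsigned int port_read_us)
{
    return port_read_us < nmi_period_us(nmi_hz);
}
```

With these placeholder numbers and an assumed 20000 us read latency,
makes_progress(EXAMPLE_HZ, 20000) is false, while
makes_progress(EXAMPLE_HZ / 10, 20000) is true. The real latency on the
affected platforms is not stated in this thread.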
