
Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs



On 08/02/18 06:37, Alexey G wrote:
> On Wed, 7 Feb 2018 13:01:08 +0000
> Igor Druzhinin <igor.druzhinin@xxxxxxxxxx> wrote:
>> So far the issue is confirmed on:
>> Dell PowerEdge R740, Huawei systems based on Xeon Gold 6152 (the one
>> that it was tested on), Intel S2600XX, etc.
>>
>> Also see:
>> https://bugs.xenserver.org/browse/XSO-774
>>
>> Well, no-watchdog is what we currently recommend in that case, but we
>> hoped there would be a general solution on the Xen side. You have a
>> point that they should fix this on their side, because it is indeed
>> their fault. But the user experience is also important for us, I think.
> 
> Igor,
> 
> It would be nice to measure the actual SMI handling time on affected
> systems (e.g. via rdtsc before/after inb(0x61), perhaps averaging over
> multiple reads) to check whether it is really 10+ ms.
> 

I've done this measurement before. What we see is that the time spent in
SMI spikes (sometimes up to 200ms) at the moment we go through the
INIT-SIPI-SIPI sequence. That appears to be enough to push the system
into a livelock spiral. So I agree with Jan to some extent that the
proposed workaround might not work on some systems.
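
For anyone who wants to repeat the measurement, a minimal sketch of the
approach Alexey describes (timestamp before and after the port read,
repeated many times) could look like the following. It is not the code I
used. The callback types, struct and function names are hypothetical, and
the fake clock and fake probe exist only so the logic can be checked
without touching hardware. On a real system `now` would wrap rdtsc and
`probe` would wrap inb(0x61), which needs privileged I/O access. Since
the problem shows up as spikes, the sketch tracks the maximum as well as
the mean.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback types, not taken from Xen. */
typedef uint64_t (*timestamp_fn)(void); /* e.g. a wrapper around rdtsc */
typedef void (*probe_fn)(void);         /* e.g. a wrapper around inb(0x61) */

struct latency_stats {
    uint64_t min;   /* shortest observed probe, in timestamp ticks */
    uint64_t max;   /* longest observed probe: this is where spikes show up */
    uint64_t mean;  /* integer average over all samples */
};

/* Time `samples` invocations of probe() using now() as the clock. */
static struct latency_stats measure_latency(timestamp_fn now, probe_fn probe,
                                            size_t samples)
{
    struct latency_stats s = { UINT64_MAX, 0, 0 };
    uint64_t total = 0;

    assert(now && probe && samples > 0);

    for (size_t i = 0; i < samples; i++) {
        uint64_t start = now();
        probe();
        uint64_t delta = now() - start;

        if (delta < s.min)
            s.min = delta;
        if (delta > s.max)
            s.max = delta;
        total += delta;
    }

    s.mean = total / samples;
    return s;
}

/*
 * Stand-ins for dry runs without hardware (assumptions for illustration):
 * the fake probe costs 100 ticks, except every 4th call, which costs 2000.
 */
static uint64_t fake_ticks;
static unsigned int fake_calls;

static uint64_t fake_now(void)
{
    return fake_ticks;
}

static void fake_probe(void)
{
    fake_ticks += (++fake_calls % 4 == 0) ? 2000 : 100;
}
```

Calling measure_latency(fake_now, fake_probe, 8) exercises the loop with
two slow reads out of eight. The mean stays moderate while the maximum
shows the spike, which is why an average alone can understate the problem.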

> There might be a chance that perf counter frequency is calculated wrong
> for some systems, resulting in a very high rate of NMI watchdog ticks
> instead of long SMI handler execution time. >10ms just looks... too
> extreme.
> 

We ruled that out.

> Huawei Server 2488 V5 BIOS -- a similar SMI I/O trap handler for port
> 61h was found. Some differences from the Gigabyte H270 system, though:
> 
> - no "allocated" I/O traps anymore, but one additional SMI I/O trap
>   encountered: port 900h, dword size. Possibly related to PCIe PM
>   facilities.
> 
> - the port 61h SMI handler now has multiple calls to debug/assert stub
>   functions -- there is a chance that some of the impacted systems
>   had a debug build, in which those stubs expand to real debugging
>   code with a negative impact on SMI handling speed.
> 
> A few additional observations:
> 
> - port 61h I/O Trap SMI handler checks accessed I/O address/size to be
>   equal to 61h/1byte. There might be some difference when reading port
>   61h via inw(0x60)/inl(0x60)/etc
> 
> - it looks like there exists an alternative way to read NMI status
>   without triggering SMI -- via ports 63h/65h/67h, but this depends on
>   an undocumented bit in the Generic Control and Status register
> 
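
On the inw(0x60)/inl(0x60) idea: whether a wider access really bypasses
the I/O trap is something I have not verified, and reading port 60h also
touches the keyboard controller data port. Purely as an illustration of
the byte-lane arithmetic, a sketch with hypothetical helper names: x86
port I/O is little-endian, so in a 16- or 32-bit read starting at 60h the
byte for port 61h is bits 15:8. The two status masks are the usual NMI
reason bits in port 61h (bit 7 SERR#, bit 6 IOCHK#).

```c
#include <stdbool.h>
#include <stdint.h>

/* NMI reason bits in port 61h (NMI Status and Control register). */
#define NMI_REASON_SERR  0x80u  /* bit 7: SERR# NMI source status */
#define NMI_REASON_IOCHK 0x40u  /* bit 6: IOCHK# NMI source status */

/* Hypothetical helpers: pick the port 61h byte out of a wider read at 60h. */
static uint8_t port61_from_word(uint16_t w)
{
    return (uint8_t)(w >> 8);            /* low byte is port 60h */
}

static uint8_t port61_from_dword(uint32_t d)
{
    return (uint8_t)((d >> 8) & 0xffu);  /* bytes: 60h, 61h, 62h, 63h */
}

static bool nmi_reason_serr(uint8_t reason)
{
    return (reason & NMI_REASON_SERR) != 0;
}

static bool nmi_reason_iochk(uint8_t reason)
{
    return (reason & NMI_REASON_IOCHK) != 0;
}
```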

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

