
Re: MTRR init sequence in Xen


  • To: Jürgen Groß <jgross@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Fri, 23 Jan 2026 10:24:38 +0100
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>
  • Delivery-date: Fri, 23 Jan 2026 09:25:01 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Jan 22, 2026 at 08:24:11PM +0100, Jürgen Groß wrote:
> On 22.01.26 18:36, Roger Pau Monné wrote:
> > On Thu, Jan 22, 2026 at 05:21:12PM +0000, Andrew Cooper wrote:
> > > On 22/01/2026 3:56 pm, Jürgen Groß wrote:
> > > > Just as a heads up: a hardware partner of SUSE has seen hard lockups
> > > > of the Linux kernel during boot on a new machine. This machine has
> > > > 8 NUMA nodes and 960 CPUs. The hang occurs in roughly 1.5% of the boot
> > > > attempts in MTRR initialization of the APs.
> > > > 
> > > > I have sent a small patch series to LKML which seems to fix the problem:
> > > > https://lore.kernel.org/lkml/20260121141106.755458-1-jgross@xxxxxxxx/
> > > > 
> > > > As Xen MTRR handling is taken from the Linux kernel, I guess the same
> > > > problem could happen in Xen, too.
> > > > 
> > > > As the hang always occurred while waiting for the lock, which is
> > > > serializing the single CPUs doing MTRR initialization, my solution was
> > > > to eliminate the lock, allowing all APs to init MTRRs in parallel.
> > > > 
> > > > Maybe we want to do the same in Xen.
> > > 
> > > I suspect Xen might be insulated by the fact that we don't have parallel
> > > AP start (yet), so we don't have the whole system competing on the
> > > spinlock at once.
> > 
> > Oh, I think I've misunderstood the issue.  Linux is doing MTRR init in
> > the AP startup path, and so if it takes too long Linux will report
> > that the AP has failed to start.
> 
> No, Linux is deferring the MTRRs until all APs are up, just like Xen
> (or Xen does it like Linux).
> 
> > 
> > This is not an issue on Xen because MTRR initialization is deferred
> > until all APs are up, and hence is not part of the timed AP start
> > path.  This optimization was done in:
> > 
> > 0d22c8d92c6c x86: CPU synchronization while doing MTRR register update
> > 
> > So even if we did parallel AP startup we won't likely be affected,
> > because we would still defer the MTRR setup until all APs are up.
> 
> We will be affected, as it's the deferred MTRR setup which is the
> problem.

If it's the watchdog NMI then that won't be possible on Xen, as the
watchdog is set up after the MTRR synchronization step.  We should
however fix it even if it's not a fatal issue on Xen.  I assume the
avoidance of locking will make a very noticeable performance
difference in boot times.

Thanks, Roger.



 

