This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


Re: [Xen-devel] [PATCH] 3/3: MCA/MCE correctable error handling

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] [PATCH] 3/3: MCA/MCE correctable error handling
From: "Christoph Egger" <Christoph.Egger@xxxxxxx>
Date: Thu, 23 Aug 2007 08:57:28 +0200
Cc: Gavin.Maltby@xxxxxxx, Keir Fraser <keir@xxxxxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxxxx>
Delivery-date: Wed, 22 Aug 2007 23:58:39 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <C2F21E80.1476F%keir@xxxxxxxxxxxxx>
References: <C2F21E80.1476F%keir@xxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: KMail/1.9.6
On Wednesday 22 August 2007 18:10:24 Keir Fraser wrote:
> On 22/8/07 17:05, "Keir Fraser" <keir@xxxxxxxxxxxxx> wrote:
> >> The polling routine that is in the -unstable tree (the version taken
> >> from Linux) runs every 15 seconds without adjustments.
> >> 1Hz causes too much system load for a healthy system IMO.
> >> That's why I introduced the adjustments with use of hw threshold
> >> registers to come to a compromise solution.
> >
> > What's the deal here? Do correctable errors not cause an MCE, yet are
> > still detected via the machine-check architecture (albeit by a polling
> > method)?

The deal here is: correctable errors are detected via polling and
uncorrectable errors via MCE.
This patchset is about correctable errors.
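
To make the split concrete, here is a minimal sketch in C (not the code
from the patch). It assumes the architectural MCi_STATUS layout, where
bit 63 (VAL) marks a valid entry and bit 61 (UC) marks an uncorrected
error. The names mc_classify, mc_poll_banks and mc_next_period are
hypothetical, the bank registers are simulated by a plain array instead of
MSR reads, and the halve/double interval rule is only a stand-in assumption
for the threshold-register-based adjustment; the 1 s and 15 s bounds are
taken from the figures quoted above.

```c
#include <stdint.h>
#include <stddef.h>

/* Architectural MCi_STATUS bits of the x86 machine-check architecture. */
#define MCI_STATUS_VAL (1ULL << 63)  /* register holds a valid error record */
#define MCI_STATUS_UC  (1ULL << 61)  /* error was NOT corrected by hardware */

/* Assumed bounds for the polling period, in seconds (illustrative only). */
#define MC_POLL_MIN_S  1u
#define MC_POLL_MAX_S  15u

enum mc_class { MC_NONE, MC_CORRECTABLE, MC_UNCORRECTABLE };

/* Classify one bank status value. */
static enum mc_class mc_classify(uint64_t status)
{
    if (!(status & MCI_STATUS_VAL))
        return MC_NONE;
    return (status & MCI_STATUS_UC) ? MC_UNCORRECTABLE : MC_CORRECTABLE;
}

/*
 * One polling pass over simulated bank status registers.
 * Correctable errors are counted and cleared; uncorrectable ones are left
 * untouched here because they are the business of the MCE handler.
 * Real code would log the record and write 0 to the MCi_STATUS MSR.
 */
static unsigned int mc_poll_banks(uint64_t *bank_status, size_t nr_banks)
{
    unsigned int found = 0;

    for (size_t i = 0; i < nr_banks; i++) {
        if (mc_classify(bank_status[i]) == MC_CORRECTABLE) {
            found++;
            bank_status[i] = 0;
        }
    }
    return found;
}

/*
 * Hypothetical interval policy: poll more often while errors keep showing
 * up, back off towards the maximum period on a healthy system.
 */
static unsigned int mc_next_period(unsigned int period_s, unsigned int found)
{
    unsigned int next = found ? period_s / 2 : period_s * 2;

    if (next < MC_POLL_MIN_S)
        next = MC_POLL_MIN_S;
    if (next > MC_POLL_MAX_S)
        next = MC_POLL_MAX_S;
    return next;
}
```

A timer callback would call mc_poll_banks() on each tick and re-arm itself
with mc_next_period(), so a healthy system stays at the long period.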

> > Are there going to be patches on the Linux side to pick up this MCA info?
> > What is Linux going to do with it, apart from log it (which Xen can
> > already do itself)? Or is this all Solaris-specific?

The general idea is that Dom0 picks up this MCA info and a) uses
the error-handling infrastructure it already provides for the
non-virtualized case and b) uses hypercalls to tell Xen to also report
the MCA to a DomU and/or kill a DomU.
Some hw features for self-healing can only be used by Dom0 (because the
registers sit in the PCI extended config space, which Xen doesn't have
access to) and some can be used by Xen itself.

I wrote a demo driver for NetBSD/Xen that mainly tests that Dom0 actually
receives the MCA info (Sun prefers to look at BSD-licensed code).
It should be easy to port to Linux.

> Oh, and is AMD-specific code really needed in non-fatal.c? I thought the MCA
> stuff was architectural now rather than vendor specific? If there are
> vendor-specific extensions then they belong in the vendor's .c file.

The AMD-specific part is the code that uses the hw registers. Intel has
some additional machine check MSRs containing the register set. Intel may
add a structure to patch 2/3 that makes use of them. Should I move the AMD
polling handler to amd.c?

AMD Saxony, Dresden, Germany
Operating System Research Center

Legal Information:
AMD Saxony Limited Liability Company & Co. KG
Registered office (business address):
   Wilschdorfer Landstr. 101, 01109 Dresden, Germany
Register court Dresden: HRA 4896
General partner authorized to represent the company:
   AMD Saxony LLC (registered office Wilmington, Delaware, USA)
Managing directors of AMD Saxony LLC:
   Dr. Hans-R. Deppe, Thomas McCoy
