This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


Re: [Xen-devel] [PATCH] 3/3: MCA/MCE correctable error handling

To: Christoph Egger <Christoph.Egger@xxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: Re: [Xen-devel] [PATCH] 3/3: MCA/MCE correctable error handling
From: Keir Fraser <keir@xxxxxxxxxxxxx>
Date: Wed, 22 Aug 2007 17:05:44 +0100
Cc: Gavin.Maltby@xxxxxxx, Jan Beulich <jbeulich@xxxxxxxxxx>
Delivery-date: Wed, 22 Aug 2007 09:06:28 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <200708221756.00902.Christoph.Egger@xxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: Acfk1ktAid8TLFDJEdyrvwAX8io7RQ==
Thread-topic: [Xen-devel] [PATCH] 3/3: MCA/MCE correctable error handling
User-agent: Microsoft-Entourage/
On 22/8/07 16:56, "Christoph Egger" <Christoph.Egger@xxxxxxx> wrote:

>> What I'm trying to say is that I'd think this should be polled at a much
>> higher frequency (I'd suggest 1Hz), without adjustments. Typically, a
>> healthy system will not encounter problems soon after boot, but after
>> running for perhaps a very long time (and a system in bad condition is
>> likely to encounter problems right away, so wouldn't be affected by
>> changing the polling rate). Thus, in the general case, you'd have a
>> comparably long latency, during which some kind of (automated) action could
>> already be taken to preserve data consistency.
> The polling routine that is in the -unstable tree (the version taken from
> Linux) runs every 15 seconds without adjustments.
> 1Hz causes too much system load for a healthy system IMO.
> That's why I introduced the adjustments with use of hw threshold registers
> to come to a compromise solution.

What's the deal here? Do correctable errors not cause an MCE, yet are still
detected via the machine-check architecture (albeit by a polling method)?

Are there going to be patches on the Linux side to pick up this MCA info?
What is Linux going to do with it, apart from log it (which Xen can already
do itself)? Or is this all Solaris-specific?

 -- Keir
