xen-users

RE: [Xen-users] MCE logging (fwd)

To: "Heiko Lehmann" <hlehmann@xxxxxxxxxxxxx>, xen-users@xxxxxxxxxxxxxxxxxxx
Subject: RE: [Xen-users] MCE logging (fwd)
From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
Date: Thu, 18 Jan 2007 14:47:09 +0100
Delivery-date: Thu, 18 Jan 2007 05:47:02 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <Pine.LNX.4.53.0701171711460.16621@xxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-users-request@lists.xensource.com?subject=help>
List-id: Xen user discussion <xen-users.lists.xensource.com>
List-post: <mailto:xen-users@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: Acc7BVYixZgun5SjRGWKKJNIN52p2AAAHZFQ
Thread-topic: [Xen-users] MCE logging (fwd)
 

> -----Original Message-----
> From: xen-users-bounces@xxxxxxxxxxxxxxxxxxx 
> [mailto:xen-users-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of 
> Heiko Lehmann
> Sent: 17 January 2007 16:13
> To: xen-users@xxxxxxxxxxxxxxxxxxx
> Subject: [Xen-users] MCE logging (fwd)
> 
> 
> 
> Hallo folks!
> 
> - The MCE message was seen only with "xm dmesg"; it was not logged.
>   Why?
>   These messages should be forwarded to syslog.

That's a good point. You may want to file a bug at
http://bugzilla.xensource.com/bugzilla/index.cgi (search for a similar
one first; MCE reports shouldn't be very common, so searching for "MCE"
is probably good enough).


> 
> - http://lists.debian.org/debian-user-german/2006/10/msg02643.html
>   says: CPU nearly dead.

An MCE doesn't tell you which part of the system is failing, just the
fact that something is failing.

Replacing the CPU because that's the part that issues the exception is
"shooting the messenger". 

This particular MCE reports a memory access error (I haven't tried to
decode the entire error status, which would probably tell you more about
what went wrong), and that can be caused by a fault in the memory, the
motherboard or the CPU.

From my limited understanding of German, the above link correctly states
that this is (generally) not a kernel error, but rather a hardware
problem.

If it's a correctable error, there's really no reason to worry too
much, unless you get PLENTY of them. In that case, sooner or later you
will hit either an uncorrectable error or, worse, an undetected one,
which will hang or crash the system, or perhaps just corrupt your data.
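
To check whether a given MCE was corrected or not, the status value
printed with it can be decoded. The following is only a rough sketch:
it assumes the value is a raw 64-bit MCi_STATUS register, uses the
architectural layout of the top flag bits, and leaves the low error
code bits undecoded because their meaning depends on the processor.
The function name and the sample value are made up for illustration;
they are not taken from the log in this thread.

```python
# Flag bits in the upper part of a 64-bit MCi_STATUS value.
STATUS_FLAGS = {
    "valid": 63,            # register holds a valid error
    "overflow": 62,         # another error arrived before this one was cleared
    "uncorrected": 61,      # hardware could NOT correct the error
    "enabled": 60,          # reporting was enabled for this error
    "misc_valid": 59,       # the MISC register holds extra information
    "addr_valid": 58,       # the ADDR register holds the failing address
    "context_corrupt": 57,  # processor context may be corrupt
}


def decode_mci_status(status):
    """Split a raw MCi_STATUS value into its flags and error code fields."""
    decoded = {
        name: bool((status >> bit) & 1) for name, bit in STATUS_FLAGS.items()
    }
    decoded["mca_error_code"] = status & 0xFFFF              # bits 15:0
    decoded["model_specific_code"] = (status >> 16) & 0xFFFF  # bits 31:16
    return decoded


if __name__ == "__main__":
    sample = 0x9400000000000151  # made-up value, for illustration only
    for name, value in decode_mci_status(sample).items():
        print(name, value)
```

If "uncorrected" is clear, the hardware corrected the error; a steady
stream of those is the "PLENTY" case described above.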

[At AMD tech support, we get a few of these every month, with people
thinking that it's a broken CPU - it MAY be, but you really need to
figure out which component it is that is causing the problem, as the CPU
is often working perfectly fine...]

> 
>   So it seems that CPUs 1 & 3 are affected.
>   But there are only 2 CPUs (+HT) in the box.
>   What is the logical ordering in /proc/cpuinfo?
>   Example:
>    processor :  0       1       2       3
>    phys-CPU     0       1       0       1
>   or:
>    processor :  0       1       2       3
>    phys-CPU     0       0       1       1
> 
You should be able to determine the socket number for each CPU from the
files under /sys/devices/system/cpu/cpu*/topology/ ...
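
A minimal sketch of reading that information, assuming the kernel
exposes a physical_package_id file in each CPU's topology directory
(the function name is made up for illustration). Note that under Xen
the dom0 kernel may be looking at virtual CPUs, so the result may not
match the physical sockets.

```python
import glob
import os
import re
from collections import defaultdict


def cpus_by_package(sysfs_root="/sys/devices/system/cpu"):
    """Map each physical package (socket) id to its logical CPU numbers."""
    packages = defaultdict(list)
    pattern = os.path.join(
        sysfs_root, "cpu[0-9]*", "topology", "physical_package_id"
    )
    for path in glob.glob(pattern):
        # path is <root>/cpuN/topology/physical_package_id
        cpu_dir = os.path.basename(os.path.dirname(os.path.dirname(path)))
        match = re.fullmatch(r"cpu(\d+)", cpu_dir)
        if match is None:
            continue  # skip anything that is not a plain cpuN directory
        with open(path) as f:
            package_id = int(f.read().strip())
        packages[package_id].append(int(match.group(1)))
    return {pkg: sorted(cpus) for pkg, cpus in sorted(packages.items())}


if __name__ == "__main__":
    for pkg, cpus in cpus_by_package().items():
        print(f"physical package {pkg}: logical CPUs {cpus}")
```

Logical CPUs listed under the same package id share a socket, which
shows which of the two orderings in the question applies.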

> 
> regards Heiko
> 
[snip log files]

--
Mats




