WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

RE: [Xen-devel] [patch 4/4]Enable CMCI (Corrected Machine Check Error In

To: Christoph Egger <Christoph.Egger@xxxxxxx>
Subject: RE: [Xen-devel] [patch 4/4]Enable CMCI (Corrected Machine Check Error Interrupt) for Intel CPUs
From: "Ke, Liping" <liping.ke@xxxxxxxxx>
Date: Wed, 14 Jan 2009 09:35:25 +0800
Accept-language: en-US
Acceptlanguage: en-US
Cc: Keir, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Fraser <keir.fraser@xxxxxxxxxxxxx>, "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>
Delivery-date: Tue, 13 Jan 2009 17:35:59 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <200901131221.23669.Christoph.Egger@xxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <E2263E4A5B2284449EEBD0AAB751098401C4C32E95@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> <200901121503.17232.Christoph.Egger@xxxxxxx> <E2263E4A5B2284449EEBD0AAB751098401C4F02B3C@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> <200901131221.23669.Christoph.Egger@xxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: Acl1cTVMt5KcnaLASNGhAFqNp/7HGQAdMgTQ
Thread-topic: [Xen-devel] [patch 4/4]Enable CMCI (Corrected Machine Check Error Interrupt) for Intel CPUs
Christoph Egger wrote:
> On Tuesday 13 January 2009 03:12:51 Ke, Liping wrote:
>> Hi, Christoph
>> Please see our comments below
>> 
>> Christoph Egger wrote:
>>> On Friday 19 December 2008 07:19:16 Ke, Liping wrote:
>>>> Hi, all
>>>> 
>>>> Patch 4 is the main patch for CMCI enabling in XEN. It adds the
>>>> CMCI interrupt handler, new common CMCI/MCA init process,CMCI
>>>> owner judge algorithm when bring_up CPUs, CPU on/offline  and 
>>>> polling mechanisms, etc 
>>>> 
>>>> Thanks & Regards,
>>>> Criping
>>> 
>>> This patch changes the public API. There's no need to change
>>> struct mcinfo_extended. The marshalling technique allows to use
>>> this structure as often as needed. Please undo this change.
>> 
>> Since Intel extended msrs' number is bigger than AMD, and we can't
>> use pointer and allocate memory in mca handler, so we extended it
>> from 5 -> 10. We think it should not impact any of your usage.
>> 
>> Else, we can change boundary 5->0, use a globally allocated pointer
>> instead. But it seems not that necessary. How do you think about?
> 
> When I defined the API, I already knew that 5 is not enough for Intel.
> So I made struct mc_info large enough to keep multiple entities in any
> combination. I expected from you to fill struct mcinfo_extended two or
> three times into struct mc_info  the same way as you do with
> struct mcinfo_bank.   There's no (additional) allocation needed.
> 
> From your description I just read, that you don't know this
> marshalling technique.

Hi, Christoph

We do understand and actually are using your *marshalling* techniques,
you see we push each bank infor to one entry of your *mcinfo_bank*. 
It's flexible and just good. 
But split one integral information into several entries seems not necessary.
For me it is not good programming style for me. If we have several kinds 
of extended Infor, surely I will use your marshalling techniques. 
It should not *impact* public interface(We noticed you have calculate 
data structure size dynamically we think). How do you think? 

For the reason of keeping apic_id, partly because we're not sure whether 
the information would be useful to recovery action(It brings more
Information than core_id), so we just want to *keep* it. 
If you don't use it, just let it be blank. It should not impact your interface. 
Your opinion? 

Thanks a lot for your help!
Regards,
Criping

> 
> 
>>> struct mcinfo_global is also changed. Why can't mc_coreid not be
>>> filled with the apicid ? Adding the apicid field breaks the
>>> alignment causing troubles on 32bit-guest-on-64bit-hypervisor.
>> 
>> We plan to extend the apic_id to 32 bit since 8 bit is not enough
>> for new apic_id. Besides, for this problem, before sending the
>> patch, we actually talked about it. Seems original structure is not
>> 32/64 alligned. How about below changes? 
>> 
>> struct mcinfo_global {
>>     struct mcinfo_common common; 4 byte
>> 
>>     /* running domain at the time in error (most likely the impacted
>>     one) */ uint16_t mc_domid; 2 byte uint32_t mc_socketid; /*
>>     physical socket of the physical core */ 4 byte uint16_t
>>     mc_coreid; /* physical impacted core */ 2 byte uint8_t 
>>     mc_apicid; ---change it to 4 byte uint16_t mc_core_threadid; /*
>>     core thread of physical core */ 2 byte uint16_t mc_vcpuid; /*
>>     virtual cpu scheduled for mc_domid */ 2 byte uint64_t
>> mc_gstatus; /* global status */ 8 byte     uint32_t mc_flags; 4 byte
>> }; 
>> 
>> 
>> Change to
>> ------------------------>>>>>
>> 
>> struct mcinfo_global {
>>     struct mcinfo_common common; 4 byte
>> 
>>     /* running domain at the time in error (most likely the impacted
>> one) */ uint32_t mc_socketid; /* physical socket of the physical
>> core */ 4 byte
>>    
>>    
>> -----------------------------------------------------------------   
>>     uint16_t mc_domid; 2 byte uint16_t mc_coreid; /* physical
>>     impacted core */ 2 byte uint32_t  mc_apicid; ---change it to 4
>> byte
>>    
>> -------------------------------------------------------------------
>> uint16_t mc_core_threadid; /* core thread of physical core */ 2 byte
>> uint16_t mc_vcpuid; /* virtual cpu scheduled for mc_domid */ 2 byte 
>> uint32_t mc_flags; 4 byte
>> --------------------------------------------------------------------------
>> uint64_t mc_gstatus; /* global status */ 8 byte };     
> 
> Your proposed change fixes the alignment problem, but it doesn't
> answer my question: Why is mc_apicid needed at all since the CPU core
> is already identified by mc_coreid ?
> 
> Christoph


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel