RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN

Aha, Christoph, sorry for the surprise, but I think we have already described
our suggestion to you (please refer to
http://markmail.org/message/vpcdojylxkrg6uz3). As I didn't get any response
from your side, I supposed you were waiting for the patch to get a better idea;
that is the reason Criping and I hurried to cook the patch and send it out as
an RFC. RFC means it is meant for comments, as we know MCA handling is complex
and needs community discussion (I have to say that sometimes a patch is clearer
than a design doc, although cooking a patch needs more effort).

Your description of our design is quite clear, which also means our RFC has
achieved its purpose :-) One exception is item 6: the MCE trap handler on the
HV side is still needed for PV domains just as it is now (the bounce buffer,
the trap priority, etc.), but for guests, yes, we try to re-use the guest's MCA
handler.

As said already, MCE handling is complex, so can we discuss in detail how to
handle MCA and reach some consensus? We have CC'ed all the engineers we think
may be interested in it.

I have merged my comments on your other mail below:

>- The MCE routines in Xen are only for error data *collection*.
>  Just pass it to Dom0 and that's it.
>  Dom0 will do the error analysis and figure out what do to.
>  It is the Dom0 which will do a hypercall to do things like
>  page-offlining or cpu offlining or whatever is needed.
>  Your code tries to move everyting back from Dom0 into the
>  hypervisor. I remember Keir having rejected my MCE patches
>  because he feared this bloat.

Sorry, I didn't notice Keir's feedback on your original patch. I will search
for it, or it would be great if you could tell me when that happened.

>- MCA flags: what are the differences between correctable 
>  and recoverable ? what are the differences between uncorrectable,
>  polled, reset and cmci and mce types ?

As I understand it, a correctable error (sometimes called a corrected error)
means the hardware has already recovered from the error and software is not
impacted (although some proactive action is preferred), while a recoverable
error means the hardware has not recovered from the error but software may be
able to (it is something like a non-fatal error in the PCI-E spec, although not
exactly the same, I think).
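
To make the distinction concrete, here is a minimal sketch in C of how the
two cases could be told apart from an MCi_STATUS value. The bit positions
(VAL = 63, UC = 61, PCC = 57) are the ones documented in the Intel SDM; the
enum and function names are hypothetical and are not taken from the RFC patch.
Note that "recoverable" here only means a candidate for software recovery:
whether a recovery method actually exists is a separate question.

```c
#include <assert.h>
#include <stdint.h>

/* MCi_STATUS bits as documented in the Intel SDM. */
#define MCI_STATUS_VAL (1ULL << 63) /* register holds a valid error */
#define MCI_STATUS_UC  (1ULL << 61) /* error was NOT corrected by hardware */
#define MCI_STATUS_PCC (1ULL << 57) /* processor context corrupt */

/* Hypothetical names, for illustration only. */
enum mce_class {
    MCE_NONE,        /* no valid error logged in this bank */
    MCE_CORRECTED,   /* hardware already corrected it */
    MCE_RECOVERABLE, /* uncorrected, context intact: software may recover */
    MCE_FATAL        /* uncorrected and context corrupt */
};

enum mce_class mce_classify(uint64_t status)
{
    if (!(status & MCI_STATUS_VAL))
        return MCE_NONE;
    if (!(status & MCI_STATUS_UC))
        return MCE_CORRECTED;
    if (status & MCI_STATUS_PCC)
        return MCE_FATAL;
    return MCE_RECOVERABLE;
}

/* Usage: hand-built status values and the class each one maps to. */
void mce_classify_examples(void)
{
    assert(mce_classify(0) == MCE_NONE);
    assert(mce_classify(MCI_STATUS_VAL) == MCE_CORRECTED);
    assert(mce_classify(MCI_STATUS_VAL | MCI_STATUS_UC) == MCE_RECOVERABLE);
    assert(mce_classify(MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_PCC)
           == MCE_FATAL);
}
```

The check order matters: UC is tested before PCC, so a corrected error is
never reported as fatal regardless of the other bits.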

>
>- You use dynamic memory allocation (which uses spinlocks) in MCE code
>  and you roll your own mce handling instead of using the generic API
>  in mce.c

I think that code runs in softIRQ context, so taking spinlocks there should be OK.

>  I suppose, you don't understand it at all.
>
>- I attach the design document again, since I have the impression, noone
>  at Intel read it, hence the misunderstandings.

I promise we read it carefully, otherwise my manager would be sure to challenge
me before you did, and it is really well written.

>
>I think, it is best to get Gavin's generic mce improvements upstream first.
Sure, Gavin's improvements are important. Again, this patch is just an RFC, and
some components, such as per-domain MCA injection, are still WIP since we
wanted to get input first.

Thanks
Yunhong Jiang

>-----Original Message-----
>From: Christoph Egger [mailto:Christoph.Egger@xxxxxxx] 
>Sent: February 16, 2009 22:18
>To: xen-devel@xxxxxxxxxxxxxxxxxxx
>Cc: Ke, Liping; Frank.Vanderlinden@xxxxxxx; Jiang, Yunhong; 
>Keir Fraser; Gavin Maltby
>Subject: Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
>
>
>I realize from this and earlier MCE patches from Intel,
>that Intel tries to change the machine check design
>on its ground.
>
>The basic ideas behind current design:
>
>1. Xen collects error telemetry
>2. Xen delivers correctable errors to Dom0 via VIRQ
>3. Xen delivers uncorrectable errors to Dom0 via trap handler
>4. Xen delivers uncorrectable errors to DomU only if Dom0 tells Xen to do so
>5. Xen performs health measurements as told by Dom0 via hypercalls
>   such as cpu- or page-offlining
>6. Dom0 performs error analysis, figures out what is going on,
>    calls hypercalls for the right health measurement
>
>
>The basic ideas behind Intel's new design (as far as I can see them from
>their patches I have seen so far):
>
>1. Xen collects error telemetry
>2. Xen performs error analysis, figures out what is going on
>3. Xen automatically does health measurements automatically
>    like cpu- and page-offlining
>4. Xen delivers error telemetry to Dom0 via VIRQ for error logging only
>    independent of the error type
>5. Inject MCEs into the guest directly
>6. Don't use the MCE trap handler at all
>
>
>IMO, any design change should be discussed first and not changed
>silently, since this will confuse everyone and noone will know 
>what is the right thing to do in Xen and in Dom0 and this
>in turn will lead to error prone, unmaintainable code in both
>Xen and in Dom0
>
>Christoph
>
>
>On Monday 16 February 2009 14:34:36 Christoph Egger wrote:
>> To me, it seems, the design has not been understood
>> and now, the code becomes more and more unmaintainable
>> bloat. I mean, the code is going to do far too much.
>>
>> - The MCE routines in Xen are only for error data *collection*.
>>   Just pass it to Dom0 and that's it.
>>   Dom0 will do the error analysis and figure out what do to.
>>   It is the Dom0 which will do a hypercall to do things like
>>   page-offlining or cpu offlining or whatever is needed.
>>   Your code tries to move everyting back from Dom0 into the
>>   hypervisor. I remember Keir having rejected my MCE patches
>>   because he feared this bloat.
>>
>> - Dom0 VIRQ is for correctable errors only. Uncorrectable errors
>>   are delivered via MCE trap. Dom0 and DomU register a handle
>>   via set_trap_table hypercall. A non-registrated handler means,
>>   the guest can't handle it by itself. Dom0 is always notified,
>>   the guest becomes only notified
>>   This seperation is completely ignored and misuse Dom0 VIRQ for everything
>>   (therefore the bunch of superflous flags (see next point))
>>
>> - MCA flags: what are the differences between correctable
>>   and recoverable ? what are the differences between uncorrectable,
>>   polled, reset and cmci and mce types ?
>>
>> - You use dynamic memory allocation (which uses spinlocks) in MCE code
>>   and you roll your own mce handling instead of using the generic API in
>>   mce.c
>>   I suppose, you don't understand it at all.
>>
>> - I attach the design document again, since I have the impression, noone
>>   at Intel read it, hence the misunderstandings.
>>
>> I think, it is best to get Gavin's generic mce improvements upstream first.
>>
>> On Monday 16 February 2009 06:35:14 Ke, Liping wrote:
>> > Hi, all
>> > These patches are for MCA enabling in XEN. It is sent as RFC firstly to
>> > collect some feedbacks for refinement if needed before the final patch.
>> > We also attach one description txt documents for your reference.
>> >
>> > Some implementation notes:
>> > 1) When error happens, if the error is fatal (pcc = 1) or can't be
>> >    recovered (pcc = 0, yet no good recovery methods), for avoiding losing
>> >    logs in DOM0, we will reset machine immediately. Most of MCA MSRs are
>> >    sticky. After reboot, MCA polling mechanism will send vIRQ to DOM0 for
>> >    logging.
>> > 2) When MCE# happens, all CPUs enter MCA context. The first CPU who
>> >    read&clear the error MSR bank will be this MCE# owner. Necessary
>> >    locks/synchronization will help to judge the owner and select most
>> >    severe error.
>> > 3) For convenience, we will select the most offending CPU to do most of
>> >    processing&recovery job.
>> > 4) MCE# happens, we will do three jobs:
>> >     a. Send vIRQ to DOM0 for logging
>> >     b. Send vMCE# to Impacted Guest (Currently Only inject to impacted
>> >        DOM0)
>> >     c. Guest vMCE MSR virtualization
>> > 5) Some further improvement/adds might be done if needed:
>> >     a) Impacted DOM judgement algorithm.
>> >     b) Now vMCE# injection is controlled by centralized data(vmce_data).
>> >        The injection algorithm is a bit complex. We might change the
>> >        algorithm which's based on PER_DOM data if you preferred. Notes
>> >        for understanding:
>> >         1) If several banks impact one domain, yet those banks belong to
>> >            the same pCPU, it will be injected only once.
>> >         2) If more than one bank impact one domain, yet error banks
>> >            belong to different pCPU, ith will be injected nr_num(pCPU)
>> >            times.
>> >         3) We use centralized data [two arrays impact_domid, impact_cpus
>> >            map in vmce_data] to represent the injection algorithm.
>> >            Combined the two array item (idx, impact_domid) and (idx,
>> >            impact_cpus) into one item (idx, impact_domid, impact_cpus).
>> >            This item records the impact_domain id and the error pCPU map
>> >            (Finding UC errors on this CPU which impact this domain).
>> >            Then, we can judge how to inject the vMCE (domid,
>> >            impact_times[nr_pCPUs]).
>> >         4) Although data structure is ready, we only inject vMCE# to
>> >            DOMD0 currently.
>> >     c) Connection with recovery actions (cpu/memory online/offline)
>> >     d) More refines and tests for HVM might be done when needed.
>> >
>> > Patch Description:
>> > 1. basic_mca_support: Enable MCA support in XEN.
>> > 2. vmsr_virtualization: Guest MCE# MSR read/write virtualization support
>> >    in XEN.
>> > 3. mce_dom0: Cooperating with XEN, DOM0 add vIRQ and vMCE# handler.
>> >    Translate XEN log to DOM0, re-use Linux kernel and MCELOG mechanisms
>> >    and MCE handler. This is mainly a demonstration patch.
>> >
>> > About Test:
>> > We did some internal test and the result is just fine.
>> >
>> > Any feedback is welcome and thanks a lot for your help! :-)
>> > Regards,
>> > Criping
>
>
>
>-- 
>---to satisfy European Law for business letters:
>Advanced Micro Devices GmbH
>Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
>Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni
>Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
>Registergericht Muenchen, HRB Nr. 43632
>
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
