
Re: [Xen-devel] RFC: MCA/MCE concept

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] RFC: MCA/MCE concept
From: "Christoph Egger" <Christoph.Egger@xxxxxxx>
Date: Wed, 6 Jun 2007 13:57:26 +0200
Cc: Gavin Maltby <Gavin.Maltby@xxxxxxx>, Keir Fraser <keir@xxxxxxxxxxxxx>
In-reply-to: <46668DE3.2020006@xxxxxxx>
References: <200705291732.46709.Christoph.Egger@xxxxxxx> <200706061128.02752.Christoph.Egger@xxxxxxx> <46668DE3.2020006@xxxxxxx>
User-agent: KMail/1.9.6
On Wednesday 06 June 2007 12:35:15 Gavin Maltby wrote:
> Hi,
>
> On 06/06/07 10:28, Christoph Egger wrote:
> > On Monday 04 June 2007 18:16:56 Gavin Maltby wrote:
> >> Hi,
> >>
> >> On 05/30/07 10:10, Christoph Egger wrote:
> >>> On Wednesday 30 May 2007 10:49:40 Jan Beulich wrote:
> >>>>>>> "Christoph Egger" <Christoph.Egger@xxxxxxx> 30.05.07 09:45 >>>
> >>>>>
> >>>>> On Wednesday 30 May 2007 09:19:12 Jan Beulich wrote:
> >>>>>>> case I) - Xen receives an MCE from the CPU
> >>>>>>>
> >>>>>>> 1) Xen MCE handler figures out whether the error is a correctable
> >>>>>>>    error (CE) or an uncorrectable error (UE)
> >>>>>>> 2a) error == CE:
> >>>>>>>     Xen notifies Dom0 if Dom0 installed an MCA event handler
> >>>>>>>     for statistical purposes
> >>
> >> [rest cut]
> >>
> >> For the hypervisor to dom0 communication that 2a) above refers to I
> >> think we need to agree on two aspects:  what form the notification event
> >> will take, and what error telemetry data and additional information will
> >> be provided by the hypervisor for dom0 to chew on for statistical and
> >> diagnosis purposes.
> >
> > Additionally, the hypervisor must be able to notify a domU that has
> > a PV MCA driver.
>
> Yes, I forgot that; although I guess I view it as most likely a future
> phase.

Yes, but ignoring this can lead to a design that is bad for DomU and,
in the worst case, requires a redesign.

> >> For the first I've assumed so far that an event channel notification
> >> of the MCA event will suffice;  as long as the hypervisor only polls
> >> for correctable MCA errors at a low-frequency rate (currently 15s
> >> interval) there is no danger of spamming that single notification.
> >
> > Why polling?
>
> Polling for correctable errors, but #MC as usual for others.  Setting
> MCi_CTL bits for correctable errors does not produce a machine check,
> so polling is the only approach unless one sets additional (and
> undocumented, certainly for AMD chips) config bits.  What I was getting
> at here is that polling at largish intervals for correctables is
> the correct approach - trapping for them or polling at high frequency
> is bad because in cases where you have some form of solid correctable
> error (say a single bad pin in a dimm socket affecting one or two ranks
> of that dimm but never able to produce a UE) the trap handling and
> diagnosis software consume the machine and things make little useful
> forward progress.
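
(For concreteness, the low-frequency poll described here might look
roughly like the sketch below. The helper names are hypothetical and
not Xen's actual MCA code; the MSR numbers and status bits follow the
x86 MCA layout, with MCi_STATUS at 0x401 + 4*i.)

#include <stdint.h>

#define MSR_MC0_STATUS  0x401               /* MCi_STATUS = 0x401 + 4*i */
#define MCI_STATUS_VAL  (1ULL << 63)        /* a valid error is logged  */
#define MCI_STATUS_UC   (1ULL << 61)        /* error is uncorrected     */

extern uint64_t rdmsr(uint32_t msr);        /* assumed MSR accessors */
extern void wrmsr(uint32_t msr, uint64_t val);
extern void queue_ce_telemetry(unsigned int bank, uint64_t status);

/* Run from a periodic timer, e.g. at the 15s interval discussed above. */
void mce_poll_correctable(unsigned int nr_banks)
{
    unsigned int i;

    for (i = 0; i < nr_banks; i++) {
        uint64_t status = rdmsr(MSR_MC0_STATUS + 4 * i);

        if (!(status & MCI_STATUS_VAL) || (status & MCI_STATUS_UC))
            continue;  /* nothing logged, or a UE - that takes the #MC path */

        queue_ce_telemetry(i, status);       /* CE: record for dom0 */
        wrmsr(MSR_MC0_STATUS + 4 * i, 0);    /* re-arm the bank */
    }
}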

I still don't see why #MC for all kinds of errors is bad.

> >> On receipt of the notification the event handler will need to suck
> >> some event data out of somewhere - I'm uncertain which somewhere
> >> would be best.
> >>
> >> We should standardize both the format and the content of this event
> >> data.  The following is just to get the conversation started in this
> >> area.
> >>
> >> Content first.  Obviously we need the raw MCA register content -
> >> MCi_STATUS, MCi_ADDR, MCi_MISC.  We also need to know which
> >> MCA detector bank made the observation, so we need to include
> >> some indication of which chip (where I use "chip" to coincide
> >> with "socket"), core on that chip, and MCA bank number
> >> the telemetry came from.  I think I am correct in saying that
> >> hyperthreaded CPUs do not have any MCA banks per-thread, but we
> >> may want to allow for that future possibility (I know, for instance,
> >> that some SPARC cpus have error state for each hardware thread).
> >
> > And we need the domain and the domain's vcpu to identify
> > who is impacted.
>
> Yes, the domain ID.  I'm not sure we need the vcpu id if we instead
> present some physical identifiers such as chip, core number etc
> (and have the namespaces well-defined).  If we don't present those,
> we need the vcpu in the payload plus some external method to resolve
> it to physical components.  Since errors correlate to physical components it
> would, I think, be nicer to report detector info in some physical sense.

The vcpu is more interesting for the domU than for dom0.
See below.

> As regards a vcpu to physical translation, I didn't think there was any
> fixed mapping (or certainly any mapping that a dom0 should interpret
> and rely on).  For example if we have two physical cores but choose
> to present 32 vcpus to a domain, I don't believe there is anything to
> say that vcpus 0-15 always run on physical core 0?
>
> >> We should also allow for additional model-specific error telemetry
> >> that may be available and relevant - I know that will be necessary
> >> for some upcoming x86 cpu models.  We should probably avoid adding
> >> "cooked" content to this error event payload - such cooking of the
> >> raw data is much more easily performed in dom0 (the example I'm
> >> thinking of here is physical address to memory location translation).
> >>
> >> In terms of the form of the error event data, the simplest but also
> >> the dumbest would be a binary structure passed from hypervisor
> >> to dom0:
> >
> > struct mca_error_data_ver1 {
> >     uint8_t  version;      /* structure version */
> >     uint64_t mc_status;    /* raw MCi_STATUS */
> >     uint64_t mc_addr;      /* raw MCi_ADDR */
> >     uint64_t mc_misc;      /* raw MCi_MISC */
> >     uint16_t mc_chip;      /* detecting chip (socket) */
> >     uint16_t mc_core;      /* core on that chip */
> >     uint16_t mc_bank;      /* MCA bank number */
> >     uint16_t domid;        /* impacted domain */
> >     uint16_t vcpu_id;      /* impacted vcpu */
> >     ...
> > };
> >
> >> That is easily passed around and can be extended by versioning.
> >> A more self-describing and naturally extensible approach would be
> >> to parcel the error data in some form of name-type-value list.
> >> That's what we do in the corresponding kernel->userland error
> >> code in Solaris; the downside is that the supporting libnvpair
> >> library is not tiny and likely not the sort of footprint to
> >> include in a hypervisor.  Perhaps some cut-down form would do.
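
(A cut-down name-type-value form could be as simple as the following
sketch - a flat buffer of [name, type tag, value] records, nothing like
full libnvpair. The names and types here are illustrative only.)

#include <stdint.h>
#include <string.h>

enum mca_nv_type { MCA_NV_UINT64 = 1, MCA_NV_UINT16 = 2 };

/* Append one named uint64_t record to buf; returns bytes used, 0 if full. */
static size_t nv_put_u64(uint8_t *buf, size_t len, const char *name,
                         uint64_t val)
{
    size_t n = strlen(name) + 1;               /* name plus NUL */

    if (n + 1 + sizeof(val) > len)
        return 0;
    memcpy(buf, name, n);                      /* NUL-terminated name */
    buf[n] = MCA_NV_UINT64;                    /* one-byte type tag */
    memcpy(buf + n + 1, &val, sizeof(val));    /* fixed-width value */
    return n + 1 + sizeof(val);
}

The consumer walks the buffer record by record, skipping names it does
not recognize, so new fields can be added without breaking old readers.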
> >
> > The public xen.h header defines a VIRQ_DOM_EXC, which seems
> > appropriate for an NMI event.
> > There are two functions to send VIRQs: send_guest_vcpu_virq() and
> > send_guest_global_virq().
> >
> > However, VIRQ_DOM_EXC is not properly implemented:
> > all virtual interrupts are maskable. We definitely need
> > an event that is guaranteed to interrupt the guest immediately, no
> > matter whether it is Dom0 or DomU and whatever it is doing.
> >
> > And VIRQ_DOM_EXC is explicitly reserved for Dom0. Maybe
> > we should introduce a VIRQ_MCA as a special NMI event for both Dom0 and
> > DomU?
>
> Sounds like it may be necessary.  I don't know this mechanism very well
> so I'll go and do some reading (after a big, long unrelated code review).
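
(A sketch of what the proposed VIRQ_MCA could look like; the VIRQ
number is hypothetical, and send_guest_global_virq is the existing
helper mentioned above.)

/* in the public xen.h: */
#define VIRQ_MCA  9   /* hypothetical: the next unused VIRQ number */

/* in the hypervisor's MCE path, once telemetry has been stashed: */
struct domain;
extern void send_guest_global_virq(struct domain *d, int virq);

static void mca_notify(struct domain *d)
{
    send_guest_global_virq(d, VIRQ_MCA);
}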

After some code reading I found nmi_pending, nmi_masked and nmi_addr fields
in struct vcpu in xen/include/xen/sched.h.  xen/include/xen/nmi.h is also of
interest. The implementation is in xen/common/kernel.c.
Only one callback per vcpu is allowed, and only Dom0 can register an NMI
callback. So the guest's NMI handler must multiplex several NMI handlers - at
least for Dom0 (MCA + watchdog timer). It's fine with me to allow DomUs to
register only the MCA NMI.
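
(The multiplexing could be as simple as this sketch of the guest's
single registered NMI entry point; the handler names are made up.)

extern int mca_nmi_handler(void);      /* nonzero if it claimed the NMI */
extern int watchdog_nmi_handler(void);

void guest_nmi_callback(void)
{
    if (mca_nmi_handler())             /* MCA telemetry pending? */
        return;
    if (watchdog_nmi_handler())        /* watchdog timer fired? */
        return;
    /* unclaimed NMI: count it and return */
}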

To inform domUs that have a PV MCA driver, they must be able to register an
NMI callback as well. To allow this, struct vcpu_info in the PUBLIC xen.h
also needs nmi_pending and nmi_addr fields, as sketched below.
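
(Roughly, the public-header change would look like this; the field
types are assumed to mirror the nmi_pending/nmi_addr already present
in struct vcpu.)

#include <stdint.h>

struct vcpu_info {
    /* ... existing fields ... */
    uint8_t       nmi_pending;   /* an NMI is pending for this vcpu */
    unsigned long nmi_addr;      /* guest-registered NMI callback address */
};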


Keir: How do you feel about all this? Is this the right way or do you see
things that should be done in a different way?


Christoph


-- 
AMD Saxony, Dresden, Germany
Operating System Research Center

Legal Information:
AMD Saxony Limited Liability Company & Co. KG
Registered office (business address):
   Wilschdorfer Landstr. 101, 01109 Dresden, Germany
Commercial register Dresden: HRA 4896
General partner authorized to represent the company:
   AMD Saxony LLC (registered office: Wilmington, Delaware, USA)
Managing directors of AMD Saxony LLC:
   Dr. Hans-R. Deppe, Thomas McCoy


