
Re: [Xen-devel] RFC: MCA/MCE concept



Hi,

On 05/30/07 10:10, Christoph Egger wrote:
On Wednesday 30 May 2007 10:49:40 Jan Beulich wrote:
"Christoph Egger" <Christoph.Egger@xxxxxxx> 30.05.07 09:45 >>>
On Wednesday 30 May 2007 09:19:12 Jan Beulich wrote:
case I) - Xen receives a MCE from the CPU

1) Xen MCE handler figures out if error is a correctable error (CE)
   or uncorrectable error (UE)
2a) error == CE:
    Xen notifies Dom0 if Dom0 installed an MCA event handler
    for statistical purposes
[rest cut]

For the hypervisor to dom0 communication that 2a) above refers to I think
we need to agree on two aspects:  what form the notification event will
take, and what error telemetry data and additional information will
be provided by the hypervisor for dom0 to chew on for statistical
and diagnosis purposes.

For the first I've assumed so far that an event channel notification
of the MCA event will suffice;  as long as the hypervisor only polls
for correctable MCA errors at a low-frequency rate (currently 15s interval)
there is no danger of spamming that single notification.  On
receipt of the notification the event handler will need to suck
some event data out of somewhere - uncertain which somewhere would
be best?
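
One possible "somewhere", purely as a sketch to make the question
concrete: a small ring of telemetry records in memory shared between
the hypervisor and dom0, drained by the handler each time the
notification arrives.  Every name below (mca_ring, mca_record,
MCA_RING_SLOTS and the two functions) is hypothetical and not an
existing Xen interface, and the sketch is single-threaded: a real
shared ring would also need memory barriers and an overflow policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MCA_RING_SLOTS 8u        /* hypothetical capacity, a power of two */

struct mca_record {              /* hypothetical telemetry record */
    uint64_t status;
    uint64_t addr;
    uint64_t misc;
    uint16_t bank;
};

struct mca_ring {
    uint32_t prod;               /* advanced by the producer (hypervisor) */
    uint32_t cons;               /* advanced by the consumer (dom0) */
    struct mca_record slot[MCA_RING_SLOTS];
};

/* Producer side: returns 0 on success, -1 if the ring is full. */
static int mca_ring_put(struct mca_ring *r, const struct mca_record *rec)
{
    assert(r != NULL && rec != NULL);
    if (r->prod - r->cons == MCA_RING_SLOTS)
        return -1;
    r->slot[r->prod % MCA_RING_SLOTS] = *rec;
    r->prod++;
    return 0;
}

/* Consumer side, called from the notification handler: copies out
 * up to max queued records and returns how many were copied. */
static unsigned mca_ring_drain(struct mca_ring *r,
                               struct mca_record *out, unsigned max)
{
    unsigned n = 0;

    assert(r != NULL && out != NULL);
    while (r->cons != r->prod && n < max) {
        out[n++] = r->slot[r->cons % MCA_RING_SLOTS];
        r->cons++;
    }
    return n;
}
```

Because the handler drains everything queued so far, one notification
can cover several records, which fits the low-frequency polling
described above.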

We should standardize both the format and the content of this event
data.  The following is just to get the conversation started in this
area.

Content first.  Obviously we need the raw MCA register content -
MCi_STATUS, MCi_ADDR, MCi_MISC.  We also need to know which
MCA detector bank made the observation, so we need to include
some indication of which chip (where I use "chip" to coincide
with "socket"), core on that chip, and MCA bank number
the telemetry came from.  I think I am correct in saying that
hyperthreaded CPUs do not have any MCA banks per-thread, but we
may want to allow for that future possibility (I know, for instance,
that some SPARC cpus have error state for each hardware thread).

Such specification of error detector information clearly requires
some namespace specification.  For example if the detector identifier
could naturally come out of Xen as a (chip, core, thread, bank) tuple,
there needs to be a clear understanding of how chips, cores etc
are numbered in Xen and how that matches with how the dom0
OS has numbered these things.  If instead the detector identifier
were something like a (physical-cpu, bank) using the Xen physical-cpu
enumeration then dom0 may need a mechanism to resolve this into
chip etc info - you can't just work with physical cpus since, for
example, a chip-shared L3 cache spans multiple physical cpus.
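
To illustrate the second option, here is a minimal sketch of the kind
of resolution dom0 would need if it were handed a (physical-cpu, bank)
pair.  The table contents, the cpu_topo type and both function names
are made up for the example; where the real mapping would come from is
exactly the open question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct cpu_topo {                /* hypothetical topology record */
    uint16_t chip;
    uint16_t core;
    uint16_t thread;
};

/* Made-up example system: 2 chips x 2 cores, one thread per core,
 * indexed by the physical-cpu number. */
static const struct cpu_topo topo_table[] = {
    { 0, 0, 0 },
    { 0, 1, 0 },
    { 1, 0, 0 },
    { 1, 1, 0 },
};

/* Resolve a physical-cpu number; returns NULL if it is out of range. */
static const struct cpu_topo *topo_lookup(unsigned pcpu)
{
    if (pcpu >= sizeof(topo_table) / sizeof(topo_table[0]))
        return NULL;
    return &topo_table[pcpu];
}

/* Two physical cpus can share chip-level resources (such as a shared
 * L3 cache) only if they resolve to the same chip. */
static int same_chip(unsigned a, unsigned b)
{
    const struct cpu_topo *ta = topo_lookup(a);
    const struct cpu_topo *tb = topo_lookup(b);

    return ta != NULL && tb != NULL && ta->chip == tb->chip;
}
```

With this table same_chip(0, 1) is true and same_chip(1, 2) is false,
so an error reported against physical cpu 0 and one against physical
cpu 1 could be attributed to the same chip-level component.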

We should also allow for additional model-specific error telemetry
that may be available and relevant - I know that will be necessary
for some upcoming x86 cpu models.  We should probably avoid adding
"cooked" content to this error event payload - such cooking of the
raw data is much more easily performed in dom0 (the example I'm
thinking of here is physical address to memory location translation).

In terms of the form of the error event data, the simplest but also
the dumbest would be a binary structure passed from hypervisor
to dom0:

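struct mca_error_data_ver1 {
        uint8_t version;        /* structure version */
        uint64_t status;        /* raw MCi_STATUS */
        uint64_t addr;          /* raw MCi_ADDR */
        uint64_t misc;          /* raw MCi_MISC */
        uint16_t chip;          /* chip (socket) of the reporting bank */
        uint16_t core;          /* core on that chip */
        uint16_t bank;          /* MCA bank number */
        ...
};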

That is easily passed around and can be extended by versioning.
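
As a sketch of what "extended by versioning" could look like on the
receiving side: the consumer checks the leading version byte before it
trusts the rest of the layout.  The struct below is a hypothetical
variant and not a proposal for the final format; it adds explicit
padding because the gap a compiler leaves after a lone uint8_t depends
on the ABI (typically 7 bytes on x86-64 but 3 on 32-bit x86), which
matters if the hypervisor and dom0 are built for different word sizes.
All names here are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MCA_DATA_VERSION_1 1u

struct mca_data_v1_sketch {      /* hypothetical fixed layout */
    uint8_t  version;            /* structure version */
    uint8_t  pad[7];             /* explicit padding, zero-filled */
    uint64_t status;
    uint64_t addr;
    uint64_t misc;
    uint16_t chip;
    uint16_t core;
    uint16_t bank;
    uint16_t pad2;               /* keeps the size a multiple of 8 */
};

/* The layout should come out the same on 32-bit and 64-bit builds. */
_Static_assert(sizeof(struct mca_data_v1_sketch) == 40,
               "unexpected padding in mca_data_v1_sketch");
_Static_assert(offsetof(struct mca_data_v1_sketch, status) == 8,
               "status must start at byte 8");

/* Copies a version-1 record out of buf.  Returns 0 on success, or -1
 * if the buffer is too short or carries a different version. */
static int mca_parse_v1(const void *buf, size_t len,
                        struct mca_data_v1_sketch *out)
{
    const uint8_t *p = buf;

    assert(buf != NULL && out != NULL);
    if (len < sizeof(*out) || p[0] != MCA_DATA_VERSION_1)
        return -1;
    memcpy(out, buf, sizeof(*out));
    return 0;
}
```

A later version would get its own struct and its own branch on the
version byte, so an older dom0 can reject what it does not understand
instead of misreading it.
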
A more self-describing and naturally extensible approach would be
to parcel the error data in some form of name-type-value list.
That's what we do in the corresponding kernel->userland error
code in Solaris; the downside is that the supporting libnvpair
library is not tiny and likely not the sort of footprint to
include in a hypervisor.  Perhaps some cut-down form would do.
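
To give a feel for how small a cut-down form might be, here is a rough
sketch of a fixed-capacity name-type-value list with two value types.
This is not the libnvpair API; every identifier (ntv_list, ntv_pair,
the ntv_* functions, NTV_MAX_PAIRS) is invented for the example.  It
also stores name pointers directly, which only works inside a single
address space; anything crossing from hypervisor to dom0 would need a
flat, packed encoding instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum ntv_type { NTV_UINT16, NTV_UINT64 };

struct ntv_pair {
    const char   *name;          /* caller-owned, must outlive the list */
    enum ntv_type type;
    union {
        uint16_t u16;
        uint64_t u64;
    } value;
};

#define NTV_MAX_PAIRS 16u        /* hypothetical fixed capacity */

struct ntv_list {
    unsigned        count;
    struct ntv_pair pair[NTV_MAX_PAIRS];
};

/* Claims the next free slot, or returns NULL if the list is full. */
static struct ntv_pair *ntv_next(struct ntv_list *l, const char *name)
{
    assert(l != NULL && name != NULL);
    if (l->count == NTV_MAX_PAIRS)
        return NULL;
    l->pair[l->count].name = name;
    return &l->pair[l->count++];
}

static int ntv_add_uint16(struct ntv_list *l, const char *name, uint16_t v)
{
    struct ntv_pair *p = ntv_next(l, name);

    if (p == NULL)
        return -1;
    p->type = NTV_UINT16;
    p->value.u16 = v;
    return 0;
}

static int ntv_add_uint64(struct ntv_list *l, const char *name, uint64_t v)
{
    struct ntv_pair *p = ntv_next(l, name);

    if (p == NULL)
        return -1;
    p->type = NTV_UINT64;
    p->value.u64 = v;
    return 0;
}

/* Finds a uint64 value by name.  Returns 0 on success, or -1 if the
 * name is absent or is stored with a different type. */
static int ntv_lookup_uint64(const struct ntv_list *l, const char *name,
                             uint64_t *out)
{
    assert(l != NULL && name != NULL && out != NULL);
    for (unsigned i = 0; i < l->count; i++) {
        const struct ntv_pair *p = &l->pair[i];

        if (p->type == NTV_UINT64 && strcmp(p->name, name) == 0) {
            *out = p->value.u64;
            return 0;
        }
    }
    return -1;
}
```

The attraction over the binary struct is that a consumer looks fields
up by name, so new or model-specific telemetry can be added without
bumping a version, at the cost of more code and a lookup per field.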

Thoughts?

Gavin




 

