
Re: [Xen-devel] RFC: MCA/MCE concept

>case I) - Xen receives an MCE from the CPU
>1) Xen MCE handler figures out if error is a correctable error (CE)
>    or uncorrectable error (UE)
>2a) error == CE:
>     Xen notifies Dom0 if Dom0 installed an MCA event handler
>     for statistical purpose
>2b) error == UE and UE impacts Xen or Dom0:

A very important aspect here is how you want to classify what impact an
uncorrectable error has - generally, I can see very few situations where you
could confine the impact to a sub-portion of the system (i.e. a single domU,
dom0, or Xen). The general rule in my opinion must be to halt the system;
the only question is how likely it is that you can get a meaningful message
out (to screen, serial, or logs) that can help analyze the problem afterwards.
If it is somewhat likely, then dom0 should be involved; otherwise Xen should
just shut down the system.
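
A rough sketch of that policy, purely for illustration. All names here
(mce_event, classify_mce, the enum values) are hypothetical and do not exist
in Xen, and "dom0_likely_usable" stands in for whatever heuristic would decide
whether dom0 can still get a message out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of handling one machine check. */
enum mce_action {
    MCE_LOG_ONLY,         /* CE: nothing to halt, statistics only */
    MCE_HALT_VIA_DOM0,    /* UE: let dom0 try to report, then halt */
    MCE_HALT_IMMEDIATELY  /* UE: Xen shuts the system down itself */
};

/* Hypothetical summary of what the MCE handler found. */
struct mce_event {
    bool uncorrectable;      /* UE rather than CE */
    bool dom0_has_handler;   /* dom0 installed an MCA event handler */
    bool dom0_likely_usable; /* assumed heuristic: dom0 can still log */
};

/* Every UE ends in a halt; the only choice is who reports it first. */
static enum mce_action classify_mce(const struct mce_event *ev)
{
    assert(ev != NULL);

    if (!ev->uncorrectable)
        return MCE_LOG_ONLY;

    if (ev->dom0_has_handler && ev->dom0_likely_usable)
        return MCE_HALT_VIA_DOM0;

    return MCE_HALT_IMMEDIATELY;
}
```

Note that no path tries to confine a UE to a single domain; that is the
point of the rule above.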

>     Xen does some self-healing
>         and notifies Dom0 on success if Dom0 installed MCA event handler
>         or Xen panics on failure
>2c)  error == UE and UE impacts DomU:
>      In case of Dom0 installed MCA event handler:
>          Xen notifies Dom0 and Dom0 tells Xen whether
>              to also notify DomU and/or does some operations
>              on the DomU (case II)
>       In case Dom0 did not install MCA event handler,
>           Xen notifies DomU
>3a) DomU is a PV guest:
>       if DomU installed MCA event handler, it gets notified to perform
>          self-healing
>       if DomU did not install MCA event handler, notify Dom0 to do
>          some operations on DomU (case II)
>       if neither DomU nor Dom0 installed MCA event handlers,
>          then Xen kills DomU
>3b) DomU is a HVM guest:
>       if DomU features a PV driver then behave as in 3a)

What significance do pv drivers have here? Or do you mean a pv MCA

>       if DomU enabled MCA/MCE via MSR, inject MCE into guest
>       if DomU did not enable MCA/MCE via MSR, notify Dom0
>            to do some operations on DomU (case II)
>       if neither DomU enabled MCA/MCE nor Dom0 installed
>            MCA event handler, Xen kills DomU

Injecting an MCE to a hvm guest seems at least questionable. It can't really
do anything about it (it doesn't even know the real topology of the system
it's running on, so addresses stored in MSRs are meaningless - either you
allow them to be read untranslated [in which case the guest cannot make
sense of them] or you do translation for the guest [in which case it might
make assumptions about co-locality of other nearby pages which will be
Doing this to a pv domU for purely notification purposes (where the guest
knows it's running virtualized) is clearly a different matter.
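
For reference, the routing proposed in 3a)/3b) can be written down as the
sketch below. It only covers step 3 (not the dom0-first notification in 2c),
and every name in it is hypothetical. The "allow_hvm_injection" switch is an
assumption added here to show what changes if MCEs are never injected into
hvm guests:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes for a UE that impacts a domU. */
enum domu_action {
    DOMU_NOTIFY_GUEST, /* guest's own MCA event handler is called */
    DOMU_INJECT_MCE,   /* MCE injected into an hvm guest */
    DOMU_ASK_DOM0,     /* dom0 decides what to do with the domU */
    DOMU_KILL          /* nobody can handle it: Xen kills the domU */
};

/* Hypothetical per-domain state relevant to the decision. */
struct domu_state {
    bool is_hvm;
    bool guest_handler;   /* pv guest, or hvm guest with pv support,
                             installed an MCA event handler */
    bool mce_msr_enabled; /* hvm guest enabled MCA/MCE via MSR */
    bool dom0_handler;    /* dom0 installed an MCA event handler */
};

static enum domu_action route_domu_ue(const struct domu_state *d,
                                      bool allow_hvm_injection)
{
    assert(d != NULL);

    if (d->guest_handler)
        return DOMU_NOTIFY_GUEST;

    if (d->is_hvm && d->mce_msr_enabled && allow_hvm_injection)
        return DOMU_INJECT_MCE;

    if (d->dom0_handler)
        return DOMU_ASK_DOM0;

    return DOMU_KILL;
}
```

With allow_hvm_injection set to false, an hvm guest without a pv handler
falls through to dom0, or is killed if dom0 has no handler either.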

>case II) - Xen receives Dom0 instructions via Hypercall
>There are different reasons why Xen should do something.
>   - Dom0 got enough CEs so that UEs are very likely to happen in order
>      to "circumvent" UEs.
>   - Possible operations on a DomU
>        - save/restore DomU
>        - (live-)migrate DomU to a different physical machine
>        - etc.

Very heavy-weight operations, which I think are unlikely to succeed if
you already suspect the system's going to suffer a UE soon.


Xen-devel mailing list


