Xen project Mailing List

RE: [Xen-devel] RFC: MCA/MCE concept

To: "Gavin Maltby" <Gavin.Maltby@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx

From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>

Date: Wed, 30 May 2007 17:03:55 +0200

Delivery-date: Wed, 30 May 2007 08:03:12 -0700

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Thread-index: AceiweeIdhki3L9NSz6dxRZ0hVG3ZQAAJZHQ

Thread-topic: [Xen-devel] RFC: MCA/MCE concept

[snip]
> My feeling is that the hypervisor and dom0 own the hardware and as such
> all hardware fault management should reside there. So we should never
> deliver any form of #MC to a domU, nor should a poll of MCA state from
> a domU ever observe valid state (e.g, make the RDMSR return 0).
> So all handling, logging and diagnosis as well as hardware response
> actions (such as to deploy an online spare chip-select) are controlled
> in the hypervisor/dom0 combination. That seems a consistent model - e.g.,
> if a domU is migrated to another system it should not carry the
> diagnosis state of the original system across etc, since that belongs
> with the one domain that cannot migrate.

I agree entirely with this.

> But that is not to say that (I think at a future phase) domU should not
> participate in a higher-level fault management function, at the
> direction of the hypervisor/dom0 combo. For example if/when we can
> isolate an uncorrectable error to a single domU we could forward such
> an event to the affected domU if it has registered its ability/interest
> in such events. These won't be in the form of a faked #MC or anything,
> instead they'd be some form of synchronous trap experienced when next
> the affected domU context resumes on CPU. The intelligent domU handler
> can then decide whether the domU must panic, whether it could simply
> kill the affected process etc. Those details are clearly sketchy, but
> the idea is to up-level the communication to a domU to be more like
> "you're broken" rather than "here's a machine-level hardware error for
> you to interpret and decide what to do with".

Yes, this makes much more sense than forwarding #MC, as the guest would
have a hard time doing anything really useful with it. As far as I know,
most uncorrectable errors are near enough entirely fatal in most
commercial non-Enterprise OSes anyway - e.g. in Windows XP or Server 2K3,
it always ends in a blue-screen - which is hardly any better than the
guest being "humanely euthanized" by Dom0.

I take it this would be some sort of hypercall (available through the
regular PV-driver interface for HVM guests) to say "Let me know if I'm
broken - trap on vector X".

--
Mats

>
> Gavin

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
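
To make the "Let me know if I'm broken - trap on vector X" idea a bit more concrete, here is a rough sketch in C of how the bookkeeping might look. It is only a model of the flow discussed above, not an existing Xen interface: every name in it (struct domu, domu_register_fault_vector, domu_rdmsr_mca, hv_report_fault, hv_resume_domu) is hypothetical, and the rule that vectors below 32 are refused is an assumption based on those being reserved for CPU exceptions on x86.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-domain state; not a real Xen structure. */
struct domu {
    uint16_t domid;
    bool     registered;   /* guest asked for "you're broken" events    */
    uint8_t  vector;       /* vector the guest wants the trap on        */
    bool     fault_pending;/* uncorrectable error isolated to this domU */
    bool     alive;        /* false once dom0 has destroyed the guest   */
};

/* Guest -> hypervisor: "let me know if I'm broken - trap on vector X".
 * Assumption: vectors 0..31 are refused because x86 reserves them for
 * CPU exceptions. Returns 0 on success, -1 on a bad vector. */
int domu_register_fault_vector(struct domu *d, uint8_t vector)
{
    assert(d != NULL);
    if (vector < 32)
        return -1;
    d->registered = true;
    d->vector = vector;
    return 0;
}

/* A domU polling MCA state must never observe valid state, so every
 * read of an MCA register simply returns 0. */
uint64_t domu_rdmsr_mca(const struct domu *d, uint32_t msr)
{
    (void)d;
    (void)msr;
    return 0;
}

/* Hypervisor/dom0 side: an uncorrectable error has been isolated to
 * this domU. Nothing is delivered yet; the event is only recorded. */
void hv_report_fault(struct domu *d)
{
    assert(d != NULL);
    d->fault_pending = true;
}

/* Called when the domU context next resumes on a CPU. Returns the
 * vector to inject, or -1 if nothing is injected. A guest that never
 * registered cannot handle the event, so it is marked as destroyed. */
int hv_resume_domu(struct domu *d)
{
    assert(d != NULL);
    if (!d->fault_pending)
        return -1;
    d->fault_pending = false;
    if (d->registered)
        return d->vector;
    d->alive = false;
    return -1;
}
```

The point of deferring delivery to hv_resume_domu is that the guest sees a synchronous, high-level "you're broken" event rather than a faked #MC; what the guest handler then does (panic, kill a process) is left entirely to the guest.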
