WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
xen-devel

Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN

To: "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>
Subject: Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
From: Frank van der Linden <Frank.Vanderlinden@xxxxxxx>
Date: Tue, 24 Feb 2009 11:53:51 -0700
Cc: Christoph Egger <Christoph.Egger@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, "Ke, Liping" <liping.ke@xxxxxxxxx>, Gavin Maltby <Gavin.Maltby@xxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>, "Kleen, Andi" <andi.kleen@xxxxxxxxx>
Thanks for your reply. Let me explain my comments a little:

Jiang, Yunhong wrote:

> One note: we deliver a vMCE to dom0/domU only when that domain is affected. The
> idea behind this is that the MCE is handled entirely by the Xen hypervisor,
> while a guest's vMCE handler only works on behalf of the guest itself. For
> example, when a page is broken, Xen first marks the page offline on the Xen
> side (i.e. takes the recovery action); it then injects a vMCE into the
> corresponding guest (dom0 or domU), and the guest can kill the application
> using the page, free the page, or take further action.

> And we always pass the vIRQ to dom0 for logging and telemetry; user-space tools
> can take more proactive action if needed.

I understand this part, and have no problems with the mechanism itself. I think it has advantages over the original concept, where dom0 informs domUs. My question is: what useful action can a domU take without fully knowing the physical system? I'll go into that more below.
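To make the delivery policy under discussion concrete, here is a minimal C sketch of the routing as described in the quoted text. The `domid_t` typedef, the `NO_OWNER` marker, `struct mce_dispatch` and `dispatch_mce()` are hypothetical names, not Xen's actual code. The sketch only encodes the three steps: Xen recovers first, injects a vMCE only into the affected domain, and always raises the telemetry vIRQ to dom0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative domain ID type and a marker for "no owning guest" (hypothetical). */
typedef uint16_t domid_t;
#define NO_OWNER ((domid_t)0xFFFF)

/* Hypothetical summary of how one machine-check event is dispatched. */
struct mce_dispatch {
    bool    page_offlined;    /* Xen took the recovery action itself */
    bool    inject_vmce;      /* a vMCE goes to the affected guest */
    domid_t vmce_target;      /* that guest (dom0 or a domU), or NO_OWNER */
    bool    raise_dom0_virq;  /* telemetry vIRQ for dom0 logging */
};

static struct mce_dispatch dispatch_mce(bool page_broken, domid_t owner)
{
    struct mce_dispatch d;

    /* Step 1: Xen handles the error first, e.g. by offlining a broken page. */
    d.page_offlined = page_broken;

    /* Step 2: inject a vMCE only into the domain that is actually affected. */
    d.inject_vmce = (owner != NO_OWNER);
    d.vmce_target = owner;

    /* Step 3: dom0 always gets the vIRQ for logging and telemetry. */
    d.raise_dom0_virq = true;
    return d;
}
```

The sketch makes Frank's question easy to see. Step 2 hands the guest an event, but it carries no description of the physical system.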

>> What would be needed for the Solaris framework, however, is to provide
>> information on what action was taken, along with the telemetry. As

> Agreed that this modification is needed. Sorry, we didn't realize that dom0
> has this requirement after a reboot.
>
> Either we pass the action in the telemetry, or dom0 uses an action-specific
> method, like retrieving the offlined pages from Xen before reboot. If we take
> the former, we may need an interface definition.

Passing the action along with the telemetry seems to me the best way to go. Since the telemetry is used to determine which action to take, any information on actions already taken should arrive at the same time.
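The sketch below shows one possible shape for such an interface. It assumes each telemetry record carries an action field. The enum values, the struct layout, `make_record()` and `already_offlined()` are all hypothetical and are not part of the posted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical set of recovery actions Xen may already have taken. */
enum mca_action {
    MCA_ACTION_NONE = 0,
    MCA_ACTION_PAGE_OFFLINED,
    MCA_ACTION_VMCE_INJECTED,
};

/* Hypothetical telemetry record: raw bank data plus the action taken. */
struct mca_telemetry {
    uint32_t        bank;    /* MCA bank number */
    uint64_t        status;  /* raw MCi_STATUS value */
    uint64_t        addr;    /* raw MCi_ADDR value (machine address) */
    enum mca_action action;  /* what Xen did before dom0 saw this record */
};

/* Fill in the data and the action together, so they always arrive as a unit. */
static struct mca_telemetry make_record(uint32_t bank, uint64_t status,
                                        uint64_t addr, enum mca_action action)
{
    struct mca_telemetry t = { bank, status, addr, action };
    return t;
}

/* A dom0 consumer (e.g. replaying telemetry after a reboot) can check
 * whether Xen already offlined the page instead of retiring it again. */
static bool already_offlined(const struct mca_telemetry *t)
{
    assert(t != NULL);
    return t->action == MCA_ACTION_PAGE_OFFLINED;
}
```

Keeping the action inside the same record avoids a separate query to Xen, which is the point made above.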


> What do you mean by the effect of the wrmsr instruction? If needed, we must
> consider injecting a #GP on an invalid wrmsr, or removing the event when the
> guest clears MCi_STATUS_MCA. We sent this RFC early to get feedback on the
> design idea first. Or do you mean something more than this for wrmsr?

>> To take further action, the MCA code in dom0 (or a domU) needs to know
>> that it is running under Xen, and it needs to have detailed physical

> Our intent is that the guest has no idea it is running under Xen, as described
> above. What information do you think a normal guest's MCA handler needs to
> know, and how would it use the detailed physical information? After all, a
> guest cares only about itself. Also, we may not be able to provide a PV handler
> for every guest (Windows, for example).
>
> Dom0 is a special case: its vIRQ handler knows it is running under Xen, but
> that is for logging/telemetry and for proactive action.

>> information on the system. In other words, the existing code
>> that can be

> What do you mean by "existing": our patch, or the current Xen implementation?

>> used is only the code that gathers some information. So the only thing
>> that vMCE is good for is that you can run unmodified error logging
>> code. But you can't interpret any of the error information further
>> without knowing more. Especially for a domU, which might not know
>> anything, this doesn't seem useful. What would the user of a domU do with
>> that information? To recap, I think the part where Xen itself takes action
>> is fine, with some modifications. But I don't see any advantages in vMCE
>> delivery, unless I'm missing something, of course.

> I think the main advantages are:
> a) We don't need to maintain a PV MCA handler for guests, especially HVM guests.
> b) We benefit from improvements/enhancements to the guest's own MCA handling.
> c) Applying this to dom0 means we don't need different mechanisms for dom0 and HVM guests.

OK, my main issue here is this: if you want to enable a guest to run unmodified MCA code (which you state as a goal, and as an advantage of the vMCE approach), then what can the guest actually do? Or dom0, for that matter?

MCA information is highly specific to the hardware. Without additional information on the hardware, it is hard, or even impossible, for the unmodified MCA handler in dom0 or a domU to do anything useful. It will interpret the information to fit the virtualized environment it is in, which doesn't match the reality of the hardware at all. So what can it do? It can just read the MSRs and log the information, but even that information wouldn't be useful; it is already available to dom0, where the code and/or person who can make sense of the data will see it. The unmodified MCA handler also can't take any corrective action; it might think that it is taking action, but in fact, its wrmsr instructions have no effect (and they shouldn't, guests should definitely not be able to do MSR writes).
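The sketch below shows what this could mean for an emulated MCi_STATUS write. It assumes, as proposed earlier in the thread, that a non-zero value is answered with a #GP and that a zero write retires the pending event. `struct vmca_bank` and `vmca_write_status()` are hypothetical names. Only the per-guest shadow changes; the physical MSR is never written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-guest shadow of one MCA bank. */
struct vmca_bank {
    uint64_t status;   /* virtual MCi_STATUS the guest reads back */
    uint64_t addr;     /* virtual MCi_ADDR */
    bool     pending;  /* an injected event not yet cleared by the guest */
};

enum wrmsr_result { WRMSR_OK, WRMSR_INJECT_GP };

/* Emulate a guest wrmsr to MCi_STATUS against the shadow bank only. */
static enum wrmsr_result vmca_write_status(struct vmca_bank *b, uint64_t val)
{
    assert(b != NULL);

    /* Assumed rule: only zero may be written; anything else gets a #GP. */
    if (val != 0)
        return WRMSR_INJECT_GP;

    /* Clearing the virtual status retires the event; hardware is untouched. */
    b->status = 0;
    b->pending = false;
    return WRMSR_OK;
}
```

Whatever the guest writes, the "corrective action" stays inside its own virtual state, which is exactly the limitation described above.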

I only see one possible exception to this: if you translate the ADDR MSR of a bank to a guest address in the vmca info before delivering the vMCE, then the guest could act on it, because its virtualized MSR reads would return a guest address that it can actually use. But currently, your code doesn't seem to do this; the virtualized MSR will produce the machine address, which the guest can't do anything with unless it knows it's running under Xen.
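Here is a minimal sketch of that translation. It assumes 4 KiB pages and uses a hypothetical per-guest lookup table, `struct m2p_table`, in place of Xen's real machine-to-guest mapping. `maddr_to_gpaddr()` is likewise an illustrative name. The frame number is replaced and the offset within the page is kept.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT       12
#define PAGE_OFFSET_MASK ((UINT64_C(1) << PAGE_SHIFT) - 1)
#define INVALID_PFN      UINT64_MAX

/* Hypothetical per-guest table: m2p[mfn] is the guest frame number,
 * or INVALID_PFN if that machine frame does not belong to this guest. */
struct m2p_table {
    const uint64_t *m2p;
    size_t          nr_mfns;
};

/* Rewrite a machine address taken from MCi_ADDR into a guest physical
 * address; returns false if the frame is not mapped for this guest. */
static bool maddr_to_gpaddr(const struct m2p_table *t, uint64_t maddr,
                            uint64_t *gpaddr)
{
    uint64_t mfn = maddr >> PAGE_SHIFT;

    assert(t != NULL && gpaddr != NULL);
    if (mfn >= t->nr_mfns || t->m2p[mfn] == INVALID_PFN)
        return false;

    /* New frame number, same offset within the page. */
    *gpaddr = (t->m2p[mfn] << PAGE_SHIFT) | (maddr & PAGE_OFFSET_MASK);
    return true;
}
```

For example, with `m2p = {INVALID_PFN, 7, INVALID_PFN}`, machine address `0x1234` (frame 1, offset `0x234`) becomes guest address `0x7234`. The guest's unmodified handler could then attempt a "local" page retire on it.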

So that's my main problem here: there is a contradiction. The vMCE mechanism as you implement it enables guests to run an unmodified MCA handler, but there isn't actually much that the guest can do with that, without knowing it runs under Xen. I see only one specific use for this: if you translate the ADDR info to a guest address, it could potentially try to do a "local" page retire.

- Frank

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
