Xen project Mailing List

Re: [Xen-devel] [PATCH] X86 MCE: Add more strict sanity check of one SRAR case

To: "jbeulich@xxxxxxxx" <jbeulich@xxxxxxxx>

From: "Liu, Jinsong" <jinsong.liu@xxxxxxxxx>

Date: Mon, 5 Dec 2011 21:09:43 +0800

Accept-language: en-US

Acceptlanguage: en-US

Cc: "keir.xen@xxxxxxxxx" <keir.xen@xxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>

Delivery-date: Mon, 05 Dec 2011 13:11:05 +0000

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Thread-index: AcyxCrgiOHzcBq2lS5+rjRWdHHGmkQCQvVqA

Thread-topic: [Xen-devel] [PATCH] X86 MCE: Add more strict sanity check of one SRAR case

Jan, Attached is the email we discuss last month, and 2 mce patches: srar-1.patch is the patch we discussed before, srar-guest-mode.patch is an additional patch w/ more strict sanity checkof RIPV = EIPV = 0. Any comments please let me know. Thanks, Jinsong ________________________________ From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Liu, Jinsong Sent: Friday, December 02, 2011 11:55 PM To: jbeulich@xxxxxxxx Cc: keir.xen@xxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx; Jiang, Yunhong Subject: [Xen-devel] [PATCH] X86 MCE: Add more strict sanity check of one SRAR case Jan, I cannot access my remote office desktop today (network maintain), so cannot reply the SRAR patch email we discussed last month. At the last email we basically agree that the SRAR patch itself looks fine, only with 1 concern about 'guest_mode' (in fact the SRAR patch itself didn't involved guest_mode), and the SRAR patch not in xen unstable tree yet. Currently I still don't find out an alternative way to solve the guest_mode issue, so in order to be safe I add this patch. I think although it may be some overkilled, it's OK to keep it safe here, after all the SRAR case which RIPV = EIPV = 0 occur rarely. Thanks, Jinsong ====================================== X86 MCE: Add more strict sanity check of one SRAR case When RIPV = EIPV = 0, it's a little bit tricky. It may be an asynchronic error, currently we have no way to precisely locate whether the error occur at guest or hypervisor. To avoid handling error in wrong way, we treat it as unrecovered. Signed-off-by: Liu, Jinsong <jinsong.liu@xxxxxxxxx> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx> # HG changeset patch # Parent e758b1d928e3d663602741ddc7f55461e2187829 diff -r e758b1d928e3 xen/arch/x86/cpu/mcheck/mce_intel.c --- a/xen/arch/x86/cpu/mcheck/mce_intel.c Sun Oct 30 17:50:53 2011 -0800 +++ b/xen/arch/x86/cpu/mcheck/mce_intel.c Fri Dec 02 15:16:52 2011 -0800 @@ -367,8 +367,18 @@ return 0; gstatus = mca_rdmsr(MSR_IA32_MCG_STATUS); - /* Xen is not pre-emptible */ - if ( !(gstatus & MCG_STATUS_RIPV) && !guest_mode(regs)) + + /* + * FIXME: When RIPV = EIPV = 0, it's a little bit tricky. It may be an + * asynchronic error, currently we have no way to precisely locate + * whether the error occur at guest or hypervisor. + * To avoid handling error in wrong way, we treat it as unrecovered. + * + * Another unrecovered case is RIPV = 0 while in hypervisor + * since Xen is not pre-emptible. + */ + if ( !(gstatus & MCG_STATUS_RIPV) && + (!(gstatus & MCG_STATUS_EIPV) || !guest_mode(regs)) ) return -1; return mce_action(regs, mctc) == MCER_RESET ? -1 : 0;

--- Begin Message ---

To: "Liu, Jinsong" <jinsong.liu@xxxxxxxxx>
From: Jan Beulich <JBeulich@xxxxxxxx>
Date: Wed, 2 Nov 2011 17:57:35 +0800
Cc: "tim.deegan@xxxxxxxxxx" <tim.deegan@xxxxxxxxxx>, "keir.xen@xxxxxxxxx" <keir.xen@xxxxxxxxx>, "Shan, Haitao" <haitao.shan@xxxxxxxxx>, "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Thread-index: AcyZRd2aeVDAGxj0SBmEkzjrHvmBIw==
Thread-topic: [PATCH] X86 MCE: Add SRAR handler

>>> On 21.10.11 at 22:25, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> wrote:
> I update a little for my former patch, as attached.
> For my former patch, you mainly have 2 concerns (list below). I double check 
> xen mce code, w/ my opinion append:

Those concerns were really just triggered by the patch, not directly
related to it. The patch itself looks okay to me.

> Concern 1: for SRAR IFU error, since RIPV=EIPV=0, it maybe an async error 
> which occur at guest but root from hypervisor.
> [Jinsong]: 
>     Yes, but EIPV didn't tell us where the error root from (it's just a 
> hint, warning us async possibility). 
>     It no need to overkill xen at mce isr, instead, at mce softirq we can 
> find out error root location and then handle accordingly:
>     * at mce isr:
>             /* a total insurance */
>             /* if error is async, we delay handle it at mce softirq */
>             if ( !(gstatus & MCG_STATUS_RIPV) && !guest_mode(regs))
>                 return -1;

I continue to think that guest_mode() must not be used without
EIPV, no matter what the purpose of the use.

>     * at mce softirq:
>             /* detect error location by bank->mc_addr */
>             /* handle different page OWNER cases at intel_memerr_dhandler() 
> and offline_page() */
>             /* who own, who take */
>             if (error page owner is guest)
>                 trigger vmce to guest;
>             else
>                 panic xen;

That part is certainly fine.

> Concern 2: If a guest accesses the hypervisor part of the GDT or page 
> tables, or some other shared data structure owned by the hypervisor (like the 
> M2P table), its handler may get utterly confused by being presented an 
> address it doesn't own and knows nothing about.
> [Jinsong]: for such cases, page owner would be dom_xen/dom_cow or NULL, but 
> not guest --> it would be handled at hypervisor, not triggering vmce to guest 
> --> 
> who own, who take.

That latter part of your explanation is fine too, but with the caveat that
the bogus use of guest_mode() above may have an overall effect on the
behavior.

Jan

--- End Message ---

Attachment: srar-1.patch
Description: srar-1.patch

Attachment: srar-guest-mode.patch
Description: srar-guest-mode.patch

_______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.