[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 08/12] x86/vmce: enable injecting LMCE to guest on Intel host



On 03/20/17 10:25 -0600, Jan Beulich wrote:
> >>> On 17.03.17 at 07:46, <haozhong.zhang@xxxxxxxxx> wrote:
> > @@ -88,18 +89,31 @@ mc_memerr_dhandler(struct mca_binfo *binfo,
> >                      goto vmce_failed;
> >                  }
> >  
> > -                if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
> > -                    global->mc_vcpuid == XEN_MC_VCPUID_INVALID)
> > +                mc_vcpuid = global->mc_vcpuid;
> > +                if (mc_vcpuid == XEN_MC_VCPUID_INVALID ||
> > +                    (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
> > +                     (!(global->mc_gstatus & MCG_STATUS_LMCE) ||
> > +                      !(d->vcpu[mc_vcpuid]->arch.vmce.lmce_enabled) ||
> > +                      /*
> > +                       * The following check serves for MCE injection
> > +                       * test, i.e. xen-mceinj. xen-mceinj may specify
> > +                       * the target domain (i.e. bank->mc_domid) and
> > +                       * target CPU, but it's hard for xen-mceinj to
> > +                       * ensure, when Xen prepares the actual
> > +                       * injection in this function, vCPU currently
> > +                       * running on the target CPU belongs to the
> > +                       * target domain. If such inconsistency does
> > +                       * happen, fallback to broadcast.
> > +                       */
> > +                      global->mc_domid != bank->mc_domid)))
> 
> Thinking about this another time, I don't think we want hackery
> like this for a test utility. Instead I think the test utility wants to
> pin the vCPU on the pCPU it wants to deliver the LMCE on.
> 

I agree we should not introduce hackery only for test purpose.

However, on second thought, I think we still need this check, but
it should be lifted to the outermost position, i.e.
    if (mc_vcpuid == XEN_MC_VCPUID_INVALID ||
        global->mc_domid != bank->mc_domid ||             <== here
        (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
         (!(global->mc_gstatus & MCG_STATUS_LMCE) ||
          !(d->vcpu[mc_vcpuid]->arch.vmce.lmce_enabled))))

MC# might not be raised at the very moment that, e.g., the bad
memory cell is accessed, so the domain id and vcpu id recorded in
global->mc_{domid, vcpuid} by mca_init_global() may not be precise
(e.g. the domain that accessed the bad memory was scheduled out, and
MC# arrives while another domain is running). If such imprecision
does happen when handling Intel LMCE or AMD MCE, we cannot figure out
in mc_memerr_dhandler() (though it's not called in the current AMD MCE
handling, it is intended to be common code) the exact vcpu that
is affected.

Worse, if the imprecise global->mc_vcpuid (whose value is held in
the variable mc_vcpuid) is larger than the maximum vcpu id of the
affected domain (indicated by variable 'd'), the check
    !(d->vcpu[mc_vcpuid]->arch.vmce.lmce_enabled)
dereferences d->vcpu[] out of bounds.


Haozhong

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
