Re: [Xen-devel] [PATCH 12/19] x86/mce: handle LMCE locally
On 02/22/17 06:53 -0700, Jan Beulich wrote:
> >>> On 17.02.17 at 07:39, <haozhong.zhang@xxxxxxxxx> wrote:
> > --- a/xen/arch/x86/cpu/mcheck/barrier.c
> > +++ b/xen/arch/x86/cpu/mcheck/barrier.c
> > @@ -20,7 +20,7 @@ void mce_barrier_enter(struct mce_softirq_barrier *bar)
> > {
> > int gen;
> >
> > - if (!mce_broadcast)
> > + if ( !mce_broadcast || __get_cpu_var(lmce_in_process) )
>
> this_cpu() please instead of __get_cpu_var() (which we should get
> rid of rather sooner than later).
will use this_cpu()
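I.e. (assuming lmce_in_process stays a per-CPU bool as in this
patch) the check would then read roughly:

    if ( !mce_broadcast || this_cpu(lmce_in_process) )
        return;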
>
> > @@ -462,6 +474,7 @@ void mcheck_cmn_handler(const struct cpu_user_regs *regs)
> > uint64_t gstatus;
> > mctelem_cookie_t mctc = NULL;
> > struct mca_summary bs;
> > + bool *lmce_in_process = &__get_cpu_var(lmce_in_process);
> >
> > mce_spin_lock(&mce_logout_lock);
> >
> > @@ -505,6 +518,8 @@ void mcheck_cmn_handler(const struct cpu_user_regs *regs)
> > }
> > mce_spin_unlock(&mce_logout_lock);
> >
> > + *lmce_in_process = bs.lmce;
>
> You don't need a new local variable for this.
>
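(Presumably the direct assignment

    this_cpu(lmce_in_process) = bs.lmce;

right after mce_spin_unlock() is all that is needed there.)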
> > @@ -1709,6 +1724,7 @@ static void mce_softirq(void)
> > {
> > int cpu = smp_processor_id();
> > unsigned int workcpu;
> > + bool lmce = per_cpu(lmce_in_process, cpu);
>
> Is this flag valid to be looked at anymore at this point in time? MCIP
> was cleared a lot earlier, so there may well have been a 2nd #MC
> in between. In any event you again don't need the local variable
> here afaict.
A non-LMCE #MC coming in between does not cause a problem. As
lmce_in_process is cleared on all CPUs, mce_softirq() on all CPUs
will sync with each other as before, and one of them will eventually
handle the pending LMCE.
I think the problem is rather that one flag is not enough, not that
it is not needed. A single lmce_in_process flag misses the following
case:
1) mcheck_cmn_handler() first handles a non-LMCE #MC on CPU#n and
raises MACHINE_CHECK_SOFTIRQ.
2) Before mce_softirq() gets a chance to run on CPU#n, an LMCE
arrives on CPU#n. mcheck_cmn_handler() then sets lmce_in_process on
CPU#n and raises MACHINE_CHECK_SOFTIRQ again.
3) When mce_softirq() finally gets a chance to run on CPU#n, it sees
lmce_in_process set and consequently handles the pending MCEs on
CPU#n without waiting for the other CPUs. However, one of the
pending MCEs is not an LMCE.
So I'm considering introducing another per-CPU flag,
"mce_in_process", to indicate whether a non-LMCE #MC is pending for
the softirq handler (a rough sketch follows the list below):
1) When a non-LMCE #MC comes to CPU#n, mcheck_cmn_handler() sets the
mce_in_process flag on CPU#n.
2) When an LMCE comes to CPU#n, mcheck_cmn_handler() sets the
lmce_in_process flag on CPU#n.
3) When mce_softirq() starts, it clears the lmce_in_process flag if
mce_in_process is set, so it will not handle a non-LMCE #MC without
waiting for the other CPUs.
4) mce_softirq() clears both flags after exiting all MC barriers.
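A rough sketch of that scheme (flag names and exact placement are
only illustrative, not the final patch):

    static DEFINE_PER_CPU(bool, mce_in_process);   /* non-LMCE #MC pending */
    static DEFINE_PER_CPU(bool, lmce_in_process);  /* LMCE pending */

    void mcheck_cmn_handler(const struct cpu_user_regs *regs)
    {
        ...
        mce_spin_unlock(&mce_logout_lock);

        /* 1)/2): record which kind of #MC is pending for the softirq. */
        if ( bs.lmce )
            this_cpu(lmce_in_process) = true;
        else
            this_cpu(mce_in_process) = true;
        ...
    }

    static void mce_softirq(void)
    {
        /* 3): a pending non-LMCE #MC overrides the LMCE shortcut, so
         * the rendezvous with the other CPUs is not skipped. */
        if ( this_cpu(mce_in_process) )
            this_cpu(lmce_in_process) = false;

        /*
         * mce_barrier_enter()/mce_barrier_exit() skip the rendezvous
         * when lmce_in_process is set (see the barrier.c hunk above).
         */
        ...

        /* 4): clear both flags after exiting all MC barriers. */
        this_cpu(mce_in_process) = false;
        this_cpu(lmce_in_process) = false;
    }

That way the broadcast/rendezvous path stays exactly as before
whenever anything other than an LMCE is pending on the CPU.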
Haozhong
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel