
Re: [Xen-devel] [PATCH 12/19] x86/mce: handle LMCE locally



On 02/23/17 00:42 -0700, Jan Beulich wrote:
> >>> On 23.02.17 at 04:06, <haozhong.zhang@xxxxxxxxx> wrote:
> > On 02/22/17 06:53 -0700, Jan Beulich wrote:
> >> >>> On 17.02.17 at 07:39, <haozhong.zhang@xxxxxxxxx> wrote:
> >> > @@ -1709,6 +1724,7 @@ static void mce_softirq(void)
> >> >  {
> >> >      int cpu = smp_processor_id();
> >> >      unsigned int workcpu;
> >> > +    bool lmce = per_cpu(lmce_in_process, cpu);
> >> 
> >> Is this flag valid to be looked at anymore at this point in time? MCIP
> >> was cleared a lot earlier, so there may well have been a 2nd #MC
> >> in between. In any event you again don't need the local variable
> >> here afaict.
> > 
> > A non-LMCE #MC coming in between does not cause a problem. As
> > lmce_in_process is cleared on all CPUs, mce_softirq() on all CPUs
> > will sync with each other as before, and finally one of them will
> > handle the pending LMCE.
> > 
> > I think the problem is that one flag is not enough, rather than not
> > needed. A single lmce_in_process flag misses the following case:
> > 1) mcheck_cmn_handler() first handles a non-LMCE #MC on CPU#n and raises
> >    MACHINE_CHECK_SOFTIRQ.
> > 2) Before mce_softirq() gets a chance to run on CPU#n, an LMCE
> >    arrives on CPU#n. mcheck_cmn_handler() then sets lmce_in_process on
> >    CPU#n and raises MACHINE_CHECK_SOFTIRQ again.
> > 3) mce_softirq() finally gets a chance to run on CPU#n. It sees
> >    lmce_in_process set and consequently handles the pending MCEs on
> >    CPU#n w/o waiting for other CPUs. However, one of the pending MCEs
> >    is not an LMCE.
> > 
> > So I'm considering introducing another local flag, "mce_in_process", to
> > indicate whether a non-LMCE #MC is pending for the softirq handler:
> > 1) When a non-LMCE #MC comes to CPU#n, mcheck_cmn_handler() sets the
> >    mce_in_process flag on CPU#n.
> > 2) When an LMCE comes to CPU#n, mcheck_cmn_handler() sets the
> >    lmce_in_process flag on CPU#n.
> > 3) When mce_softirq() starts, it clears the lmce_in_process flag if
> >    mce_in_process is set, so it will not handle a non-LMCE #MC w/o
> >    waiting for other CPUs.
> > 4) mce_softirq() clears both flags after exiting all MC barriers.
> 
> I'm afraid I still don't see how a static flag can deal with an
> arbitrary number of #MC-s arriving prior to the softirq handler
> getting a chance to run after the very first one came in.
>

mce_barrier_enter/exit need to be adapted to check both flags when
deciding whether to wait for the other CPUs:

    void mce_barrier_enter/exit(struct mce_softirq_barrier *bar)
    {
        int gen;

    -   if (!mce_broadcast)
    +   if ( !mce_broadcast ||
    +        (!this_cpu(mce_in_process) && this_cpu(lmce_in_process)) )
            return;


When the mce softirq handler finally starts to run, regardless of how
many #MCs have happened before:

1) if it sees only the lmce_in_process flag set, all pending MCs in
   this_cpu(mctctl.pending) are LMCEs; the handler does not need to
   wait for the others, and mce_barrier_enter/exit allows it to skip
   the rendezvous.

2) if it sees the mce_in_process flag set, all or some of the pending
   MCs in this_cpu(mctctl.pending) are non-LMCE, i.e. other CPUs have
   received them as well, so the mce softirq handler should wait for
   the others, and mce_barrier_enter/exit does make it wait.


Haozhong

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

