[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v7 09/10] microcode: remove microcode_update_lock



>>> On 13.06.19 at 19:47, <ashok.raj@xxxxxxxxx> wrote:
> On Thu, Jun 13, 2019 at 08:08:46AM -0600, Jan Beulich wrote:
>> >>> On 13.06.19 at 16:05, <chao.gao@xxxxxxxxx> wrote:
>> > On Wed, Jun 12, 2019 at 01:38:31AM -0600, Jan Beulich wrote:
>> >>>>> On 11.06.19 at 18:04, <ashok.raj@xxxxxxxxx> wrote:
>> >>> On Tue, Jun 11, 2019 at 08:46:04PM +0800, Chao Gao wrote:
>> >>>> On Wed, Jun 05, 2019 at 08:53:46AM -0600, Jan Beulich wrote:
>> >>>> >
>> >>>> >> @@ -307,8 +303,7 @@ static int apply_microcode(const struct 
>> >>>> >> microcode_patch 
> 
>> >>> *patch)
>> >>>> >>  
>> >>>> >>      mc_intel = patch->mc_intel;
>> >>>> >>  
>> >>>> >> -    /* serialize access to the physical write to MSR 0x79 */
>> >>>> >> -    spin_lock_irqsave(&microcode_update_lock, flags);
>> >>>> >> +    BUG_ON(local_irq_is_enabled());
>> >>>> >>  
>> >>>> >>      /*
>> >>>> >>       * Writeback and invalidate caches before updating microcode to 
>> >>>> >> avoid
>> >>>> >
>> >>>> >Thinking about it - what happens if we hit an NMI or #MC here?
>> >>>> >watchdog_disable(), a call to which you add in an earlier patch,
>> >>>> >doesn't really suppress the generation of NMIs, it only tells the
>> >>>> >handler not to look at the accumulated statistics.
>> >>>> 
>> >>>> I think they should be suppressed. Ashok, could you confirm it?
>> >>> 
>> >>> I think the only sources would be the watchdog as you pointed out
>> >>> which we already touch to keep it from expiring. The perf counters
>> >>> i'm not an expert in, but i'll check. When we are in stop_machine() type
>> >>> flow, its not clear if any of those would fire. (I might be wrong, but 
>> >>> let
>> >>> me check).
>> >>
>> >>Well, without disarming the watchdog NMI at the LAPIC / IO-APIC,
>> >>how would it _not_ potentially fire?
>> > 
>> > We plan not to prevent NMI being fired. Instead, if one thread of a core
>> > is updating microcode, other threads of this core would stop in the
>> > handler of NMI until the update completion. Is this approach acceptable?
>> 
>> Well, I have to return the question: It is you who knows what is or
>> is not acceptable while an ucode update is in progress. In particular
>> it obviously matters how much ucode is involved in the delivery of
>> an NMI (and in allowing the handler to get to the point where you'd
>> "stop" it).
>> 
>> If the approach you suggest is fine for the NMI case, I'd then wonder
>> if it couldn't also be used for the #MC one.
> 
> Architecturally only one #MC can be active in the system. If a new #MC 
> condition happens when MCG_STATUS.MCIP is already set, that would cause 
> spontaneous 
> shutdown.

That's understood.

> If another NMI arrives on the CPU doing the wrmsr, it will be pended
> in the lapic and delivered after the wrmsr returns. wrmsr flow
> can't be interrupted. 

Of course.

Neither part of your response is an argument against adding the same
"defense" to the #MC handler, though. While likely #MC will be fatal
to the system anyway, we should try to avoid making things worse
when we can.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.