
Re: [Xen-devel] [PATCH v9 15/15] microcode: block #NMI handling when loading an ucode

On 30.08.2019 08:33, Chao Gao wrote:
> On Thu, Aug 29, 2019 at 02:22:47PM +0200, Jan Beulich wrote:
>> On 19.08.2019 03:25, Chao Gao wrote:
>>> @@ -481,12 +478,28 @@ static int do_microcode_update(void *patch)
>>>      return ret;
>>>  }
>>> +static int microcode_nmi_callback(const struct cpu_user_regs *regs, int cpu)
>>> +{
>>> +    /* The first thread of a core is to load an update. Don't block it. */
>>> +    if ( cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu)) ||
>>> +         loading_state != LOADING_CALLIN )
>>> +        return 0;
>>> +
>>> +    cpumask_set_cpu(cpu, &cpu_callin_map);
>>> +
>>> +    while ( loading_state != LOADING_EXIT )
>>> +        cpu_relax();
>>> +
>>> +    return 0;
>>> +}
>> By returning 0 you tell do_nmi() to continue processing the NMI.
>> Since you can't tell whether a non-IPI NMI has surfaced at about
>> the same time this is generally the right thing imo, but how do
>> you prevent unknown_nmi_error() from getting entered when do_nmi()
>> ends up setting handle_unknown to true? (The question is mostly
>> rhetorical, but there's a disconnect between do_nmi() checking
>> "cpu == 0" and the control thread running on
>> cpumask_first(&cpu_online_map), i.e. you introduce a well-hidden
>> dependency on CPU 0 never going offline. IOW my request is to at
>> least make this less well hidden, such that it can be noticed if
>> and when someone endeavors to remove said limitation.)
> It seems the issue is that we cannot send an IPI NMI to the BSP;
> otherwise unknown_nmi_error() would be triggered. But loading ucode
> after rendezvousing all CPUs in the NMI handler expects all CPUs to
> receive an IPI NMI. So this approach always has this issue.

Not really, I don't think: If both sides agreed (explicitly!) on which
CPU leads this effort, then it would be clear that the one CPU
handling NMIs coming from the platform should not be sent an NMI, and
hence it should be this one to lead the effort. FAOD - my remark really
was because of the new hidden(!) dependency you introduce on CPU 0
always being this "special" CPU. I don't expect you to change the code,
but I'd like you to make the currently hidden dependency explicit.

> Considering that self_nmi() is also called elsewhere, could we provide a
> way to temporarily suppress or (forcibly) ignore unknown NMI errors?

I'm afraid any attempt at doing so will leave room for missing an
actual (platform) NMI.


Xen-devel mailing list