[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 5/6] x86/smp: use a dedicated scratch cpumask in send_IPI_mask



On 18/02/2020 11:28, Jan Beulich wrote:
> On 18.02.2020 12:21, Andrew Cooper wrote:
>> On 18/02/2020 11:10, Roger Pau Monné wrote:
>>> On Tue, Feb 18, 2020 at 10:53:45AM +0000, Andrew Cooper wrote:
>>>> On 17/02/2020 18:43, Roger Pau Monne wrote:
>>>>> @@ -67,7 +68,20 @@ static void send_IPI_shortcut(unsigned int shortcut, 
>>>>> int vector,
>>>>>  void send_IPI_mask(const cpumask_t *mask, int vector)
>>>>>  {
>>>>>      bool cpus_locked = false;
>>>>> -    cpumask_t *scratch = this_cpu(scratch_cpumask);
>>>>> +    cpumask_t *scratch = this_cpu(send_ipi_cpumask);
>>>>> +    unsigned long flags;
>>>>> +
>>>>> +    if ( in_mc() || in_nmi() )
>>>>> +    {
>>>>> +        /*
>>>>> +         * When in #MC or #MNI context Xen cannot use the per-CPU 
>>>>> scratch mask
>>>>> +         * because we have no way to avoid reentry, so do not use the 
>>>>> APIC
>>>>> +         * shorthand.
>>>>> +         */
>>>>> +        alternative_vcall(genapic.send_IPI_mask, mask, vector);
>>>>> +        return;
>>>> The set of things you can safely do in an NMI/MCE handler is small, and
>>>> does not include sending IPIs.  (In reality, if you're using x2apic, it
>>>> is safe to send an IPI because there is no risk of clobbering ICR2
>>>> behind your outer context's back).
>>>>
>>>> However, if we escalate from NMI/MCE context into crash context, then
>>>> anything goes.  In reality, we only ever send NMIs from the crash path,
>>>> and that is not permitted to use a shorthand, making this code dead.
>>> This was requested by Jan, as safety measure
>> That may be, but it doesn't mean it is correct.  If execution ever
>> enters this function in NMI/MCE context, there is a real,
>> state-corrupting bug, higher up the call stack.
> Besides the issue of any locks needing taking on such paths (which
> must not happen in NMI/#MC context), the only thing getting in the
> way of IPI sending is - afaics - ICR2, which could be saved /
> restored around such operations.

Its the important xAPIC register for sure, but you've also got to
account for compound effects such as causing an LAPIC error.

It is far easier to say "thou shalt not IPI from NMI/MCE context",
because we don't have code needing to do this in the first place.

> That said, BUG()ing or panic()ing
> if we get in here from such a context would also be sufficient to
> satisfy the "safety measure" aspect.

No - safety checks in the crash path make it worse, because if they
trigger, they reliably trigger recursively and never enter the crash kernel.

Once we are in crash context, the most important task is to successfully
transition to the crash kernel.  Sure - there is no guarantee that we
will manage it, but hitting poorly-thought-through safety checks really
has wasted months of customer (and my) time during investigations.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.