Re: [PATCH v2 6/7] x86/irq: handle moving interrupts in _assign_irq_vector()
On 12.06.2024 17:36, Roger Pau Monné wrote:
> On Wed, Jun 12, 2024 at 03:42:58PM +0200, Jan Beulich wrote:
>> On 12.06.2024 12:39, Roger Pau Monné wrote:
>>> On Tue, Jun 11, 2024 at 03:18:32PM +0200, Jan Beulich wrote:
>>>> On 10.06.2024 16:20, Roger Pau Monne wrote:
>>>>> Currently there's logic in fixup_irqs() that attempts to prevent
>>>>> _assign_irq_vector() from failing, as fixup_irqs() is required to
>>>>> evacuate all interrupts from the CPUs not present in the input mask.
>>>>> The current logic in fixup_irqs() is incomplete, as it doesn't deal
>>>>> with interrupts that have move_cleanup_count > 0 and a non-empty
>>>>> ->arch.old_cpu_mask field.
>>>>>
>>>>> Instead of attempting to fixup the interrupt descriptor in
>>>>> fixup_irqs() so that _assign_irq_vector() cannot fail, introduce
>>>>> logic in _assign_irq_vector() to deal with interrupts that have
>>>>> either move_{in_progress,cleanup_count} set and no remaining online
>>>>> CPUs in ->arch.cpu_mask.
>>>>>
>>>>> If _assign_irq_vector() is requested to move an interrupt in the
>>>>> state described above, first attempt to see if ->arch.old_cpu_mask
>>>>> contains any valid CPUs that could be used as fallback, and if
>>>>> that's the case do move the interrupt back to the previous
>>>>> destination. Note this is easier because the vector hasn't been
>>>>> released yet, so there's no need to allocate and set up a new
>>>>> vector on the destination.
>>>>>
>>>>> Due to the logic in fixup_irqs() that clears offline CPUs from
>>>>> ->arch.old_cpu_mask (and releases the old vector if the mask
>>>>> becomes empty) it shouldn't be possible to get into
>>>>> _assign_irq_vector() with ->arch.move_{in_progress,cleanup_count}
>>>>> set but no online CPUs in ->arch.old_cpu_mask.
>>>>>
>>>>> However if ->arch.move_{in_progress,cleanup_count} is set and the
>>>>> interrupt has also changed affinity, it's possible the members of
>>>>> ->arch.old_cpu_mask are no longer part of the affinity set,
>>>>
>>>> I'm having trouble relating this (->arch.old_cpu_mask related) to ...
>>>>
>>>>> move the interrupt to a different CPU part of the provided mask
>>>>
>>>> ... this (->arch.cpu_mask related).
>>>
>>> No, the "provided mask" here is the "mask" parameter, not
>>> ->arch.cpu_mask.
>>
>> Oh, so this describes the case of "hitting" the comment at the very bottom of
>> the first hunk then? (I probably was misreading this because I was expecting
>> it to describe a code change, rather than the case where original behavior
>> needs retaining. IOW - all fine here then.)
>>
>>>>> and keep the current ->arch.old_{cpu_mask,vector} for the
>>>>> pending interrupt movement to be completed.
>>>>
>>>> Right, that's to clean up state from before the initial move. What isn't
>>>> clear to me is what's to happen with the state of the intermediate
>>>> placement. Description and code changes leave me with the impression that
>>>> it's okay to simply abandon, without any cleanup, yet I can't quite figure
>>>> why that would be an okay thing to do.
>>>
>>> There isn't much we can do with the intermediate placement, as the CPU
>>> is going offline. However we can drain any pending interrupts from
>>> IRR after the new destination has been set, since setting the
>>> destination is done from the CPU that's the current target of the
>>> interrupts. So we can ensure the draining is done strictly after the
>>> target has been switched, hence ensuring no further interrupts from
>>> this source will be delivered to the current CPU.
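
(For illustration, the draining described above might look like the
sketch below when run on the outgoing CPU; apic_read(), APIC_IRR and
send_IPI_self() are existing Xen identifiers, but this is a sketch
rather than the exact code of the series.)

    /*
     * After the new destination has been written, check the local APIC
     * IRR for the old vector and, if an instance is pending, forward it
     * to the new vector. Running this on the CPU that was the target
     * ensures no further instance can arrive here afterwards.
     */
    unsigned int vector = desc->arch.old_vector;

    if ( apic_read(APIC_IRR + (vector / 32) * 0x10) &
         (1U << (vector % 32)) )
        send_IPI_self(desc->arch.vector);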
>>
>> Hmm, I'm afraid I still don't follow: I'm specifically in trouble with
>> the ...
>>
>>>>> --- a/xen/arch/x86/irq.c
>>>>> +++ b/xen/arch/x86/irq.c
>>>>> @@ -544,7 +544,53 @@ static int _assign_irq_vector(struct irq_desc *desc, const cpumask_t *mask)
>>>>>      }
>>>>>
>>>>>      if ( desc->arch.move_in_progress || desc->arch.move_cleanup_count )
>>>>> -        return -EAGAIN;
>>>>> +    {
>>>>> +        /*
>>>>> +         * If the current destination is online refuse to shuffle. Retry after
>>>>> +         * the in-progress movement has finished.
>>>>> +         */
>>>>> +        if ( cpumask_intersects(desc->arch.cpu_mask, &cpu_online_map) )
>>>>> +            return -EAGAIN;
>>>>> +
>>>>> +        /*
>>>>> +         * Due to the logic in fixup_irqs() that clears offlined CPUs from
>>>>> +         * ->arch.old_cpu_mask it shouldn't be possible to get here with
>>>>> +         * ->arch.move_{in_progress,cleanup_count} set and no online CPUs in
>>>>> +         * ->arch.old_cpu_mask.
>>>>> +         */
>>>>> +        ASSERT(valid_irq_vector(desc->arch.old_vector));
>>>>> +        ASSERT(cpumask_intersects(desc->arch.old_cpu_mask, &cpu_online_map));
>>>>> +
>>>>> +        if ( cpumask_intersects(desc->arch.old_cpu_mask, mask) )
>>>>> +        {
>>>>> +            /*
>>>>> +             * Fallback to the old destination if moving is in progress and the
>>>>> +             * current destination is to be offlined. This is only possible if
>>>>> +             * the CPUs in old_cpu_mask intersect with the affinity mask passed
>>>>> +             * in the 'mask' parameter.
>>>>> +             */
>>>>> +            desc->arch.vector = desc->arch.old_vector;
>>>>> +            cpumask_and(desc->arch.cpu_mask, desc->arch.old_cpu_mask, mask);
>>
>> ... replacing of vector (and associated mask), without any further
>> accounting.
>
> It's quite likely I'm missing something here, but what further
> accounting would you like to see done?
>
> All CPUs in the current target of the interrupt (->arch.cpu_mask prior
> to the cpumask_and()) are going offline, so any attempt to record them
> in ->arch.old_cpu_mask would just result in stale (offline) CPUs
> getting set there, which is exactly what previous patches attempted to
> solve.
>
> Maybe by "further accounting" you meant something else not related to
> ->arch.old_{cpu_mask,vector}?

Indeed. What I'm thinking of is what release_old_vec() would normally
do (of which only the desc->arch.used_vectors updating would appear to
be relevant, seeing that the CPU is going offline). The other item I
was thinking of, the updating of vector_irq[], is likely also
unnecessary, again because that's per-CPU data of a CPU going down.
Jan