Re: [PATCH v2 for-4.21 2/9] x86/HPET: use single, global, low-priority vector for broadcast IRQ
On 21.10.2025 15:49, Roger Pau Monné wrote:
> On Tue, Oct 21, 2025 at 08:42:13AM +0200, Jan Beulich wrote:
>> On 20.10.2025 18:22, Roger Pau Monné wrote:
>>> On Mon, Oct 20, 2025 at 01:18:34PM +0200, Jan Beulich wrote:
>>>> Using dynamically allocated / maintained vectors has several downsides:
>>>> - possible nesting of IRQs due to the effects of IRQ migration,
>>>> - reduction of vectors available for devices,
>>>> - IRQs not moving as intended if there's shortage of vectors,
>>>> - higher runtime overhead.
>>>>
>>>> As the vector also doesn't need to be of any priority (first and foremost
>>>> it really shouldn't be of higher or the same priority as the timer IRQ, as
>>>> that raises TIMER_SOFTIRQ anyway), avoid any "ordinary" vectors altogether
>>>> and use a vector from the 0x10...0x1f exception vector space. Exception vs
>>>> interrupt can easily be distinguished by checking for the presence of an
>>>> error code.
>>>>
>>>> With a fixed vector, less updating is now necessary in
>>>> set_channel_irq_affinity(); in particular channels don't need transient
>>>> masking anymore, as the necessary update is now atomic. To fully leverage
>>>> this, however, we want to stop using hpet_msi_set_affinity() there. With
>>>> the transient masking dropped, we're no longer at risk of missing events.
>>>>
>>>> In principle a change to setup_vector_irq() would be necessary, but only
>>>> if we used low-prio vectors as direct-APIC ones. Since the change would be
>>>> at best benign here, it is being omitted.
>>>>
>>>> Fixes: 996576b965cc ("xen: allow up to 16383 cpus")
>>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>>>> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
>>>> ---
>>>> This is an alternative proposal to
>>>> https://lists.xen.org/archives/html/xen-devel/2014-03/msg00399.html.
>>>>
>>>> Should we keep hpet_msi_set_affinity() at all? We'd better not have the
>>>> generic IRQ subsystem play with our IRQs' affinities ... (If so, this
>>>> likely would want to be a separate patch, though.)
>>>
>>> I think that needs to become a no-op, with possibly an ASSERT? Is it
>>> possible for dom0 to try to balance this IRQ? I would think not.
>>
>> I'd consider it an error if that was possible. But then the same goes for
>> other Xen-internal IRQs, like the IOMMU ones. They all implement a
>> .set_affinity hook ...
>
> We need such hook for fixup_irqs() at least, so that interrupts can be
> evacuated when an AP goes offline.
Hmm, yes. Just not here.
>>>> @@ -476,19 +486,50 @@ static struct hpet_event_channel *hpet_g
>>>>  static void set_channel_irq_affinity(struct hpet_event_channel *ch)
>>>>  {
>>>>      struct irq_desc *desc = irq_to_desc(ch->msi.irq);
>>>> +    struct msi_msg msg = ch->msi.msg;
>>>>
>>>>      ASSERT(!local_irq_is_enabled());
>>>>      spin_lock(&desc->lock);
>>>> -    hpet_msi_mask(desc);
>>>> -    hpet_msi_set_affinity(desc, cpumask_of(ch->cpu));
>>>> -    hpet_msi_unmask(desc);
>>>> +
>>>> +    per_cpu(vector_irq, ch->cpu)[HPET_BROADCAST_VECTOR] = ch->msi.irq;
>>>> +
>>>> +    /*
>>>> +     * Open-coding a reduced form of hpet_msi_set_affinity() here. With the
>>>> +     * actual update below (either of the IRTE or of [just] message address;
>>>> +     * with interrupt remapping message address/data don't change) now being
>>>> +     * atomic, we can avoid masking the IRQ around the update. As a result
>>>> +     * we're no longer at risk of missing IRQs (provided hpet_broadcast_enter()
>>>> +     * keeps setting the new deadline only afterwards).
>>>> +     */
>>>> +    cpumask_copy(desc->arch.cpu_mask, cpumask_of(ch->cpu));
>>>> +
>>>>      spin_unlock(&desc->lock);
>>>>
>>>> -    spin_unlock(&ch->lock);
>>>> +    msg.dest32 = cpu_physical_id(ch->cpu);
>>>> +    msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
>>>> +    msg.address_lo |= MSI_ADDR_DEST_ID(msg.dest32);
>>>> +    if ( msg.dest32 != ch->msi.msg.dest32 )
>>>> +    {
>>>> +        ch->msi.msg = msg;
>>>> +
>>>> +        if ( iommu_intremap != iommu_intremap_off )
>>>> +        {
>>>> +            int rc = iommu_update_ire_from_msi(&ch->msi, &msg);
>>>> +
>>>> -    /* We may have missed an interrupt due to the temporary masking. */
>>>> -    if ( ch->event_handler && ch->next_event < NOW() )
>>>> -        ch->event_handler(ch);
>>>> +            ASSERT(rc <= 0);
>>>> +            if ( rc > 0 )
>>>> +            {
>>>> +                ASSERT(msg.data == hpet_read32(HPET_Tn_ROUTE(ch->idx)));
>>>> +                ASSERT(msg.address_lo ==
>>>> +                       hpet_read32(HPET_Tn_ROUTE(ch->idx) + 4));
>>>> +            }
>>>
>>> The sequence of asserts seems wrong here: the asserts inside of the
>>> rc > 0 check will never trigger, because there's an ASSERT(rc <= 0)
>>> ahead of them?
>>
>> Hmm. My way of thinking was that if we get back 1 (which we shouldn't),
>> we ought to check (and presumably fail on) data or address having changed.
>
> Right, but the ASSERT(rc <= 0) will prevent reaching any of the
> followup ASSERTs if rc == 1?
Which is no problem, as we'd be dead already anyway if the first assertion
triggered. Nevertheless I've switched the if() to >= 0 (which then pointed
out a necessary change in AMD IOMMU code).
> IOW, we possibly want:
>
> if ( rc > 0 )
> {
>     dprintk(XENLOG_ERR,
>             "Unexpected HPET MSI setup returned: data: %#x address: %#lx expected data %#x address %#lx\n",
>             msg.data, msg.address, ch->msi.msg.data, ch->msi.msg.address);
>     ASSERT_UNREACHABLE();
>     hpet_msi_mask(desc);
>     hpet_write32(msg.data, HPET_Tn_ROUTE(ch->idx));
>     hpet_write32(msg.address_lo, HPET_Tn_ROUTE(ch->idx) + 4);
>     hpet_msi_unmask(desc);
> }
> ASSERT(!rc);
To be honest, for my taste this goes too far in what is made to follow an
ASSERT_UNREACHABLE().
> I'm unsure about attempting to propagate the returned values on release
> builds, I guess it's slightly better than possibly using an outdated
> RTE entry? Albeit this should never happen.
Yes to the last remark; I don't actually see what you would want to do
with the propagated return value.
> Also, should the desc->arch.cpu_mask update only be done after the MSI
> fields have been correctly updated, so that in case of failure of
> iommu_update_ire_from_msi() we could return early from the function
> and avoid changing desc->arch.cpu_mask?
Hmm, yes, I could do that, but then also in hpet_msi_set_affinity().
However, as this needs doing under the IRQ descriptor lock, I'd have to
either extend the locked region here (again), or re-acquire the lock
later. Neither looks very attractive to me.
Jan