Xen project Mailing List

Re: [PATCH v2 for-4.21 2/9] x86/HPET: use single, global, low-priority vector for broadcast IRQ

To: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Wed, 22 Oct 2025 11:21:15 +0200

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>

Delivery-date: Wed, 22 Oct 2025 09:21:27 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 21.10.2025 15:49, Roger Pau Monné wrote:
> On Tue, Oct 21, 2025 at 08:42:13AM +0200, Jan Beulich wrote:
>> On 20.10.2025 18:22, Roger Pau Monné wrote:
>>> On Mon, Oct 20, 2025 at 01:18:34PM +0200, Jan Beulich wrote:
>>>> Using dynamically allocated / maintained vectors has several downsides:
>>>> - possible nesting of IRQs due to the effects of IRQ migration,
>>>> - reduction of vectors available for devices,
>>>> - IRQs not moving as intended if there's shortage of vectors,
>>>> - higher runtime overhead.
>>>>
>>>> As the vector also doesn't need to be of any priority (first and foremost
>>>> it really shouldn't be of higher or same priority as the timer IRQ, as
>>>> that raises TIMER_SOFTIRQ anyway), avoid any "ordinary" vectors altogether
>>>> and use a vector from the 0x10...0x1f exception vector space. Exception vs
>>>> interrupt can easily be distinguished by checking for the presence of an
>>>> error code.
>>>>
>>>> With a fixed vector, less updating is now necessary in
>>>> set_channel_irq_affinity(); in particular channels don't need transiently
>>>> masking anymore, as the necessary update is now atomic. To fully leverage
>>>> this, however, we want to stop using hpet_msi_set_affinity() there. With
>>>> the transient masking dropped, we're no longer at risk of missing events.
>>>>
>>>> In principle a change to setup_vector_irq() would be necessary, but only
>>>> if we used low-prio vectors as direct-APIC ones. Since the change would be
>>>> at best benign here, it is being omitted.
>>>>
>>>> Fixes: 996576b965cc ("xen: allow up to 16383 cpus")
>>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>>>> Release-Acked-by: Oleksii Kurochko<oleksii.kurochko@xxxxxxxxx>
>>>> ---
>>>> This is an alternative proposal to
>>>> https://lists.xen.org/archives/html/xen-devel/2014-03/msg00399.html.
>>>>
>>>> Should we keep hpet_msi_set_affinity() at all? We'd better not have the
>>>> generic IRQ subsystem play with our IRQs' affinities ... (If so, this
>>>> likely would want to be a separate patch, though.)
>>>
>>> I think that needs to become a no-op, with possibly an ASSERT? Is it
>>> possible for dom0 to try to balance this IRQ? I would think not.
>>
>> I'd consider it an error if that was possible. But then the same goes for
>> other Xen-internal IRQs, like the IOMMU ones. They all implement a
>> .set_affinity hook ...
>
> We need such hook for fixup_irqs() at least, so that interrupts can be
> evacuated when an AP goes offline.

Hmm, yes. Just not here.

>>>> @@ -476,19 +486,50 @@ static struct hpet_event_channel *hpet_g
>>>>  static void set_channel_irq_affinity(struct hpet_event_channel *ch)
>>>>  {
>>>>      struct irq_desc *desc = irq_to_desc(ch->msi.irq);
>>>> +    struct msi_msg msg = ch->msi.msg;
>>>>
>>>>      ASSERT(!local_irq_is_enabled());
>>>>      spin_lock(&desc->lock);
>>>> -    hpet_msi_mask(desc);
>>>> -    hpet_msi_set_affinity(desc, cpumask_of(ch->cpu));
>>>> -    hpet_msi_unmask(desc);
>>>> +
>>>> +    per_cpu(vector_irq, ch->cpu)[HPET_BROADCAST_VECTOR] = ch->msi.irq;
>>>> +
>>>> +    /*
>>>> +     * Open-coding a reduced form of hpet_msi_set_affinity() here.  With the
>>>> +     * actual update below (either of the IRTE or of [just] message address;
>>>> +     * with interrupt remapping message address/data don't change) now being
>>>> +     * atomic, we can avoid masking the IRQ around the update.  As a result
>>>> +     * we're no longer at risk of missing IRQs (provided hpet_broadcast_enter()
>>>> +     * keeps setting the new deadline only afterwards).
>>>> +     */
>>>> +    cpumask_copy(desc->arch.cpu_mask, cpumask_of(ch->cpu));
>>>> +
>>>>      spin_unlock(&desc->lock);
>>>>
>>>> -    spin_unlock(&ch->lock);
>>>> +    msg.dest32 = cpu_physical_id(ch->cpu);
>>>> +    msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
>>>> +    msg.address_lo |= MSI_ADDR_DEST_ID(msg.dest32);
>>>> +    if ( msg.dest32 != ch->msi.msg.dest32 )
>>>> +    {
>>>> +        ch->msi.msg = msg;
>>>> +
>>>> +        if ( iommu_intremap != iommu_intremap_off )
>>>> +        {
>>>> +            int rc = iommu_update_ire_from_msi(&ch->msi, &msg);
>>>>
>>>> -    /* We may have missed an interrupt due to the temporary masking. */
>>>> -    if ( ch->event_handler && ch->next_event < NOW() )
>>>> -        ch->event_handler(ch);
>>>> +            ASSERT(rc <= 0);
>>>> +            if ( rc > 0 )
>>>> +            {
>>>> +                ASSERT(msg.data == hpet_read32(HPET_Tn_ROUTE(ch->idx)));
>>>> +                ASSERT(msg.address_lo ==
>>>> +                       hpet_read32(HPET_Tn_ROUTE(ch->idx) + 4));
>>>> +            }
>>>
>>> The sequence of asserts seems wrong here, the asserts inside of the
>>> rc > 0 check will never trigger, because there's an ASSERT(rc <= 0)
>>> ahead of them?
>>
>> Hmm. My way of thinking was that if we get back 1 (which we shouldn't),
>> we ought to check (and presumably fail on) data or address having changed.
>
> Right, but the ASSERT(rc <= 0) will prevent reaching any of the
> followup ASSERTs if rc == 1?

Which is no problem, as we'd be dead already anyway if the first assertion
triggered. Nevertheless I've switched the if() to >= 0 (which then pointed
out a necessary change in AMD IOMMU code).
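
For readers following the assertion-ordering point, here is a minimal, standalone sketch of the two orderings discussed above. It is not the Xen code: the struct, the enum and both function names are hypothetical, and an ASSERT() firing is modelled as a return value so the behaviour can be compared without aborting. It assumes, as the discussion implies, that a positive rc means the message data/address would need rewriting. The second function is only one reading of "switched the if() to >= 0".

```c
#include <stdint.h>

/* Hypothetical stand-in for the values the quoted ASSERT()s compare. */
struct route_check {
    uint32_t expected_data, actual_data;   /* msg.data vs. route register */
    uint32_t expected_addr, actual_addr;   /* msg.address_lo vs. route + 4 */
};

/* Which (modelled) assertion would fire first, if any. */
enum check_result {
    CHECK_NONE_FIRED,   /* no assertion triggers */
    CHECK_BAD_RC,       /* the leading "rc <= 0" assertion triggers */
    CHECK_BAD_ROUTE     /* a data/address comparison triggers */
};

/* Ordering as quoted: ASSERT(rc <= 0) first, comparisons under rc > 0. */
static enum check_result check_v2_order(int rc, const struct route_check *c)
{
    (void)c;            /* the comparisons can never be reached here */
    if ( rc > 0 )
        return CHECK_BAD_RC;
    return CHECK_NONE_FIRED;
}

/* Assumed variant: same leading check, comparisons under rc >= 0. */
static enum check_result check_ge0_order(int rc, const struct route_check *c)
{
    if ( rc > 0 )
        return CHECK_BAD_RC;
    if ( rc >= 0 &&
         (c->expected_data != c->actual_data ||
          c->expected_addr != c->actual_addr) )
        return CHECK_BAD_ROUTE;
    return CHECK_NONE_FIRED;
}
```

In this model the comparisons become live for rc == 0, which is the only case where the first check lets execution continue, while rc == 1 is still caught by the leading check in both variants.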
> IOW, we possibly want:
>
>     if ( rc > 0 )
>     {
>         dprintk(XENLOG_ERR,
>                 "Unexpected HPET MSI setup returned: data: %#x address: %#lx expected data %#x address %#lx\n",
>                 msg.data, msg.address,
>                 ch->msi.msg.data, ch->msi.msg.address);
>         ASSERT_UNREACHABLE();
>         hpet_msi_mask(desc);
>         hpet_write32(msg.data, HPET_Tn_ROUTE(ch->idx));
>         hpet_write32(msg.address_lo, HPET_Tn_ROUTE(ch->idx) + 4);
>         hpet_msi_unmask(desc);
>     }
>     ASSERT(!rc);

To be honest, for my taste this goes too far as to what follows an
ASSERT_UNREACHABLE().

> I'm unsure about attempting to propagate the returned values on release
> builds, I guess it's slightly better than possibly using an outdated
> RTE entry?  Albeit this should never happen.

Yes to the last remark; I don't actually see what you would want to do with
the propagated return value.

> Also, should the desc->arch.cpu_mask update only be done after the MSI
> fields have correctly updated, so that in case of failure of
> iommu_update_ire_from_msi() we could return early from the function
> and avoid changing desc->arch.cpu_mask?

Hmm, yes, I could do that, but then also in hpet_msi_set_affinity(). However,
as this needs doing under the IRQ descriptor lock, I'd have to either extend
the locked region here (again), or re-acquire the lock later. Neither looks
very attractive to me.

Jan
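
The destination update in the quoted hunk (clear the destination ID field of address_lo, OR in the new APIC ID, and only commit when the destination actually changed) can be sketched in isolation as below. This is an illustration under stated assumptions, not the Xen implementation: the struct and function names are hypothetical, and the macro values assume the usual x86 MSI address layout with the destination ID in bits 19:12 of the low address word.

```c
#include <stdint.h>

/* Assumed x86 MSI address layout: destination ID in bits 19:12. */
#define MSI_ADDR_DEST_ID_SHIFT 12
#define MSI_ADDR_DEST_ID_MASK  0x000ff000u
#define MSI_ADDR_DEST_ID(dest) \
    (((uint32_t)(dest) << MSI_ADDR_DEST_ID_SHIFT) & MSI_ADDR_DEST_ID_MASK)

/* Hypothetical, reduced stand-in for the message fields used in the hunk. */
struct msi_msg_sketch {
    uint32_t address_lo;   /* low 32 bits of the MSI address */
    uint32_t data;         /* vector, delivery mode, ...; left untouched */
    uint32_t dest32;       /* full destination APIC ID */
};

/*
 * Retarget the message at a new APIC ID.  Works on a local copy and only
 * commits it when the destination differs, mirroring the shape of the hunk.
 * Returns 1 if the stored message was updated, 0 if nothing changed.
 */
static int retarget_msg(struct msi_msg_sketch *cur, uint32_t apic_id)
{
    struct msi_msg_sketch msg = *cur;

    msg.dest32 = apic_id;
    msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
    msg.address_lo |= MSI_ADDR_DEST_ID(msg.dest32);

    if ( msg.dest32 == cur->dest32 )
        return 0;

    *cur = msg;
    return 1;
}
```

Only the low 8 bits of the APIC ID fit into the address field in this layout, so the sketch says nothing about how larger IDs are conveyed; in the patch that is the interrupt remapping path's job.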
