Xen project Mailing List

Re: [PATCH v3 for-4.21 2/9] x86/HPET: use single, global, low-priority vector for broadcast IRQ

To: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Mon, 27 Oct 2025 11:23:58 +0100

Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>

Delivery-date: Mon, 27 Oct 2025 10:24:08 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 24.10.2025 15:24, Roger Pau Monné wrote: > On Thu, Oct 23, 2025 at 05:50:17PM +0200, Jan Beulich wrote: >> Using dynamically allocated / maintained vectors has several downsides: >> - possible nesting of IRQs due to the effects of IRQ migration, >> - reduction of vectors available for devices, >> - IRQs not moving as intended if there's shortage of vectors, >> - higher runtime overhead. >> >> As the vector also doesn't need to be of any priority (first and foremost >> it really shouldn't be of higher or same priority as the timer IRQ, as >> that raises TIMER_SOFTIRQ anyway), simply use the lowest one above the >> legacy range. The vector needs reserving early, until it is known whether >> it actually is used. If it isn't, it's made available for general use. >> >> With a fixed vector, less updating is now necessary in >> set_channel_irq_affinity(); in particular channels don't need transiently >> masking anymore, as the necessary update is now atomic. To fully leverage >> this, however, we want to stop using hpet_msi_set_affinity() there. With >> the transient masking dropped, we're no longer at risk of missing events. >> >> Fixes: 996576b965cc ("xen: allow up to 16383 cpus") >> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx> >> Release-Acked-by: Oleksii Kurochko<oleksii.kurochko@xxxxxxxxx> > ^ space? Looks like I simply took what was provided; I've added the blank now (also in patch 1). > Reviewed-by: Roger Pau Monné <roger.pau@xxxxxxxxxx> Thanks. >> @@ -303,15 +309,13 @@ static void cf_check hpet_msi_set_affini >> struct hpet_event_channel *ch = desc->action->dev_id; >> struct msi_msg msg = ch->msi.msg; >> >> - msg.dest32 = set_desc_affinity(desc, mask); >> - if ( msg.dest32 == BAD_APICID ) >> - return; >> + /* This really is only for dump_irqs(). */ >> + cpumask_copy(desc->arch.cpu_mask, mask); > > To add some extra checks here for correctness, do you think it would > be helpful to add: > > ASSERT(cpumask_weight(mask) == 1); > ASSERT(cpumask_intersects(mask, &cpu_online_mask)); > > Or that's too pedantic? Imo that would be pretty pointless in particular since the function is to go away anyway. >> @@ -343,6 +347,12 @@ static int __init hpet_setup_msi_irq(str >> u32 cfg = hpet_read32(HPET_Tn_CFG(ch->idx)); >> irq_desc_t *desc = irq_to_desc(ch->msi.irq); >> >> + clear_irq_vector(ch->msi.irq); >> + ret = bind_irq_vector(ch->msi.irq, HPET_BROADCAST_VECTOR, >> &cpu_online_map); > > By passing cpu_online_map here, it leads to _bind_irq_vector() doing: > > cpumask_copy(desc->arch.cpu_mask, &cpu_online_map); > > Which strictly speaking is wrong. However this is just a cosmetic > issue until the irq is used for the first time, at which point it will > be assigned to a concrete CPU. > > You could do: > > cpumask_clear(desc->arch.cpu_mask); > cpumask_set_cpu(cpumask_any(&cpu_online_map), desc->arch.cpu_mask); > > (Or equivalent) > > To assign the interrupt to a concrete CPU and reflex it on the > cpu_mask after the bind_irq_vector() call, but I can live with it > being like this. I have patches to adjust _bind_irq_vector() myself, > which I hope I will be able to post soon. Hmm, I wrongly memorized hpet_broadcast_init() as being pre-SMP-init only. It has three call sites: - mwait_idle_init(), called from cpuidle_presmp_init(), - amd_cpuidle_init(), calling in only when invoked the very first time, which is again from cpuidle_presmp_init(), - _disable_pit_irq(), called from the regular initcall disable_pit_irq(). I.e. for the latter you're right that the CPU mask is too broad (in only a cosmetic way though). Would be you okay if I used cpumask_of(0) in place of &cpu_online_map? >> @@ -472,19 +482,50 @@ static struct hpet_event_channel *hpet_g >> static void set_channel_irq_affinity(struct hpet_event_channel *ch) >> { >> struct irq_desc *desc = irq_to_desc(ch->msi.irq); >> + struct msi_msg msg = ch->msi.msg; >> >> ASSERT(!local_irq_is_enabled()); >> spin_lock(&desc->lock); >> - hpet_msi_mask(desc); >> - hpet_msi_set_affinity(desc, cpumask_of(ch->cpu)); >> - hpet_msi_unmask(desc); >> + >> + per_cpu(vector_irq, ch->cpu)[HPET_BROADCAST_VECTOR] = ch->msi.irq; >> + >> + /* >> + * Open-coding a reduced form of hpet_msi_set_affinity() here. With the >> + * actual update below (either of the IRTE or of [just] message address; >> + * with interrupt remapping message address/data don't change) now being >> + * atomic, we can avoid masking the IRQ around the update. As a result >> + * we're no longer at risk of missing IRQs (provided >> hpet_broadcast_enter() >> + * keeps setting the new deadline only afterwards). >> + */ >> + cpumask_copy(desc->arch.cpu_mask, cpumask_of(ch->cpu)); >> + >> spin_unlock(&desc->lock); >> >> - spin_unlock(&ch->lock); >> + msg.dest32 = cpu_physical_id(ch->cpu); >> + msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK; >> + msg.address_lo |= MSI_ADDR_DEST_ID(msg.dest32); >> + if ( msg.dest32 != ch->msi.msg.dest32 ) >> + { >> + ch->msi.msg = msg; >> >> - /* We may have missed an interrupt due to the temporary masking. */ >> - if ( ch->event_handler && ch->next_event < NOW() ) >> - ch->event_handler(ch); >> + if ( iommu_intremap != iommu_intremap_off ) >> + { >> + int rc = iommu_update_ire_from_msi(&ch->msi, &msg); >> + >> + ASSERT(rc <= 0); >> + if ( rc >= 0 ) > > I don't think the rc > 0 part of this check is meaningful, as any rc > value > 0 will trigger the ASSERT(rc <= 0) ahead of it. The code > inside of the if block itself only contains ASSERTs, so it's only > relevant for debug=y builds that will also have the rc <= 0 ASSERT. > > You could possibly use: > > ASSERT(rc <= 0); > if ( !rc ) > { > ASSERT(... > > And achieve the same result? Yes, except that I'd like to keep the >= to cover the case if the first assertion was dropped / commented out, as well as to have a doc effect. >> @@ -991,6 +997,13 @@ void alloc_direct_apic_vector(uint8_t *v >> spin_unlock(&lock); >> } >> >> +/* This could free any vectors, but is needed only for low-prio ones. */ >> +void __init free_lopriority_vector(uint8_t vector) >> +{ >> + ASSERT(vector < FIRST_HIPRIORITY_VECTOR); >> + clear_bit(vector, used_vectors); >> +} > > I'm undecided whether we want to have such helper. This is all very > specific to the single use by the HPET vector, and hence might be best > to simply put the clear_bit() inside of hpet_broadcast_late_init() > itself. I wanted to avoid making used_vectors non-static. > I could see for example other callers wanting to use this also > requiring cleanup of the per cpu vector_irq arrays. Given it's (so > far) very limited usage it might be clearer to open-code the > clear_bit(). Dealing with vector_irq[] is a separate thing, though, isn't it? >> --- a/xen/drivers/passthrough/amd/iommu_intr.c >> +++ b/xen/drivers/passthrough/amd/iommu_intr.c >> @@ -551,6 +551,13 @@ int cf_check amd_iommu_msi_msg_update_ir >> for ( i = 1; i < nr; ++i ) >> msi_desc[i].remap_index = msi_desc->remap_index + i; >> msg->data = data; >> + /* >> + * While the low address bits don't matter, "canonicalize" the >> address >> + * by zapping the bits that were transferred to the IRTE. This way >> + * callers can check for there actually needing to be an update to >> + * wherever the address is put. >> + */ >> + msg->address_lo &= ~(MSI_ADDR_DESTMODE_MASK | >> MSI_ADDR_DEST_ID_MASK); > > You might want to mention this change on the commit message also, as > it could look unrelated to the rest of the code? I thought the comment here provided enough context and detail. I've added "AMD interrupt remapping code so far didn't "return" a consistent MSI address when translating an MSI message. Clear respective fields there, to keep the respective assertion in set_channel_irq_affinity() from triggering." Jan

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.