Xen project Mailing List

Re: [PATCH v3 for-4.21 2/9] x86/HPET: use single, global, low-priority vector for broadcast IRQ

From: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Mon, 27 Oct 2025 12:33:36 +0100

Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none

Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=9x+48BY8Y19X6DkNc3fenIfDrpmoHJjRz71Yrta7NV4=; b=oBAtSEx/zI+bpf1XkJyxU4OBAu8VUUqDIA6rvI1WWtDUc0fZEBh5/V1xwSDGcUBx7Ae8CBNioA+CA8Ojv+EHtYysZQUgVrklv9BA3T5tf+hecv9kMnWfEeRq9DJB8Pv+NgKigL8npW5KHZQVS6aZEErUwMUBDbDyMwTH4V+yNh1xV9usXOq1Hxg8FIFVsLHqm2IAtb1qKPbiRWrk2EeZwkXxoyVAr8hXEyFQER0CHon1tyO/tACyk+zQ84jqajJYjOZRxzJ66rV/acOFmnpWWQizWbrbr/LFwlrmIdvpZRDGk6SFDFawyfjYE3WbvvQV4jLbDHP5mVnmVsJRmez7NQ==

Arc-seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=J2QD8twNkeNDoet19gln7qV+tPgMMzUhDfIDULTns0v/HPefj3BFnMwNYbGL7KDxiCn/5K9sqX7/kc7KCbj/YZ6dyIZo2s+A0/maoE0Iq7zdJN89qKCZKov8ST4FUN3Qz2Cv13MVxc/Fu4ZzN2mM0lRfSaOyN7eiqpZ42l9Wic8M2Mjzhc02CTyDosp4LQrit1I4wemqspgNgTGZKElYWFor+oaqkYKRpHrJ+GH7CrQePJMLU2fe6qRCWINRZpCMHHw1Wl1YNaiD3pM8217oWByv2c30EO+Cw+rJhujLnhIp+jyFRRE+gtrLkGvZtpBcHY85FzW5nY1nCFfXv+WumQ==

Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>

Delivery-date: Mon, 27 Oct 2025 11:33:54 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Mon, Oct 27, 2025 at 11:23:58AM +0100, Jan Beulich wrote: > On 24.10.2025 15:24, Roger Pau Monné wrote: > > On Thu, Oct 23, 2025 at 05:50:17PM +0200, Jan Beulich wrote: > >> @@ -343,6 +347,12 @@ static int __init hpet_setup_msi_irq(str > >> u32 cfg = hpet_read32(HPET_Tn_CFG(ch->idx)); > >> irq_desc_t *desc = irq_to_desc(ch->msi.irq); > >> > >> + clear_irq_vector(ch->msi.irq); > >> + ret = bind_irq_vector(ch->msi.irq, HPET_BROADCAST_VECTOR, > >> &cpu_online_map); > > > > By passing cpu_online_map here, it leads to _bind_irq_vector() doing: > > > > cpumask_copy(desc->arch.cpu_mask, &cpu_online_map); > > > > Which strictly speaking is wrong. However this is just a cosmetic > > issue until the irq is used for the first time, at which point it will > > be assigned to a concrete CPU. > > > > You could do: > > > > cpumask_clear(desc->arch.cpu_mask); > > cpumask_set_cpu(cpumask_any(&cpu_online_map), desc->arch.cpu_mask); > > > > (Or equivalent) > > > > To assign the interrupt to a concrete CPU and reflex it on the > > cpu_mask after the bind_irq_vector() call, but I can live with it > > being like this. I have patches to adjust _bind_irq_vector() myself, > > which I hope I will be able to post soon. > > Hmm, I wrongly memorized hpet_broadcast_init() as being pre-SMP-init only. > It has three call sites: > - mwait_idle_init(), called from cpuidle_presmp_init(), > - amd_cpuidle_init(), calling in only when invoked the very first time, > which is again from cpuidle_presmp_init(), > - _disable_pit_irq(), called from the regular initcall disable_pit_irq(). > I.e. for the latter you're right that the CPU mask is too broad (in only a > cosmetic way though). Would be you okay if I used cpumask_of(0) in place > of &cpu_online_map? Using cpumask_of(0) would be OK, as the per-cpu vector_irq array will be updated ahead of assigning the interrupt to a CPU, and hence it doesn't need to be done for all possible online CPUs in _bind_irq_vector(). In the context here it would be more accurate to provide an empty CPU mask, as the interrupt is not yet targeting any CPU. Using CPU 0 would be a placeholder, which seems fine for the purpose. > >> @@ -472,19 +482,50 @@ static struct hpet_event_channel *hpet_g > >> static void set_channel_irq_affinity(struct hpet_event_channel *ch) > >> { > >> struct irq_desc *desc = irq_to_desc(ch->msi.irq); > >> + struct msi_msg msg = ch->msi.msg; > >> > >> ASSERT(!local_irq_is_enabled()); > >> spin_lock(&desc->lock); > >> - hpet_msi_mask(desc); > >> - hpet_msi_set_affinity(desc, cpumask_of(ch->cpu)); > >> - hpet_msi_unmask(desc); > >> + > >> + per_cpu(vector_irq, ch->cpu)[HPET_BROADCAST_VECTOR] = ch->msi.irq; > >> + > >> + /* > >> + * Open-coding a reduced form of hpet_msi_set_affinity() here. With > >> the > >> + * actual update below (either of the IRTE or of [just] message > >> address; > >> + * with interrupt remapping message address/data don't change) now > >> being > >> + * atomic, we can avoid masking the IRQ around the update. As a > >> result > >> + * we're no longer at risk of missing IRQs (provided > >> hpet_broadcast_enter() > >> + * keeps setting the new deadline only afterwards). > >> + */ > >> + cpumask_copy(desc->arch.cpu_mask, cpumask_of(ch->cpu)); > >> + > >> spin_unlock(&desc->lock); > >> > >> - spin_unlock(&ch->lock); > >> + msg.dest32 = cpu_physical_id(ch->cpu); > >> + msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK; > >> + msg.address_lo |= MSI_ADDR_DEST_ID(msg.dest32); > >> + if ( msg.dest32 != ch->msi.msg.dest32 ) > >> + { > >> + ch->msi.msg = msg; > >> > >> - /* We may have missed an interrupt due to the temporary masking. */ > >> - if ( ch->event_handler && ch->next_event < NOW() ) > >> - ch->event_handler(ch); > >> + if ( iommu_intremap != iommu_intremap_off ) > >> + { > >> + int rc = iommu_update_ire_from_msi(&ch->msi, &msg); > >> + > >> + ASSERT(rc <= 0); > >> + if ( rc >= 0 ) > > > > I don't think the rc > 0 part of this check is meaningful, as any rc > > value > 0 will trigger the ASSERT(rc <= 0) ahead of it. The code > > inside of the if block itself only contains ASSERTs, so it's only > > relevant for debug=y builds that will also have the rc <= 0 ASSERT. > > > > You could possibly use: > > > > ASSERT(rc <= 0); > > if ( !rc ) > > { > > ASSERT(... > > > > And achieve the same result? > > Yes, except that I'd like to keep the >= to cover the case if the first > assertion was dropped / commented out, as well as to have a doc effect. Oh, OK. Fair enough, I wasn't taking into account that this could be done in case code is modified. > >> @@ -991,6 +997,13 @@ void alloc_direct_apic_vector(uint8_t *v > >> spin_unlock(&lock); > >> } > >> > >> +/* This could free any vectors, but is needed only for low-prio ones. */ > >> +void __init free_lopriority_vector(uint8_t vector) > >> +{ > >> + ASSERT(vector < FIRST_HIPRIORITY_VECTOR); > >> + clear_bit(vector, used_vectors); > >> +} > > > > I'm undecided whether we want to have such helper. This is all very > > specific to the single use by the HPET vector, and hence might be best > > to simply put the clear_bit() inside of hpet_broadcast_late_init() > > itself. > > I wanted to avoid making used_vectors non-static. > > > I could see for example other callers wanting to use this also > > requiring cleanup of the per cpu vector_irq arrays. Given it's (so > > far) very limited usage it might be clearer to open-code the > > clear_bit(). > > Dealing with vector_irq[] is a separate thing, though, isn't it? Possibly, that's part of the binding, rather than the allocation itself (which is what you cover here). > >> --- a/xen/drivers/passthrough/amd/iommu_intr.c > >> +++ b/xen/drivers/passthrough/amd/iommu_intr.c > >> @@ -551,6 +551,13 @@ int cf_check amd_iommu_msi_msg_update_ir > >> for ( i = 1; i < nr; ++i ) > >> msi_desc[i].remap_index = msi_desc->remap_index + i; > >> msg->data = data; > >> + /* > >> + * While the low address bits don't matter, "canonicalize" the > >> address > >> + * by zapping the bits that were transferred to the IRTE. This > >> way > >> + * callers can check for there actually needing to be an update to > >> + * wherever the address is put. > >> + */ > >> + msg->address_lo &= ~(MSI_ADDR_DESTMODE_MASK | > >> MSI_ADDR_DEST_ID_MASK); > > > > You might want to mention this change on the commit message also, as > > it could look unrelated to the rest of the code? > > I thought the comment here provided enough context and detail. I've added > "AMD interrupt remapping code so far didn't "return" a consistent MSI > address when translating an MSI message. Clear respective fields there, to > keep the respective assertion in set_channel_irq_affinity() from > triggering." LGTM, I would possibly remove the last "respective" for being repetitive given the previous one in the sentence. Thanks, Roger.

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.