[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 for-4.21 2/9] x86/HPET: use single, global, low-priority vector for broadcast IRQ


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Thu, 23 Oct 2025 10:39:26 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=19eBYxrnfoAkcua0jE/me9wgh/Gl7m3Gny6ARk2Zz5U=; b=MH9EVcmEu/c0xB1aZ/BleZAmJWSWYJYoQxG4Gf7Sx/rF09zI/fR80Ok2yaDDb2sdEyv26BmNMtl09i+nUSItgco6ZcBb3fSnO3oWKwHNxQ0CpQHevIReuKB2kRMQrfYGiR7CFq66INL4IZifuWcGSbUHHn7qubcvDMJSg1QUVP1hpsfPjJjLW9WMciwIxeyolVtzhv+8iK6Hv22fJBIZ8J5+hiAiUCyWkOFxgY70XaTDqh94TRjR0CmT8fU5bld/NZJGf8s4zdOXODwM9NR1TrNTv2jmkccHwtlby/AZSYhRyPRKYT3eeNG2/pkw3TPwEx9s1RdXKZ1GWi9hkq1V6g==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=hcCJ0MnotFlKRV8oyewUl+PmVPogLVH5DjC2UOkfjuB6X6udpeHlGRqFg/0z9dtU4KaI20FDfb8MdubCvj064b71xf9s6pOcmge3gxd/TC4rZQOWmFa5oT5eljRx/elP4MuU2ybGh1e+LuGG6QqYzQ973aoi5TdqhCYW3LYnaMlxcI4Gkk4dnnNceO3EbITeW1rsI4LoOUrGUsz38xCFxAoChqNjJGrkXEUqNHYY2VewW1nWQzwRqnSnKv2CsN5Laa9WKfAMq/wOpa4j/BAHa/Qlmj78zSQmOk/sc3OCUwuuJ6MRLe7vVAG9lXVHb3+PdYMKH1yGrQ3vxo4dAmT+Uw==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
  • Delivery-date: Thu, 23 Oct 2025 08:39:57 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, Oct 22, 2025 at 11:21:15AM +0200, Jan Beulich wrote:
> On 21.10.2025 15:49, Roger Pau Monné wrote:
> > On Tue, Oct 21, 2025 at 08:42:13AM +0200, Jan Beulich wrote:
> >> On 20.10.2025 18:22, Roger Pau Monné wrote:
> >>> On Mon, Oct 20, 2025 at 01:18:34PM +0200, Jan Beulich wrote:
> >>>> Using dynamically allocated / maintained vectors has several downsides:
> >>>> - possible nesting of IRQs due to the effects of IRQ migration,
> >>>> - reduction of vectors available for devices,
> >>>> - IRQs not moving as intended if there's shortage of vectors,
> >>>> - higher runtime overhead.
> >>>>
> >>>> As the vector also doesn't need to be of any priority (first and foremost
> >>>> it really shouldn't be of higher or same priority as the timer IRQ, as
> >>>> that raises TIMER_SOFTIRQ anyway), avoid any "ordinary" vectors altogther
> >>>> and use a vector from the 0x10...0x1f exception vector space. Exception 
> >>>> vs
> >>>> interrupt can easily be distinguished by checking for the presence of an
> >>>> error code.
> >>>>
> >>>> With a fixed vector, less updating is now necessary in
> >>>> set_channel_irq_affinity(); in particular channels don't need transiently
> >>>> masking anymore, as the necessary update is now atomic. To fully leverage
> >>>> this, however, we want to stop using hpet_msi_set_affinity() there. With
> >>>> the transient masking dropped, we're no longer at risk of missing events.
> >>>>
> >>>> In principle a change to setup_vector_irq() would be necessary, but only
> >>>> if we used low-prio vectors as direct-APIC ones. Since the change would 
> >>>> be
> >>>> at best benign here, it is being omitted.
> >>>>
> >>>> Fixes: 996576b965cc ("xen: allow up to 16383 cpus")
> >>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> >>>> Release-Acked-by: Oleksii Kurochko<oleksii.kurochko@xxxxxxxxx>
> >>>> ---
> >>>> This is an alternative proposal to
> >>>> https://lists.xen.org/archives/html/xen-devel/2014-03/msg00399.html.
> >>>>
> >>>> Should we keep hpet_msi_set_affinity() at all? We'd better not have the
> >>>> generic IRQ subsystem play with our IRQs' affinities ... (If so, this
> >>>> likely would want to be a separate patch, though.)
> >>>
> >>> I think that needs to become a no-op, with possibly an ASSERT?  Is it
> >>> possibly for dom0 to try to balance this IRQ?  I would think not.
> >>
> >> I'd consider it an error if that was possible. But then the same goes for
> >> other Xen-internal IRQs, like the IOMMU ones. They all implement a
> >> .set_affinity hook ...
> > 
> > We need such hook for fixup_irqs() at least, so that interrupts can be
> > evacuated when an AP goes offline.
> 
> Hmm, yes. Just not here.
> 
> >>>> @@ -476,19 +486,50 @@ static struct hpet_event_channel *hpet_g
> >>>>  static void set_channel_irq_affinity(struct hpet_event_channel *ch)
> >>>>  {
> >>>>      struct irq_desc *desc = irq_to_desc(ch->msi.irq);
> >>>> +    struct msi_msg msg = ch->msi.msg;
> >>>>  
> >>>>      ASSERT(!local_irq_is_enabled());
> >>>>      spin_lock(&desc->lock);
> >>>> -    hpet_msi_mask(desc);
> >>>> -    hpet_msi_set_affinity(desc, cpumask_of(ch->cpu));
> >>>> -    hpet_msi_unmask(desc);
> >>>> +
> >>>> +    per_cpu(vector_irq, ch->cpu)[HPET_BROADCAST_VECTOR] = ch->msi.irq;
> >>>> +
> >>>> +    /*
> >>>> +     * Open-coding a reduced form of hpet_msi_set_affinity() here.  
> >>>> With the
> >>>> +     * actual update below (either of the IRTE or of [just] message 
> >>>> address;
> >>>> +     * with interrupt remapping message address/data don't change) now 
> >>>> being
> >>>> +     * atomic, we can avoid masking the IRQ around the update.  As a 
> >>>> result
> >>>> +     * we're no longer at risk of missing IRQs (provided 
> >>>> hpet_broadcast_enter()
> >>>> +     * keeps setting the new deadline only afterwards).
> >>>> +     */
> >>>> +    cpumask_copy(desc->arch.cpu_mask, cpumask_of(ch->cpu));
> >>>> +
> >>>>      spin_unlock(&desc->lock);
> >>>>  
> >>>> -    spin_unlock(&ch->lock);
> >>>> +    msg.dest32 = cpu_physical_id(ch->cpu);
> >>>> +    msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
> >>>> +    msg.address_lo |= MSI_ADDR_DEST_ID(msg.dest32);
> >>>> +    if ( msg.dest32 != ch->msi.msg.dest32 )
> >>>> +    {
> >>>> +        ch->msi.msg = msg;
> >>>> +
> >>>> +        if ( iommu_intremap != iommu_intremap_off )
> >>>> +        {
> >>>> +            int rc = iommu_update_ire_from_msi(&ch->msi, &msg);
> >>>>  
> >>>> -    /* We may have missed an interrupt due to the temporary masking. */
> >>>> -    if ( ch->event_handler && ch->next_event < NOW() )
> >>>> -        ch->event_handler(ch);
> >>>> +            ASSERT(rc <= 0);
> >>>> +            if ( rc > 0 )
> >>>> +            {
> >>>> +                ASSERT(msg.data == hpet_read32(HPET_Tn_ROUTE(ch->idx)));
> >>>> +                ASSERT(msg.address_lo ==
> >>>> +                       hpet_read32(HPET_Tn_ROUTE(ch->idx) + 4));
> >>>> +            }
> >>>
> >>> The sequence of asserts seem wrong here, the asserts inside of the rc
> >>>> 0 check will never trigger, because there's an ASSERT(rc <= 0)
> >>> ahead of them?
> >>
> >> Hmm. My way of thinking was that if we get back 1 (which we shouldn't),
> >> we ought to check (and presumably fail on) data or address having changed.
> > 
> > Right, but the ASSERT(rc <= 0) will prevent reaching any of the
> > followup ASSERTs if rc == 1?
> 
> Which is no problem, as we'd be dead already anyway if the first assertion
> triggered. Nevertheless I've switched the if() to >= 0 (which then pointed
> out a necessary change in AMD IOMMU code).

Right, so and adjusted if condition plus an ASSERT_UNREACHABLE() at
the end of the if code block?

> >  IOW, we possibly want:
> > 
> >             if ( rc > 0 )
> >             {
> >                 dprintk(XENLOG_ERR,
> >                         "Unexpected HPET MSI setup returned: data: %#x 
> > address: %#lx expected data %#x address %#lx\n",
> >                         msg.data, msg.address,
> >                         ch->msi.msg.data, ch->msi.msg.address);
> >                 ASSERT_UNREACHABLE();
> >                 hpet_msi_mask(desc);
> >                 hpet_write32(msg.data, HPET_Tn_ROUTE(ch->idx));
> >                 hpet_write32(msg.address_lo, HPET_Tn_ROUTE(ch->idx) + 4);
> >                 hpet_msi_unmask(desc);
> >             }
> >             ASSERT(!rc);
> 
> To be honest, for my taste this goes too far as to what follows an
> ASSERT_UNREACHABLE().

I can understand that.  It's the best way I've come up with attempting
to recover from a possible error in the release case, but I don't
particularly like it either.

> > I'm unsure about attempting to propagate the returned values on release
> > builds, I guess it's slightly better than possibly using an outdated
> > RTE entry?  Albeit this should never happen.
> 
> Yes to the last remark; I don't actually see what you would want to do
> with the propagated return value.

OK, I can this this not being clear.  By propagate here I mean
propagate to the hardware registers, not to the function caller.

> > Also, should the desc->arch.cpu_mask update only be done after the MSI
> > fields have correctly updated, so that in case of failure of
> > iommu_update_ire_from_msi(9 we could return early form the function
> > and avoid changing desc->arch.cpu_mask?
> 
> Hmm, yes, I could do that, but then also in hpet_msi_set_affinity().
> However, as this needs doing under the IRQ descriptor lock, I'd have to
> either extend the locked region here (again), or re-acquire the lock
> later. Neither looks very attractive to me.

Hm, I guess given the point in the release we can leave it as-is.  It
would be nice, but this change is big enough as it is.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.