|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2 for-4.21 2/9] x86/HPET: use single, global, low-priority vector for broadcast IRQ
On Tue, Oct 21, 2025 at 08:42:13AM +0200, Jan Beulich wrote:
> On 20.10.2025 18:22, Roger Pau Monné wrote:
> > On Mon, Oct 20, 2025 at 01:18:34PM +0200, Jan Beulich wrote:
> >> Using dynamically allocated / maintained vectors has several downsides:
> >> - possible nesting of IRQs due to the effects of IRQ migration,
> >> - reduction of vectors available for devices,
> >> - IRQs not moving as intended if there's shortage of vectors,
> >> - higher runtime overhead.
> >>
> >> As the vector also doesn't need to be of any priority (first and foremost
> >> it really shouldn't be of higher or same priority as the timer IRQ, as
> >> that raises TIMER_SOFTIRQ anyway), avoid any "ordinary" vectors altogther
> >> and use a vector from the 0x10...0x1f exception vector space. Exception vs
> >> interrupt can easily be distinguished by checking for the presence of an
> >> error code.
> >>
> >> With a fixed vector, less updating is now necessary in
> >> set_channel_irq_affinity(); in particular channels don't need transiently
> >> masking anymore, as the necessary update is now atomic. To fully leverage
> >> this, however, we want to stop using hpet_msi_set_affinity() there. With
> >> the transient masking dropped, we're no longer at risk of missing events.
> >>
> >> In principle a change to setup_vector_irq() would be necessary, but only
> >> if we used low-prio vectors as direct-APIC ones. Since the change would be
> >> at best benign here, it is being omitted.
> >>
> >> Fixes: 996576b965cc ("xen: allow up to 16383 cpus")
> >> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> >> Release-Acked-by: Oleksii Kurochko<oleksii.kurochko@xxxxxxxxx>
> >> ---
> >> This is an alternative proposal to
> >> https://lists.xen.org/archives/html/xen-devel/2014-03/msg00399.html.
> >>
> >> Should we keep hpet_msi_set_affinity() at all? We'd better not have the
> >> generic IRQ subsystem play with our IRQs' affinities ... (If so, this
> >> likely would want to be a separate patch, though.)
> >
> > I think that needs to become a no-op, with possibly an ASSERT? Is it
> > possibly for dom0 to try to balance this IRQ? I would think not.
>
> I'd consider it an error if that was possible. But then the same goes for
> other Xen-internal IRQs, like the IOMMU ones. They all implement a
> .set_affinity hook ...
We need such hook for fixup_irqs() at least, so that interrupts can be
evacuated when an AP goes offline. However movement of Xen owned IRQs
should be limited to Xen (if it's not already).
> >> @@ -476,19 +486,50 @@ static struct hpet_event_channel *hpet_g
> >> static void set_channel_irq_affinity(struct hpet_event_channel *ch)
> >> {
> >> struct irq_desc *desc = irq_to_desc(ch->msi.irq);
> >> + struct msi_msg msg = ch->msi.msg;
> >>
> >> ASSERT(!local_irq_is_enabled());
> >> spin_lock(&desc->lock);
> >> - hpet_msi_mask(desc);
> >> - hpet_msi_set_affinity(desc, cpumask_of(ch->cpu));
> >> - hpet_msi_unmask(desc);
> >> +
> >> + per_cpu(vector_irq, ch->cpu)[HPET_BROADCAST_VECTOR] = ch->msi.irq;
> >> +
> >> + /*
> >> + * Open-coding a reduced form of hpet_msi_set_affinity() here. With
> >> the
> >> + * actual update below (either of the IRTE or of [just] message
> >> address;
> >> + * with interrupt remapping message address/data don't change) now
> >> being
> >> + * atomic, we can avoid masking the IRQ around the update. As a
> >> result
> >> + * we're no longer at risk of missing IRQs (provided
> >> hpet_broadcast_enter()
> >> + * keeps setting the new deadline only afterwards).
> >> + */
> >> + cpumask_copy(desc->arch.cpu_mask, cpumask_of(ch->cpu));
> >> +
> >> spin_unlock(&desc->lock);
> >>
> >> - spin_unlock(&ch->lock);
> >> + msg.dest32 = cpu_physical_id(ch->cpu);
> >> + msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
> >> + msg.address_lo |= MSI_ADDR_DEST_ID(msg.dest32);
> >> + if ( msg.dest32 != ch->msi.msg.dest32 )
> >> + {
> >> + ch->msi.msg = msg;
> >> +
> >> + if ( iommu_intremap != iommu_intremap_off )
> >> + {
> >> + int rc = iommu_update_ire_from_msi(&ch->msi, &msg);
> >>
> >> - /* We may have missed an interrupt due to the temporary masking. */
> >> - if ( ch->event_handler && ch->next_event < NOW() )
> >> - ch->event_handler(ch);
> >> + ASSERT(rc <= 0);
> >> + if ( rc > 0 )
> >> + {
> >> + ASSERT(msg.data == hpet_read32(HPET_Tn_ROUTE(ch->idx)));
> >> + ASSERT(msg.address_lo ==
> >> + hpet_read32(HPET_Tn_ROUTE(ch->idx) + 4));
> >> + }
> >
> > The sequence of asserts seem wrong here, the asserts inside of the rc
> > > 0 check will never trigger, because there's an ASSERT(rc <= 0)
> > ahead of them?
>
> Hmm. My way of thinking was that if we get back 1 (which we shouldn't),
> we ought to check (and presumably fail on) data or address having changed.
Right, but the ASSERT(rc <= 0) will prevent reaching any of the
followup ASSERTs if rc == 1? IOW, we possibly want:
if ( rc > 0 )
{
dprintk(XENLOG_ERR,
"Unexpected HPET MSI setup returned: data: %#x address:
%#lx expected data %#x address %#lx\n",
msg.data, msg.address,
ch->msi.msg.data, ch->msi.msg.address);
ASSERT_UNREACHABLE();
hpet_msi_mask(desc);
hpet_write32(msg.data, HPET_Tn_ROUTE(ch->idx));
hpet_write32(msg.address_lo, HPET_Tn_ROUTE(ch->idx) + 4);
hpet_msi_unmask(desc);
}
ASSERT(!rc);
I'm unsure about attempting to propagate the returned values on release
builds, I guess it's slightly better than possibly using an outdated
RTE entry? Albeit this should never happen.
Also, should the desc->arch.cpu_mask update only be done after the MSI
fields have correctly updated, so that in case of failure of
iommu_update_ire_from_msi(9 we could return early form the function
and avoid changing desc->arch.cpu_mask?
> Whereas when we get back 0, we're told "no change" anyway, and hence
> checking isn't even needed. Did I misunderstand the purpose of the zero
> vs positive return value here?
>
> Of could I could switch to using "rc >= 0" anyway; I actually had it that
> way first, but then decided the extra checks would be redundant in the 0
> case.
>
> >> + }
> >> + else
> >> + hpet_write32(msg.address_lo, HPET_Tn_ROUTE(ch->idx) + 4);
> >
> > If you avoid the HPET register update here you possibly need to make
> > sure that both fields are unconditionally written on the first call
> > after resume from suspension. hpet_resume() needs to somehow taint
> > the channels to signal that a re-write of the address and data fields
> > is mandatory regardless of what iommu_update_ire_from_msi() has
> > returned.
>
> hpet_broadcast_resume() calls __hpet_setup_msi_irq() (and hence
> hpet_msi_write()), which I thought is enough?
Oh, sorry, I was looking at hpet_resume(), not
hpet_broadcast_resume().
Thanks, Roger.
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |