Re: [PATCH v3 for-4.21 2/9] x86/HPET: use single, global, low-priority vector for broadcast IRQ
On Mon, Oct 27, 2025 at 11:23:58AM +0100, Jan Beulich wrote:
> On 24.10.2025 15:24, Roger Pau Monné wrote:
> > On Thu, Oct 23, 2025 at 05:50:17PM +0200, Jan Beulich wrote:
> >> @@ -343,6 +347,12 @@ static int __init hpet_setup_msi_irq(str
> >> u32 cfg = hpet_read32(HPET_Tn_CFG(ch->idx));
> >> irq_desc_t *desc = irq_to_desc(ch->msi.irq);
> >>
> >> + clear_irq_vector(ch->msi.irq);
> >> + ret = bind_irq_vector(ch->msi.irq, HPET_BROADCAST_VECTOR, &cpu_online_map);
> >
> > By passing cpu_online_map here, it leads to _bind_irq_vector() doing:
> >
> > cpumask_copy(desc->arch.cpu_mask, &cpu_online_map);
> >
> > Which strictly speaking is wrong. However this is just a cosmetic
> > issue until the irq is used for the first time, at which point it will
> > be assigned to a concrete CPU.
> >
> > You could do:
> >
> > cpumask_clear(desc->arch.cpu_mask);
> > cpumask_set_cpu(cpumask_any(&cpu_online_map), desc->arch.cpu_mask);
> >
> > (Or equivalent)
> >
> > To assign the interrupt to a concrete CPU and reflect it in the
> > cpu_mask after the bind_irq_vector() call, but I can live with it
> > being like this. I have patches to adjust _bind_irq_vector() myself,
> > which I hope I will be able to post soon.
>
> Hmm, I wrongly memorized hpet_broadcast_init() as being pre-SMP-init only.
> It has three call sites:
> - mwait_idle_init(), called from cpuidle_presmp_init(),
> - amd_cpuidle_init(), which calls it only when invoked the very first time,
> which is again from cpuidle_presmp_init(),
> - _disable_pit_irq(), called from the regular initcall disable_pit_irq().
> I.e. for the latter you're right that the CPU mask is too broad (in only a
> cosmetic way though). Would you be okay if I used cpumask_of(0) in place
> of &cpu_online_map?
Using cpumask_of(0) would be OK, as the per-cpu vector_irq array will
be updated ahead of assigning the interrupt to a CPU, and hence it
doesn't need to be done for all possible online CPUs in
_bind_irq_vector().
In the context here it would be more accurate to provide an empty CPU
mask, as the interrupt is not yet targeting any CPU. Using CPU 0
would be a placeholder, which seems fine for the purpose.
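As a minimal sketch (assuming the surrounding error handling, which isn't
part of the quoted hunk), the call site with cpumask_of(0) would look like:

    clear_irq_vector(ch->msi.irq);
    /* Bind to a single placeholder CPU rather than the whole online map. */
    ret = bind_irq_vector(ch->msi.irq, HPET_BROADCAST_VECTOR, cpumask_of(0));
    if ( ret )
        return ret;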
> >> @@ -472,19 +482,50 @@ static struct hpet_event_channel *hpet_g
> >> static void set_channel_irq_affinity(struct hpet_event_channel *ch)
> >> {
> >> struct irq_desc *desc = irq_to_desc(ch->msi.irq);
> >> + struct msi_msg msg = ch->msi.msg;
> >>
> >> ASSERT(!local_irq_is_enabled());
> >> spin_lock(&desc->lock);
> >> - hpet_msi_mask(desc);
> >> - hpet_msi_set_affinity(desc, cpumask_of(ch->cpu));
> >> - hpet_msi_unmask(desc);
> >> +
> >> + per_cpu(vector_irq, ch->cpu)[HPET_BROADCAST_VECTOR] = ch->msi.irq;
> >> +
> >> + /*
> >> + * Open-coding a reduced form of hpet_msi_set_affinity() here. With the
> >> + * actual update below (either of the IRTE or of [just] message address;
> >> + * with interrupt remapping message address/data don't change) now being
> >> + * atomic, we can avoid masking the IRQ around the update. As a result
> >> + * we're no longer at risk of missing IRQs (provided hpet_broadcast_enter()
> >> + * keeps setting the new deadline only afterwards).
> >> + */
> >> + cpumask_copy(desc->arch.cpu_mask, cpumask_of(ch->cpu));
> >> +
> >> spin_unlock(&desc->lock);
> >>
> >> - spin_unlock(&ch->lock);
> >> + msg.dest32 = cpu_physical_id(ch->cpu);
> >> + msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
> >> + msg.address_lo |= MSI_ADDR_DEST_ID(msg.dest32);
> >> + if ( msg.dest32 != ch->msi.msg.dest32 )
> >> + {
> >> + ch->msi.msg = msg;
> >>
> >> - /* We may have missed an interrupt due to the temporary masking. */
> >> - if ( ch->event_handler && ch->next_event < NOW() )
> >> - ch->event_handler(ch);
> >> + if ( iommu_intremap != iommu_intremap_off )
> >> + {
> >> + int rc = iommu_update_ire_from_msi(&ch->msi, &msg);
> >> +
> >> + ASSERT(rc <= 0);
> >> + if ( rc >= 0 )
> >
> > I don't think the rc > 0 part of this check is meaningful, as any rc
> > value > 0 will trigger the ASSERT(rc <= 0) ahead of it. The code
> > inside of the if block itself only contains ASSERTs, so it's only
> > relevant for debug=y builds that will also have the rc <= 0 ASSERT.
> >
> > You could possibly use:
> >
> > ASSERT(rc <= 0);
> > if ( !rc )
> > {
> > ASSERT(...
> >
> > And achieve the same result?
>
> Yes, except that I'd like to keep the >= to cover the case where the first
> assertion was dropped / commented out, as well as for its documentation effect.
Oh, OK. Fair enough, I wasn't taking into account that this covers the
case where the code is later modified.
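For reference, a sketch of the shape under discussion; the body of the if
block isn't shown in the quoted hunk, so the inner assertion is only
indicated by a comment:

    int rc = iommu_update_ire_from_msi(&ch->msi, &msg);

    ASSERT(rc <= 0);
    /*
     * "if ( !rc )" would behave the same in practice, since rc > 0 already
     * trips the assertion above.  Keeping ">= 0" documents that positive
     * values aren't expected, and still covers them should the ASSERT()
     * ever be dropped or commented out.
     */
    if ( rc >= 0 )
    {
        /* assertion(s) on the translated message go here */
    }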
> >> @@ -991,6 +997,13 @@ void alloc_direct_apic_vector(uint8_t *v
> >> spin_unlock(&lock);
> >> }
> >>
> >> +/* This could free any vectors, but is needed only for low-prio ones. */
> >> +void __init free_lopriority_vector(uint8_t vector)
> >> +{
> >> + ASSERT(vector < FIRST_HIPRIORITY_VECTOR);
> >> + clear_bit(vector, used_vectors);
> >> +}
> >
> > I'm undecided whether we want to have such helper. This is all very
> > specific to the single use by the HPET vector, and hence might be best
> > to simply put the clear_bit() inside of hpet_broadcast_late_init()
> > itself.
>
> I wanted to avoid making used_vectors non-static.
>
> > I could see for example other callers wanting to use this also
> > requiring cleanup of the per-CPU vector_irq arrays. Given its (so
> > far) very limited usage it might be clearer to open-code the
> > clear_bit().
>
> Dealing with vector_irq[] is a separate thing, though, isn't it?
Possibly, that's part of the binding, rather than the allocation
itself (which is what you cover here).
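For comparison, a sketch of the open-coded alternative in
hpet_broadcast_late_init(); this assumes used_vectors were made visible
there (i.e. non-static), which is exactly what the helper avoids, and the
condition guarding the release isn't shown in the quoted hunks:

    /* Hypothetical open-coded release of the unused broadcast vector. */
    ASSERT(HPET_BROADCAST_VECTOR < FIRST_HIPRIORITY_VECTOR);
    clear_bit(HPET_BROADCAST_VECTOR, used_vectors);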
> >> --- a/xen/drivers/passthrough/amd/iommu_intr.c
> >> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> >> @@ -551,6 +551,13 @@ int cf_check amd_iommu_msi_msg_update_ir
> >> for ( i = 1; i < nr; ++i )
> >> msi_desc[i].remap_index = msi_desc->remap_index + i;
> >> msg->data = data;
> >> + /*
> >> + * While the low address bits don't matter, "canonicalize" the
> >> address
> >> + * by zapping the bits that were transferred to the IRTE. This
> >> way
> >> + * callers can check for there actually needing to be an update to
> >> + * wherever the address is put.
> >> + */
> >> + msg->address_lo &= ~(MSI_ADDR_DESTMODE_MASK |
> >> MSI_ADDR_DEST_ID_MASK);
> >
> > You might want to mention this change in the commit message also, as
> > it could look unrelated to the rest of the code?
>
> I thought the comment here provided enough context and detail. I've added
> "AMD interrupt remapping code so far didn't "return" a consistent MSI
> address when translating an MSI message. Clear respective fields there, to
> keep the respective assertion in set_channel_irq_affinity() from
> triggering."
LGTM, I would possibly remove the last "respective", as it's repetitive
given the previous one in the sentence.
Thanks, Roger.