|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v3 for-4.21 2/9] x86/HPET: use single, global, low-priority vector for broadcast IRQ
On 24.10.2025 15:24, Roger Pau Monné wrote:
> On Thu, Oct 23, 2025 at 05:50:17PM +0200, Jan Beulich wrote:
>> Using dynamically allocated / maintained vectors has several downsides:
>> - possible nesting of IRQs due to the effects of IRQ migration,
>> - reduction of vectors available for devices,
>> - IRQs not moving as intended if there's shortage of vectors,
>> - higher runtime overhead.
>>
>> As the vector also doesn't need to be of any priority (first and foremost
>> it really shouldn't be of higher or same priority as the timer IRQ, as
>> that raises TIMER_SOFTIRQ anyway), simply use the lowest one above the
>> legacy range. The vector needs reserving early, until it is known whether
>> it actually is used. If it isn't, it's made available for general use.
>>
>> With a fixed vector, less updating is now necessary in
>> set_channel_irq_affinity(); in particular channels don't need transiently
>> masking anymore, as the necessary update is now atomic. To fully leverage
>> this, however, we want to stop using hpet_msi_set_affinity() there. With
>> the transient masking dropped, we're no longer at risk of missing events.
>>
>> Fixes: 996576b965cc ("xen: allow up to 16383 cpus")
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>> Release-Acked-by: Oleksii Kurochko<oleksii.kurochko@xxxxxxxxx>
> ^ space?
Looks like I simply took what was provided; I've added the blank now (also
in patch 1).
> Reviewed-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
Thanks.
>> @@ -303,15 +309,13 @@ static void cf_check hpet_msi_set_affini
>> struct hpet_event_channel *ch = desc->action->dev_id;
>> struct msi_msg msg = ch->msi.msg;
>>
>> - msg.dest32 = set_desc_affinity(desc, mask);
>> - if ( msg.dest32 == BAD_APICID )
>> - return;
>> + /* This really is only for dump_irqs(). */
>> + cpumask_copy(desc->arch.cpu_mask, mask);
>
> To add some extra checks here for correctness, do you think it would
> be helpful to add:
>
> ASSERT(cpumask_weight(mask) == 1);
> ASSERT(cpumask_intersects(mask, &cpu_online_mask));
>
> Or that's too pedantic?
Imo that would be pretty pointless in particular since the function is to go
away anyway.
>> @@ -343,6 +347,12 @@ static int __init hpet_setup_msi_irq(str
>> u32 cfg = hpet_read32(HPET_Tn_CFG(ch->idx));
>> irq_desc_t *desc = irq_to_desc(ch->msi.irq);
>>
>> + clear_irq_vector(ch->msi.irq);
>> + ret = bind_irq_vector(ch->msi.irq, HPET_BROADCAST_VECTOR,
>> &cpu_online_map);
>
> By passing cpu_online_map here, it leads to _bind_irq_vector() doing:
>
> cpumask_copy(desc->arch.cpu_mask, &cpu_online_map);
>
> Which strictly speaking is wrong. However this is just a cosmetic
> issue until the irq is used for the first time, at which point it will
> be assigned to a concrete CPU.
>
> You could do:
>
> cpumask_clear(desc->arch.cpu_mask);
> cpumask_set_cpu(cpumask_any(&cpu_online_map), desc->arch.cpu_mask);
>
> (Or equivalent)
>
> To assign the interrupt to a concrete CPU and reflex it on the
> cpu_mask after the bind_irq_vector() call, but I can live with it
> being like this. I have patches to adjust _bind_irq_vector() myself,
> which I hope I will be able to post soon.
Hmm, I wrongly memorized hpet_broadcast_init() as being pre-SMP-init only.
It has three call sites:
- mwait_idle_init(), called from cpuidle_presmp_init(),
- amd_cpuidle_init(), calling in only when invoked the very first time,
which is again from cpuidle_presmp_init(),
- _disable_pit_irq(), called from the regular initcall disable_pit_irq().
I.e. for the latter you're right that the CPU mask is too broad (in only a
cosmetic way though). Would be you okay if I used cpumask_of(0) in place
of &cpu_online_map?
>> @@ -472,19 +482,50 @@ static struct hpet_event_channel *hpet_g
>> static void set_channel_irq_affinity(struct hpet_event_channel *ch)
>> {
>> struct irq_desc *desc = irq_to_desc(ch->msi.irq);
>> + struct msi_msg msg = ch->msi.msg;
>>
>> ASSERT(!local_irq_is_enabled());
>> spin_lock(&desc->lock);
>> - hpet_msi_mask(desc);
>> - hpet_msi_set_affinity(desc, cpumask_of(ch->cpu));
>> - hpet_msi_unmask(desc);
>> +
>> + per_cpu(vector_irq, ch->cpu)[HPET_BROADCAST_VECTOR] = ch->msi.irq;
>> +
>> + /*
>> + * Open-coding a reduced form of hpet_msi_set_affinity() here. With the
>> + * actual update below (either of the IRTE or of [just] message address;
>> + * with interrupt remapping message address/data don't change) now being
>> + * atomic, we can avoid masking the IRQ around the update. As a result
>> + * we're no longer at risk of missing IRQs (provided
>> hpet_broadcast_enter()
>> + * keeps setting the new deadline only afterwards).
>> + */
>> + cpumask_copy(desc->arch.cpu_mask, cpumask_of(ch->cpu));
>> +
>> spin_unlock(&desc->lock);
>>
>> - spin_unlock(&ch->lock);
>> + msg.dest32 = cpu_physical_id(ch->cpu);
>> + msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
>> + msg.address_lo |= MSI_ADDR_DEST_ID(msg.dest32);
>> + if ( msg.dest32 != ch->msi.msg.dest32 )
>> + {
>> + ch->msi.msg = msg;
>>
>> - /* We may have missed an interrupt due to the temporary masking. */
>> - if ( ch->event_handler && ch->next_event < NOW() )
>> - ch->event_handler(ch);
>> + if ( iommu_intremap != iommu_intremap_off )
>> + {
>> + int rc = iommu_update_ire_from_msi(&ch->msi, &msg);
>> +
>> + ASSERT(rc <= 0);
>> + if ( rc >= 0 )
>
> I don't think the rc > 0 part of this check is meaningful, as any rc
> value > 0 will trigger the ASSERT(rc <= 0) ahead of it. The code
> inside of the if block itself only contains ASSERTs, so it's only
> relevant for debug=y builds that will also have the rc <= 0 ASSERT.
>
> You could possibly use:
>
> ASSERT(rc <= 0);
> if ( !rc )
> {
> ASSERT(...
>
> And achieve the same result?
Yes, except that I'd like to keep the >= to cover the case if the first
assertion was dropped / commented out, as well as to have a doc effect.
>> @@ -991,6 +997,13 @@ void alloc_direct_apic_vector(uint8_t *v
>> spin_unlock(&lock);
>> }
>>
>> +/* This could free any vectors, but is needed only for low-prio ones. */
>> +void __init free_lopriority_vector(uint8_t vector)
>> +{
>> + ASSERT(vector < FIRST_HIPRIORITY_VECTOR);
>> + clear_bit(vector, used_vectors);
>> +}
>
> I'm undecided whether we want to have such helper. This is all very
> specific to the single use by the HPET vector, and hence might be best
> to simply put the clear_bit() inside of hpet_broadcast_late_init()
> itself.
I wanted to avoid making used_vectors non-static.
> I could see for example other callers wanting to use this also
> requiring cleanup of the per cpu vector_irq arrays. Given it's (so
> far) very limited usage it might be clearer to open-code the
> clear_bit().
Dealing with vector_irq[] is a separate thing, though, isn't it?
>> --- a/xen/drivers/passthrough/amd/iommu_intr.c
>> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
>> @@ -551,6 +551,13 @@ int cf_check amd_iommu_msi_msg_update_ir
>> for ( i = 1; i < nr; ++i )
>> msi_desc[i].remap_index = msi_desc->remap_index + i;
>> msg->data = data;
>> + /*
>> + * While the low address bits don't matter, "canonicalize" the
>> address
>> + * by zapping the bits that were transferred to the IRTE. This way
>> + * callers can check for there actually needing to be an update to
>> + * wherever the address is put.
>> + */
>> + msg->address_lo &= ~(MSI_ADDR_DESTMODE_MASK |
>> MSI_ADDR_DEST_ID_MASK);
>
> You might want to mention this change on the commit message also, as
> it could look unrelated to the rest of the code?
I thought the comment here provided enough context and detail. I've added
"AMD interrupt remapping code so far didn't "return" a consistent MSI
address when translating an MSI message. Clear respective fields there, to
keep the respective assertion in set_channel_irq_affinity() from
triggering."
Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |