Xen project Mailing List

Re: [PATCH v2 for-4.21 2/9] x86/HPET: use single, global, low-priority vector for broadcast IRQ

From: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Thu, 23 Oct 2025 14:49:24 +0200

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>

Delivery-date: Thu, 23 Oct 2025 12:49:50 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Oct 23, 2025 at 12:37:22PM +0200, Jan Beulich wrote:
> On 23.10.2025 10:39, Roger Pau Monné wrote:
> > On Wed, Oct 22, 2025 at 11:21:15AM +0200, Jan Beulich wrote:
> >> On 21.10.2025 15:49, Roger Pau Monné wrote:
> >>> On Tue, Oct 21, 2025 at 08:42:13AM +0200, Jan Beulich wrote:
> >>>> On 20.10.2025 18:22, Roger Pau Monné wrote:
> >>>>> On Mon, Oct 20, 2025 at 01:18:34PM +0200, Jan Beulich wrote:
> >>>>>> @@ -476,19 +486,50 @@ static struct hpet_event_channel *hpet_g
> >>>>>>  static void set_channel_irq_affinity(struct hpet_event_channel *ch)
> >>>>>>  {
> >>>>>>      struct irq_desc *desc = irq_to_desc(ch->msi.irq);
> >>>>>> +    struct msi_msg msg = ch->msi.msg;
> >>>>>>
> >>>>>>      ASSERT(!local_irq_is_enabled());
> >>>>>>      spin_lock(&desc->lock);
> >>>>>> -    hpet_msi_mask(desc);
> >>>>>> -    hpet_msi_set_affinity(desc, cpumask_of(ch->cpu));
> >>>>>> -    hpet_msi_unmask(desc);
> >>>>>> +
> >>>>>> +    per_cpu(vector_irq, ch->cpu)[HPET_BROADCAST_VECTOR] = ch->msi.irq;
> >>>>>> +
> >>>>>> +    /*
> >>>>>> +     * Open-coding a reduced form of hpet_msi_set_affinity() here. With the
> >>>>>> +     * actual update below (either of the IRTE or of [just] message address;
> >>>>>> +     * with interrupt remapping message address/data don't change) now being
> >>>>>> +     * atomic, we can avoid masking the IRQ around the update. As a result
> >>>>>> +     * we're no longer at risk of missing IRQs (provided hpet_broadcast_enter()
> >>>>>> +     * keeps setting the new deadline only afterwards).
> >>>>>> +     */
> >>>>>> +    cpumask_copy(desc->arch.cpu_mask, cpumask_of(ch->cpu));
> >>>>>> +
> >>>>>>      spin_unlock(&desc->lock);
> >>>>>>
> >>>>>> -    spin_unlock(&ch->lock);
> >>>>>> +    msg.dest32 = cpu_physical_id(ch->cpu);
> >>>>>> +    msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
> >>>>>> +    msg.address_lo |= MSI_ADDR_DEST_ID(msg.dest32);
> >>>>>> +    if ( msg.dest32 != ch->msi.msg.dest32 )
> >>>>>> +    {
> >>>>>> +        ch->msi.msg = msg;
> >>>>>> +
> >>>>>> +        if ( iommu_intremap != iommu_intremap_off )
> >>>>>> +        {
> >>>>>> +            int rc = iommu_update_ire_from_msi(&ch->msi, &msg);
> >>>>>>
> >>>>>> -    /* We may have missed an interrupt due to the temporary masking. */
> >>>>>> -    if ( ch->event_handler && ch->next_event < NOW() )
> >>>>>> -        ch->event_handler(ch);
> >>>>>> +            ASSERT(rc <= 0);
> >>>>>> +            if ( rc > 0 )
> >>>>>> +            {
> >>>>>> +                ASSERT(msg.data == hpet_read32(HPET_Tn_ROUTE(ch->idx)));
> >>>>>> +                ASSERT(msg.address_lo ==
> >>>>>> +                       hpet_read32(HPET_Tn_ROUTE(ch->idx) + 4));
> >>>>>> +            }
> >>>>>
> >>>>> The sequence of asserts seems wrong here, the asserts inside of the
> >>>>> rc > 0 check will never trigger, because there's an ASSERT(rc <= 0)
> >>>>> ahead of them?
> >>>>
> >>>> Hmm. My way of thinking was that if we get back 1 (which we shouldn't),
> >>>> we ought to check (and presumably fail on) data or address having
> >>>> changed.
> >>>
> >>> Right, but the ASSERT(rc <= 0) will prevent reaching any of the
> >>> followup ASSERTs if rc == 1?
> >>
> >> Which is no problem, as we'd be dead already anyway if the first assertion
> >> triggered. Nevertheless I've switched the if() to >= 0 (which then pointed
> >> out a necessary change in AMD IOMMU code).
> >
> > Right, so an adjusted if condition plus an ASSERT_UNREACHABLE() at
> > the end of the if code block?
>
> That is, instead of
>
>     ASSERT(rc <= 0);
>     if ( rc >= 0 )
>     {
>         ASSERT(msg.data == hpet_read32(HPET_Tn_ROUTE(ch->idx)));
>         ASSERT(msg.address_lo ==
>                hpet_read32(HPET_Tn_ROUTE(ch->idx) + 4));
>     }
>
> you'd prefer
>
>     if ( rc >= 0 )
>     {
>         ASSERT(msg.data == hpet_read32(HPET_Tn_ROUTE(ch->idx)));
>         ASSERT(msg.address_lo ==
>                hpet_read32(HPET_Tn_ROUTE(ch->idx) + 4));
>         ASSERT_UNREACHABLE();
>     }
>
> ? That's wrong though (for rc == 0), i.e. I fear I don't see what you mean.

Oh, I see, sorry for the suggestions, it's indeed wrong. FTAOD, what
do you plan to use then here?

You could replace the ASSERT_UNREACHABLE() with ASSERT(rc == 0) in my
suggestion I think? Or maybe just do:

    ASSERT(rc <= 0);
    if ( !rc )
    {
        ASSERT(msg.data == hpet_read32(HPET_Tn_ROUTE(ch->idx)));
        ASSERT(msg.address_lo ==
               hpet_read32(HPET_Tn_ROUTE(ch->idx) + 4));
    }

Was your original intention with those checks to ensure that for the
rc == 0 case the message fields remained unchanged?

> >>> IOW, we possibly want:
> >>>
> >>> if ( rc > 0 )
> >>> {
> >>>     dprintk(XENLOG_ERR,
> >>>             "Unexpected HPET MSI setup returned: data: %#x address: %#lx expected data %#x address %#lx\n",
> >>>             msg.data, msg.address,
> >>>             ch->msi.msg.data, ch->msi.msg.address);
> >>>     ASSERT_UNREACHABLE();
> >>>     hpet_msi_mask(desc);
> >>>     hpet_write32(msg.data, HPET_Tn_ROUTE(ch->idx));
> >>>     hpet_write32(msg.address_lo, HPET_Tn_ROUTE(ch->idx) + 4);
> >>>     hpet_msi_unmask(desc);
> >>> }
> >>> ASSERT(!rc);
> >>
> >> To be honest, for my taste this goes too far as to what follows an
> >> ASSERT_UNREACHABLE().
> >
> > I can understand that. It's the best way I've come up with attempting
> > to recover from a possible error in the release case, but I don't
> > particularly like it either.
> >
> >>> I'm unsure about attempting to propagate the returned values on release
> >>> builds, I guess it's slightly better than possibly using an outdated
> >>> RTE entry? Albeit this should never happen.
> >>
> >> Yes to the last remark; I don't actually see what you would want to do
> >> with the propagated return value.
> >
> > OK, I can see this not being clear. By propagate here I mean
> > propagate to the hardware registers, not to the function caller.
>
> I.e. you still think adding the two hpet_write32() is going to be useful?
> The mask/unmask, as I did say in another reply to your comments, isn't
> useful here anyway (for already not being atomic), so I wouldn't see much
> sense in having them.

Right, for it to be correct the masking would need to include the
iommu_update_ire_from_msi() call also.

> Plus of course we'd want to avoid the writes on
> release builds if the values actually match, i.e. the construct would then
> rather end up as two if-mismatch-then-write-else-assert-unreachable ones.

My concern would be that after this change we won't cope anymore with
iommu_update_ire_from_msi() returning a value > 1. Which might be
fine, as it's in theory not possible, but seems less robust than it
was before the change. I guess it's the price to pay for avoiding the
masking (unless you have other options).

Looking at the existing code, this is likely no worse than
iommu_update_ire_from_msi() returning an error, and that case is
already silently ignored by hpet_msi_set_affinity(). So I think
silently ignoring > 0 is not that different, and doesn't make the
current handling much worse. It would be nice to handle those better,
but can be done separately.

Thanks, Roger.
