
Re: IOMMU faults after S3


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
  • Date: Tue, 7 Apr 2026 13:34:43 +0200
  • Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Tue, 07 Apr 2026 11:34:54 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, Apr 07, 2026 at 12:23:16PM +0200, Jan Beulich wrote:
> On 07.04.2026 08:29, Jan Beulich wrote:
> > On 03.04.2026 01:06, Marek Marczykowski-Górecki wrote:
> >> On Thu, Apr 02, 2026 at 04:53:31PM +0200, Jan Beulich wrote:
> >>> Sadly you now log the low halves of HPET_Tn_ROUTE twice, while you don't log the high halves at all.
> >>
> >> I was missing hpet_read32 there...
> >>
> >> Updated:
> >> (XEN) [  116.921573] Entering ACPI S3 state.
> >> (XEN) [18446743895.088893] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
> >> (XEN) [18446743895.088907] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
> >> (XEN) [18446743895.088918] _disable_pit_irq:2662: init: 0
> >> (XEN) [18446743895.088928] hpet_broadcast_resume:662: hpet_events: ffff83046bc1f080
> >> (XEN) [18446743895.089072] hpet_broadcast_resume:673: num_hpets_used: 8
> >> (XEN) [18446743895.089081] hpet_broadcast_resume:691: cfg: 0x1
> >> (XEN) [18446743895.089092] hpet_broadcast_resume:696: i:0, hpet_events[i].msi.irq: 122, hpet_events[i].flags: 0
> >> (XEN) [18446743895.089122] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
> >> (XEN) [18446743895.089132] hpet_broadcast_resume:700: i:0, __hpet_setup_msi_irq ret: 0
> >> (XEN) [18446743895.089168] hpet_broadcast_resume:710: i:0, cfg: 0xc134, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx)): 0, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx) + 4): 0xf18
> > 
> > Okay, this would appear to clarify that the address really isn't correct.
> > Yet I'm confused now by the low half values: in your earlier log there was
> > 
> > hpet_broadcast_resume:710: i:0, cfg: 0xc134, HPET_Tn_ROUTE(hpet_events[i].idx): 0x110
> > 
> > and the like, i.e. clearly a non-zero value. Now all low halves are zero.
> > I'll try to figure out how the logged values here could result, but
> > consistent data (or an explanation for the apparent inconsistency) would
> > help.
> 
> Could you give the patch below a try?
> 
> Jan
> 
> x86/HPET: channel handling in hpet_broadcast_resume()
> 
> The per-channel ENABLE bit is to solely be driven by hpet_enable_channel()
> and hpet_msi_{,un}mask(). It doesn't need setting immediately. Except for
> the (possible) channel put in legacy mode we don't do so during boot
> either.
> 
> Instead reset ->arch.cpu_mask, to avoid msi_compose_msg() yielding an
> all-zero message (when the passed in CPU mask has no online CPUs). Nothing
> would later call msi_compose_msg() / hpet_msi_write(), and hence nothing
> would later produce a well-formed message template in
> hpet_events[].msi.msg.
> 
> Fixes: 15aa6c67486c ("amd iommu: use base platform MSI implementation")
> Reported-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>

This appears to fix the IOMMU faults.
Started with no-arat, the debug output is now this:

(XEN) [18446743900.509455] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
(XEN) [18446743900.509470] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
(XEN) [18446743900.509480] _disable_pit_irq:2662: init: 0
(XEN) [18446743900.509491] hpet_broadcast_resume:662: hpet_events: ffff830461b3f080
(XEN) [18446743900.509636] hpet_broadcast_resume:673: num_hpets_used: 8
(XEN) [18446743900.509644] hpet_broadcast_resume:691: cfg: 0x1
(XEN) [18446743900.509656] hpet_broadcast_resume:696: i:0, hpet_events[i].msi.irq: 122, hpet_events[i].flags: 0
(XEN) [18446743900.509687] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
(XEN) [18446743900.509698] hpet_broadcast_resume:705: i:0, __hpet_setup_msi_irq ret: 0
(XEN) [18446743900.509728] hpet_broadcast_resume:715: i:0, cfg: 0xc130, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx)): 0, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx) + 4): 0xfee00f18
(XEN) [18446743900.509739] hpet_broadcast_resume:696: i:1, hpet_events[i].msi.irq: 123, hpet_events[i].flags: 0
(XEN) [18446743900.509762] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
(XEN) [18446743900.509772] hpet_broadcast_resume:705: i:1, __hpet_setup_msi_irq ret: 0
(XEN) [18446743900.509803] hpet_broadcast_resume:715: i:1, cfg: 0xc100, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx)): 0, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx) + 4): 0xfee00f38
(XEN) [18446743900.509814] hpet_broadcast_resume:696: i:2, hpet_events[i].msi.irq: 124, hpet_events[i].flags: 0
(XEN) [18446743900.509838] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
(XEN) [18446743900.509848] hpet_broadcast_resume:705: i:2, __hpet_setup_msi_irq ret: 0
(XEN) [18446743900.509877] hpet_broadcast_resume:715: i:2, cfg: 0xc100, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx)): 0, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx) + 4): 0xfee00f58
(XEN) [18446743900.509888] hpet_broadcast_resume:696: i:3, hpet_events[i].msi.irq: 125, hpet_events[i].flags: 0
(XEN) [18446743900.509912] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
(XEN) [18446743900.509922] hpet_broadcast_resume:705: i:3, __hpet_setup_msi_irq ret: 0
(XEN) [18446743900.509952] hpet_broadcast_resume:715: i:3, cfg: 0xc100, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx)): 0, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx) + 4): 0xfee00f78
(XEN) [18446743900.509963] hpet_broadcast_resume:696: i:4, hpet_events[i].msi.irq: 126, hpet_events[i].flags: 0
(XEN) [18446743900.509987] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
(XEN) [18446743900.509997] hpet_broadcast_resume:705: i:4, __hpet_setup_msi_irq ret: 0
(XEN) [18446743900.510027] hpet_broadcast_resume:715: i:4, cfg: 0xc100, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx)): 0, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx) + 4): 0xfee00f98
(XEN) [18446743900.510038] hpet_broadcast_resume:696: i:5, hpet_events[i].msi.irq: 127, hpet_events[i].flags: 0
(XEN) [18446743900.510062] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
(XEN) [18446743900.510072] hpet_broadcast_resume:705: i:5, __hpet_setup_msi_irq ret: 0
(XEN) [18446743900.510102] hpet_broadcast_resume:715: i:5, cfg: 0xc100, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx)): 0, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx) + 4): 0xfee00fb8
(XEN) [18446743900.510113] hpet_broadcast_resume:696: i:6, hpet_events[i].msi.irq: 128, hpet_events[i].flags: 0
(XEN) [18446743900.510138] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
(XEN) [18446743900.510149] hpet_broadcast_resume:705: i:6, __hpet_setup_msi_irq ret: 0
(XEN) [18446743900.510179] hpet_broadcast_resume:715: i:6, cfg: 0xc100, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx)): 0, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx) + 4): 0xfee00fd8
(XEN) [18446743900.510191] hpet_broadcast_resume:696: i:7, hpet_events[i].msi.irq: 129, hpet_events[i].flags: 0
(XEN) [18446743900.510214] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
(XEN) [18446743900.510224] hpet_broadcast_resume:705: i:7, __hpet_setup_msi_irq ret: 0
(XEN) [18446743900.510253] hpet_broadcast_resume:715: i:7, cfg: 0xc100, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx)): 0, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx) + 4): 0xfee00ff8


> ---
> As to the Fixes: tag: The issue for the HPET resume case is the
> cpumask_intersects(desc->arch.cpu_mask, &cpu_online_map) check in
> msi_compose_msg(). The earlier cpumask_empty() wasn't a problem, as
> cpu_mask_to_apicid() returning a bogus (offline) value didn't have any bad
> effect: Before use, a valid destination would have been put in place, but
> other parts of .msg were properly set up. Furthermore we also didn't clear
> the entire message prior to that change.
> 
> --- a/xen/arch/x86/hpet.c
> +++ b/xen/arch/x86/hpet.c
> @@ -685,12 +685,18 @@ void hpet_broadcast_resume(void)
>      for ( i = 0; i < n; i++ )
>      {
>          if ( hpet_events[i].msi.irq >= 0 )
> +        {
> +            struct irq_desc *desc = irq_to_desc(hpet_events[i].msi.irq);
> +
> +            cpumask_copy(desc->arch.cpu_mask, 
> cpumask_of(smp_processor_id()));
> +
>              __hpet_setup_msi_irq(irq_to_desc(hpet_events[i].msi.irq));
> +        }
>  
>          /* set HPET Tn as oneshot */
>          cfg = hpet_read32(HPET_Tn_CFG(hpet_events[i].idx));
>          cfg &= ~(HPET_TN_LEVEL | HPET_TN_PERIODIC);
> -        cfg |= HPET_TN_ENABLE | HPET_TN_32BIT;
> +        cfg |= HPET_TN_32BIT;
>          if ( !(hpet_events[i].flags & HPET_EVT_LEGACY) )
>              cfg |= HPET_TN_FSB;
>          hpet_write32(cfg, HPET_Tn_CFG(hpet_events[i].idx));
> 

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

Attachment: signature.asc
Description: PGP signature


 

