Re: IOMMU faults after S3


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
  • Date: Thu, 2 Apr 2026 10:08:42 +0200
  • Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 02 Apr 2026 08:09:01 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Apr 02, 2026 at 09:01:12AM +0200, Jan Beulich wrote:
> On 02.04.2026 01:17, Marek Marczykowski-Górecki wrote:
> > On Wed, Apr 01, 2026 at 10:52:37AM +0200, Jan Beulich wrote:
> >> On 01.04.2026 09:14, Jan Beulich wrote:
> >>> On 27.03.2026 11:19, Marek Marczykowski-Górecki wrote:
> >>>> I noticed that on some systems, there are a lot of IOMMU faults after
> >>>> S3. I can see it on a laptop with MTL, and it also affects the ADL
> >>>> gitlab runner:
> >>>>
> >>>>     https://gitlab.com/xen-project/hardware/xen/-/jobs/13661033722
> >>>>     (XEN) [   37.201160] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
> >>>>     (XEN) [   37.201164] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
> >>>>     (XEN) [   37.202332] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
> >>>>     (XEN) [   37.202339] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
> >>>>
> >>>> Interestingly, the 0000:00:1e.6 device is not even listed by lspci.
> >>>>
> >>>> The issue is present only on staging, not staging-4.21.
> >>>>
> >>>> Bisect says:
> >>>>
> >>>> 5ec93b2f19ff8873fca65d38c1164b0a56d3898b is the first bad commit
> >>>> commit 5ec93b2f19ff8873fca65d38c1164b0a56d3898b
> >>>> Author: Jan Beulich <jbeulich@xxxxxxxx>
> >>>> Date:   Thu Jan 22 14:13:35 2026 +0100
> >>>>
> >>>>     x86/HPET: drop .set_affinity hook
> >>>
> >>> Looking into this, I find several things I can't quite understand (yet).
> >>> First there is
> >>>
> >>> (XEN) [000000456c0fe39f] Disabling HPET for being unreliable
> >>>
> >>> which looks to only affect clocksource selection, but not use as
> >>> broadcast source for CPU-idle management. (This may be an independent
> >>> issue.)
> >>>
> >>> Then there is
> >>>
> >>> (XEN) [    2.760248] HPET: 8 timers usable for broadcast (8 total)
> >>>
> >>> which should only occur on ARAT-incapable systems. That should only be
> >>> older hardware. (On my much older Skylake I don't see this line, for
> >>> example.) What does CPUID leaf 6 have on this system? Sadly xen-cpuid
> >>> is purely featureset based, and hence doesn't expose info about that
> >>> leaf. The leaf also isn't exposed to domains, so CPUID output in Dom0
> >>> isn't useful to look at either. It would need to be CPUID output on a
> >>> bare metal kernel.
> >>>
> >>> Further I suspect the fingered commit may only have uncovered an issue
> >>> elsewhere. I don't think we clear any context table entries during
> >>> suspend or resume. Hence in
> >>>
> >>> (XEN) [   20.554813] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
> >>> (XEN) [   20.554819] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
> >>>
> >>> the latter message is confusing me.
> >>>
> >>> The fault address being zero may, otoh, be a hint of hpet_msi_write()
> >>> never having run post-resume. Which may be the connection to the
> >>> dropping of hpet_msi_set_affinity(), as that did call that function.
> >>
> >> There clearly is an issue with the handling of the max_cstate variable,
> >> but I expect you don't use xenpm to limit usable C-states (there clearly
> >> is no respective command line option in the log you referenced)?
> > 
> > No, I don't think so.
> > 
> >> From what the log has, I conclude hpet_broadcast_resume() is called.
> > 
> > I don't think so... I applied changes as attached and got this on
> > resume:
> > 
> > (XEN) [   69.486120] Enabling non-boot CPUs  ...
> > (XEN) [   69.486404] mwait-idle: state C1 is disabled
> > (XEN) [   69.587869] mwait-idle: state C1 is disabled
> > (XEN) [   69.588008] mwait-idle: state C1 is disabled
> > (XEN) [   69.689438] mwait-idle: state C1 is disabled
> > (XEN) [   69.689608] mwait-idle: state C1 is disabled
> > (XEN) [   69.791066] mwait-idle: state C1 is disabled
> > (XEN) [   69.791334] mwait-idle: state C1 is disabled
> > (XEN) [   69.892938] mwait-idle: state C1 is disabled
> > (XEN) [   69.893209] mwait-idle: state C1 is disabled
> > (XEN) [   69.994890] mwait-idle: state C1 is disabled
> > (XEN) [   69.995096] mwait-idle: state C1 is disabled
> > (XEN) [   70.096638] mwait-idle: state C1 is disabled
> > (XEN) [   70.096915] mwait-idle: state C1 is disabled
> > (XEN) [   70.097093] mwait-idle: state C1 is disabled
> > (XEN) [   70.097272] mwait-idle: state C1 is disabled
> > (XEN) [   70.203357] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
> > (XEN) [   70.203363] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
> 
> That was on the serial console or from xl dmesg? I ask because
> console_resume() runs after time_resume(), so nothing appearing on the
> serial console would be expected (I think).

Ah, right, that's why I don't see my messages.
The xl dmesg output (from MTL this time):

    (XEN) [  123.477511] Entering ACPI S3 state.
    (XEN) [18446743903.571842] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
    (XEN) [18446743903.571856] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
    (XEN) [18446743903.571866] _disable_pit_irq:2662: init: 0
    (XEN) [18446743903.571877] hpet_broadcast_resume:661: hpet_events: ffff83046bc1f080
    (XEN) [18446743903.572020] hpet_broadcast_resume:672: num_hpets_used: 8
    (XEN) [18446743903.572029] hpet_broadcast_resume:690: cfg: 0x1
    (XEN) [18446743903.572040] hpet_broadcast_resume:695: i:0, hpet_events[i].msi.irq: 122, hpet_events[i].flags: 0
    (XEN) [18446743903.572081] hpet_broadcast_resume:706: i:0, cfg: 0xc134
    (XEN) [18446743903.572089] hpet_broadcast_resume:695: i:1, hpet_events[i].msi.irq: 123, hpet_events[i].flags: 0
    (XEN) [18446743903.572123] hpet_broadcast_resume:706: i:1, cfg: 0xc104
    (XEN) [18446743903.572132] hpet_broadcast_resume:695: i:2, hpet_events[i].msi.irq: 124, hpet_events[i].flags: 0
    (XEN) [18446743903.572167] hpet_broadcast_resume:706: i:2, cfg: 0xc104
    (XEN) [18446743903.572175] hpet_broadcast_resume:695: i:3, hpet_events[i].msi.irq: 125, hpet_events[i].flags: 0
    (XEN) [18446743903.572210] hpet_broadcast_resume:706: i:3, cfg: 0xc104
    (XEN) [18446743903.572218] hpet_broadcast_resume:695: i:4, hpet_events[i].msi.irq: 126, hpet_events[i].flags: 0
    (XEN) [18446743903.572252] hpet_broadcast_resume:706: i:4, cfg: 0xc104
    (XEN) [18446743903.572261] hpet_broadcast_resume:695: i:5, hpet_events[i].msi.irq: 127, hpet_events[i].flags: 0
    (XEN) [18446743903.572294] hpet_broadcast_resume:706: i:5, cfg: 0xc104
    (XEN) [18446743903.572303] hpet_broadcast_resume:695: i:6, hpet_events[i].msi.irq: 128, hpet_events[i].flags: 0
    (XEN) [18446743903.572338] hpet_broadcast_resume:706: i:6, cfg: 0xc104
    (XEN) [18446743903.572347] hpet_broadcast_resume:695: i:7, hpet_events[i].msi.irq: 129, hpet_events[i].flags: 0
    (XEN) [18446743903.572382] hpet_broadcast_resume:706: i:7, cfg: 0xc104

And the xen-cpuid -p output from this system:

    Xen reports there are maximum 120 leaves and 2 MSRs
    Raw policy: 48 leaves, 2 MSRs
     CPUID:
      leaf     subleaf  -> eax      ebx      ecx      edx     
      00000000:ffffffff -> 00000023:756e6547:6c65746e:49656e69
      00000001:ffffffff -> 000a06a4:20800800:77fafbff:bfebfbff
      00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
      00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
      00000004:00000001 -> fc004122:03c0003f:0000003f:00000000
      00000004:00000002 -> fc01c143:03c0003f:000007ff:00000000
      00000004:00000003 -> fc0fc163:02c0003f:00007fff:00000004
      00000005:ffffffff -> 00000040:00000040:00000003:11112020
      00000006:ffffffff -> 00dfcff7:00000002:00000409:00040003
      00000007:00000000 -> 00000002:239c27eb:994007ac:fc18c410
      00000007:00000001 -> 40400910:00000001:00000000:00040000
      00000007:00000002 -> 00000000:00000000:00000000:0000003f
      0000000a:ffffffff -> 07300805:00000000:00000007:00008603
      0000000b:00000000 -> 00000001:00000002:00000100:00000020
      0000000b:00000001 -> 00000007:00000016:00000201:00000020
      0000000d:00000000 -> 00000207:00000000:00000a88:00000000
      0000000d:00000001 -> 0000000f:00000000:00019900:00000000
      0000000d:00000002 -> 00000100:00000240:00000000:00000000
      0000000d:00000008 -> 00000080:00000000:00000001:00000000
      0000000d:00000009 -> 00000008:00000a80:00000000:00000000
      0000000d:0000000b -> 00000010:00000000:00000001:00000000
      0000000d:0000000c -> 00000018:00000000:00000001:00000000
      0000000d:0000000f -> 00000328:00000000:00000001:00000000
      0000000d:00000010 -> 00000008:00000000:00000001:00000000
      80000000:ffffffff -> 80000008:00000000:00000000:00000000
      80000001:ffffffff -> 00000000:00000000:00000121:2c100800
      80000002:ffffffff -> 65746e49:2952286c:726f4320:4d542865
      80000003:ffffffff -> 6c552029:20617274:35312037:00004835
      80000006:ffffffff -> 00000000:00000000:08007040:00000000
      80000007:ffffffff -> 00000000:00000000:00000000:00000100
      80000008:ffffffff -> 0000302e:00000000:00000000:00000000
     MSRs:
      index    -> value           
      000000ce -> 0000000080000000
      0000010a -> 000000000d89fd6b
    Host policy: 41 leaves, 2 MSRs
     CPUID:
      leaf     subleaf  -> eax      ebx      ecx      edx     
      00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
      00000001:ffffffff -> 000a06a4:20800800:77fafbff:bfebfbff
      00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
      00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
      00000004:00000001 -> fc004122:03c0003f:0000003f:00000000
      00000004:00000002 -> fc01c143:03c0003f:000007ff:00000000
      00000004:00000003 -> fc0fc163:02c0003f:00007fff:00000004
      00000005:ffffffff -> 00000040:00000040:00000003:11112020
      00000006:ffffffff -> 00dfcff7:00000002:00000409:00040003
      00000007:00000000 -> 00000002:239c27eb:994007ac:fc18c410
      00000007:00000001 -> 40000910:00000001:00000000:00040000
      00000007:00000002 -> 00000000:00000000:00000000:0000003f
      0000000b:00000000 -> 00000001:00000002:00000100:00000020
      0000000b:00000001 -> 00000007:00000016:00000201:00000020
      0000000d:00000000 -> 00000207:00000000:00000a88:00000000
      0000000d:00000001 -> 0000000f:00000000:00000000:00000000
      0000000d:00000002 -> 00000100:00000240:00000000:00000000
      0000000d:00000009 -> 00000008:00000a80:00000000:00000000
      80000000:ffffffff -> 80000008:00000000:00000000:00000000
      80000001:ffffffff -> 00000000:00000000:00000121:2c100800
      80000002:ffffffff -> 65746e49:2952286c:726f4320:4d542865
      80000003:ffffffff -> 6c552029:20617274:35312037:00004835
      80000006:ffffffff -> 00000000:00000000:08007040:00000000
      80000007:ffffffff -> 00000000:00000000:00000000:00000100
      80000008:ffffffff -> 0000302e:00000000:00000000:00000000
     MSRs:
      index    -> value           
      000000ce -> 0000000080000000
      0000010a -> 400000000d89fd6b
    PV Max policy: 58 leaves, 2 MSRs
     CPUID:
      leaf     subleaf  -> eax      ebx      ecx      edx     
      00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
      00000001:ffffffff -> 000a06a4:00800800:f6f83203:1fc9cbf5
      00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
      00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
      00000004:00000001 -> fc004122:03c0003f:0000003f:00000000
      00000004:00000002 -> fc01c143:03c0003f:000007ff:00000000
      00000004:00000003 -> fc0fc163:02c0003f:00007fff:00000004
      00000007:00000000 -> 00000002:218c0329:18400700:ac004410
      00000007:00000001 -> 00000810:00000000:00000000:00000000
      00000007:00000002 -> 00000000:00000000:00000000:00000021
      0000000d:00000000 -> 00000007:00000000:00000340:00000000
      0000000d:00000001 -> 00000007:00000000:00000000:00000000
      0000000d:00000002 -> 00000100:00000240:00000000:00000000
      80000000:ffffffff -> 80000021:00000000:00000000:00000000
      80000001:ffffffff -> 00000000:00000000:00000123:28100800
      80000002:ffffffff -> 65746e49:2952286c:726f4320:4d542865
      80000003:ffffffff -> 6c552029:20617274:35312037:00004835
      80000006:ffffffff -> 00000000:00000000:08007040:00000000
      80000007:ffffffff -> 00000000:00000000:00000000:00000100
      80000008:ffffffff -> 0000302e:00001000:00000000:00000000
     MSRs:
      index    -> value           
      000000ce -> 0000000080000000
      0000010a -> 400000001d0ae167
    HVM Max policy: 65 leaves, 2 MSRs
     CPUID:
      leaf     subleaf  -> eax      ebx      ecx      edx     
      00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
      00000001:ffffffff -> 000a06a4:00800800:f7fa3223:1fcbfbff
      00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
      00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
      00000004:00000001 -> fc004122:03c0003f:0000003f:00000000
      00000004:00000002 -> fc01c143:03c0003f:000007ff:00000000
      00000004:00000003 -> fc0fc163:02c0003f:00007fff:00000004
      00000007:00000000 -> 00000002:219c07ab:9840070c:bc004410
      00000007:00000001 -> 00000810:00000000:00000000:00000000
      00000007:00000002 -> 00000000:00000000:00000000:00000037
      0000000d:00000000 -> 00000207:00000000:00000a88:00000000
      0000000d:00000001 -> 0000000f:00000000:00000000:00000000
      0000000d:00000002 -> 00000100:00000240:00000000:00000000
      0000000d:00000009 -> 00000008:00000a80:00000000:00000000
      80000000:ffffffff -> 80000021:00000000:00000000:00000000
      80000001:ffffffff -> 00000000:00000000:00000123:2c100800
      80000002:ffffffff -> 65746e49:2952286c:726f4320:4d542865
      80000003:ffffffff -> 6c552029:20617274:35312037:00004835
      80000006:ffffffff -> 00000000:00000000:08007040:00000000
      80000007:ffffffff -> 00000000:00000000:00000000:00000100
      80000008:ffffffff -> 0000302e:00101000:00000000:00000000
     MSRs:
      index    -> value           
      000000ce -> 0000000080000000
      0000010a -> 400000001d0ae167
    PV Default policy: 33 leaves, 2 MSRs
     CPUID:
      leaf     subleaf  -> eax      ebx      ecx      edx     
      00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
      00000001:ffffffff -> 000a06a4:00800800:f6d83203:1fc9cbf5
      00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
      00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
      00000004:00000001 -> fc004122:03c0003f:0000003f:00000000
      00000004:00000002 -> fc01c143:03c0003f:000007ff:00000000
      00000004:00000003 -> fc0fc163:02c0003f:00007fff:00000004
      00000007:00000000 -> 00000002:218c0329:00400700:ac004410
      00000007:00000001 -> 00000810:00000000:00000000:00000000
      00000007:00000002 -> 00000000:00000000:00000000:00000021
      0000000d:00000000 -> 00000007:00000000:00000340:00000000
      0000000d:00000001 -> 00000007:00000000:00000000:00000000
      0000000d:00000002 -> 00000100:00000240:00000000:00000000
      80000000:ffffffff -> 80000008:00000000:00000000:00000000
      80000001:ffffffff -> 00000000:00000000:00000121:28100800
      80000002:ffffffff -> 65746e49:2952286c:726f4320:4d542865
      80000003:ffffffff -> 6c552029:20617274:35312037:00004835
      80000006:ffffffff -> 00000000:00000000:08007040:00000000
      80000008:ffffffff -> 0000302e:00001000:00000000:00000000
     MSRs:
      index    -> value           
      000000ce -> 0000000080000000
      0000010a -> 400000000d08e163
    HVM Default policy: 40 leaves, 2 MSRs
     CPUID:
      leaf     subleaf  -> eax      ebx      ecx      edx     
      00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
      00000001:ffffffff -> 000a06a4:00800800:f7fa3203:1fcbfbff
      00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
      00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
      00000004:00000001 -> fc004122:03c0003f:0000003f:00000000
      00000004:00000002 -> fc01c143:03c0003f:000007ff:00000000
      00000004:00000003 -> fc0fc163:02c0003f:00007fff:00000004
      00000007:00000000 -> 00000002:219c07ab:8040070c:bc004410
      00000007:00000001 -> 00000810:00000000:00000000:00000000
      00000007:00000002 -> 00000000:00000000:00000000:00000037
      0000000d:00000000 -> 00000207:00000000:00000a88:00000000
      0000000d:00000001 -> 0000000f:00000000:00000000:00000000
      0000000d:00000002 -> 00000100:00000240:00000000:00000000
      0000000d:00000009 -> 00000008:00000a80:00000000:00000000
      80000000:ffffffff -> 80000008:00000000:00000000:00000000
      80000001:ffffffff -> 00000000:00000000:00000121:2c100800
      80000002:ffffffff -> 65746e49:2952286c:726f4320:4d542865
      80000003:ffffffff -> 6c552029:20617274:35312037:00004835
      80000006:ffffffff -> 00000000:00000000:08007040:00000000
      80000008:ffffffff -> 0000302e:00101000:00000000:00000000
     MSRs:
      index    -> value           
      000000ce -> 0000000080000000
      0000010a -> 400000000d08e163


> Without hpet_broadcast_resume() running, I don't think I could explain how the
> channels (and their FSB interrupts) would get enabled.
> 
> Jan

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
