Re: IOMMU faults after S3
On 01.04.2026 22:30, Marek Marczykowski-Górecki wrote:
> On Wed, Apr 01, 2026 at 10:11:12AM +0200, Jan Beulich wrote:
>> On 01.04.2026 09:20, Andrew Cooper wrote:
>>> On 01/04/2026 9:14 am, Jan Beulich wrote:
>>>> On 27.03.2026 11:19, Marek Marczykowski-Górecki wrote:
>>>>> I noticed that on some systems, there are a lot of IOMMU faults after
>>>>> S3. I can see it also on a laptop with MTL, but it affects also the ADL
>>>>> gitlab runner:
>>>>>
>>>>> https://gitlab.com/xen-project/hardware/xen/-/jobs/13661033722
>>>>> (XEN) [ 37.201160] [VT-D]DMAR:[DMA Write] Request device
>>>>> [0000:00:1e.6] fault addr 0
>>>>> (XEN) [ 37.201164] [VT-D]DMAR: reason 02 - Present bit in context
>>>>> entry is clear
>>>>> (XEN) [ 37.202332] [VT-D]DMAR:[DMA Write] Request device
>>>>> [0000:00:1e.6] fault addr 0
>>>>> (XEN) [ 37.202339] [VT-D]DMAR: reason 02 - Present bit in context
>>>>> entry is clear
>>>>>
>>>>> Interestingly, the 0000:00:1e.6 device is not even listed by lspci.
>>>>>
>>>>> The issue is present only on staging, not staging-4.21.
>>>>>
>>>>> Bisect says:
>>>>>
>>>>> 5ec93b2f19ff8873fca65d38c1164b0a56d3898b is the first bad commit
>>>>> commit 5ec93b2f19ff8873fca65d38c1164b0a56d3898b
>>>>> Author: Jan Beulich <jbeulich@xxxxxxxx>
>>>>> Date: Thu Jan 22 14:13:35 2026 +0100
>>>>>
>>>>> x86/HPET: drop .set_affinity hook
>>>> Looking into this, I find several things I can't quite understand (yet).
>>>> First there is
>>>>
>>>> (XEN) [000000456c0fe39f] Disabling HPET for being unreliable
>>>>
>>>> which looks to only affect clocksource selection, but not use as
>>>> broadcast source for CPU-idle management. (This may be an independent
>>>> issue.)
>>>>
>>>> Then there is
>>>>
>>>> (XEN) [ 2.760248] HPET: 8 timers usable for broadcast (8 total)
>>>>
>>>> which should only occur on ARAT-incapable systems. That should only be
>>>> older hardware.
>>>
>>> I'm not sure that's a reasonable assertion to draw. The number of HPET
>>> channels is down to the HPET alone, not anything to do with the CPU
>>> capabilities.
>>
>> My statement was about the mere presence of that message, not the number
>> of channels that are reported.
>>
>>>> (On my much older Skylake I don't see this line, for
>>>> example.) What does CPUID leaf 6 have on this system? Sadly xen-cpuid
>>>> is purely featureset based, and hence doesn't expose info about that
>>>> leaf.
>>>
>>> xen-cpuid -p
>>>
>>> That will get you leaf 6, but there's no human-readable decode of it.
>>
>> Raw numbers is good enough here. How did I miss that option when looking
>> at --help output? Oh, simply because it isn't shown there.
>>
>> Marek, that'll be better than bare metal kernel data, as it gives us both
>> raw and host policies.
>
> Here is the output from ADL runner:
>
> Xen reports there are maximum 120 leaves and 2 MSRs
> Raw policy: 48 leaves, 2 MSRs
> CPUID:
> leaf subleaf -> eax ebx ecx edx
> 00000000:ffffffff -> 00000020:756e6547:6c65746e:49656e69
> 00000001:ffffffff -> 00090672:00800800:77fafbff:bfebfbff
> 00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
> 00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
> 00000004:00000001 -> fc004122:01c0003f:0000003f:00000000
> 00000004:00000002 -> fc01c143:0240003f:000007ff:00000000
> 00000004:00000003 -> fc1fc163:0240003f:00007fff:00000004
> 00000005:ffffffff -> 00000040:00000040:00000003:10102020
> 00000006:ffffffff -> 00df8ff7:00000002:00000409:00000003
> 00000007:00000000 -> 00000002:239c27eb:98c027ac:fc1cc410
> 00000007:00000001 -> 00400810:00000000:00000000:00040000
> 00000007:00000002 -> 00000000:00000000:00000000:00000017
> 0000000a:ffffffff -> 07300605:00000000:00000007:00008603
> 0000000b:00000000 -> 00000001:00000002:00000100:00000000
> 0000000b:00000001 -> 00000007:00000010:00000201:00000000
> 0000000d:00000000 -> 00000207:00000000:00000a88:00000000
> 0000000d:00000001 -> 0000000f:00000000:00019900:00000000
> 0000000d:00000002 -> 00000100:00000240:00000000:00000000
> 0000000d:00000008 -> 00000080:00000000:00000001:00000000
> 0000000d:00000009 -> 00000008:00000a80:00000000:00000000
> 0000000d:0000000b -> 00000010:00000000:00000001:00000000
> 0000000d:0000000c -> 00000018:00000000:00000001:00000000
> 0000000d:0000000f -> 00000328:00000000:00000001:00000000
> 0000000d:00000010 -> 00000008:00000000:00000001:00000000
> 80000000:ffffffff -> 80000008:00000000:00000000:00000000
> 80000001:ffffffff -> 00000000:00000000:00000121:2c100800
> 80000002:ffffffff -> 68743231:6e654720:746e4920:52286c65
> 80000003:ffffffff -> 6f432029:54286572:6920294d:32312d35
> 80000004:ffffffff -> 4b303036:00000000:00000000:00000000
> 80000006:ffffffff -> 00000000:00000000:05007040:00000000
> 80000007:ffffffff -> 00000000:00000000:00000000:00000100
> 80000008:ffffffff -> 0000302e:00000000:00000000:00000000
> MSRs:
> index -> value
> 000000ce -> 0000000080000000
> 0000010a -> 000000001488fd6b
> Host policy: 41 leaves, 2 MSRs
> CPUID:
> leaf subleaf -> eax ebx ecx edx
> 00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
> 00000001:ffffffff -> 00090672:00800800:77fafbff:bfebfbff
> 00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
> 00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
> 00000004:00000001 -> fc004122:01c0003f:0000003f:00000000
> 00000004:00000002 -> fc01c143:0240003f:000007ff:00000000
> 00000004:00000003 -> fc1fc163:0240003f:00007fff:00000004
> 00000005:ffffffff -> 00000040:00000040:00000003:10102020
> 00000006:ffffffff -> 00df8ff7:00000002:00000409:00000003

And everything as expected: The ARAT bit is set.

Jan
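
For anyone wanting to re-check the leaf 6 reading on another box, here is a minimal sketch of the decode. It assumes rows in the "leaf:subleaf -> eax:ebx:ecx:edx" layout shown in the xen-cpuid -p output above, and relies on ARAT being reported in CPUID.06H:EAX bit 2. The helper names (parse_cpuid_line, has_arat) are made up for illustration and are not part of xen-cpuid or Xen.

```python
# Sketch only: decode the ARAT bit from one "xen-cpuid -p" style row.
# Helper names are hypothetical; the row layout is assumed to match the
# "leaf:subleaf -> eax:ebx:ecx:edx" dump quoted in this thread.

ARAT_BIT = 2  # CPUID.06H:EAX[2], "APIC timer always running"


def parse_cpuid_line(line):
    """Split one 'leaf:subleaf -> eax:ebx:ecx:edx' row into integers."""
    key, regs = line.split("->")
    leaf, subleaf = (int(x, 16) for x in key.strip().split(":"))
    eax, ebx, ecx, edx = (int(x, 16) for x in regs.strip().split(":"))
    return leaf, subleaf, (eax, ebx, ecx, edx)


def has_arat(line):
    """Return True if the given leaf 6 row has the ARAT bit set in EAX."""
    leaf, _subleaf, (eax, _ebx, _ecx, _edx) = parse_cpuid_line(line)
    if leaf != 6:
        raise ValueError("expected a CPUID leaf 6 row")
    return bool((eax >> ARAT_BIT) & 1)


# Leaf 6 row as reported for both the raw and the host policy above.
row = "00000006:ffffffff -> 00df8ff7:00000002:00000409:00000003"
print(has_arat(row))  # prints True: 0x00df8ff7 has bit 2 set
```

With EAX = 0x00df8ff7 the low byte is 0xf7 (binary 11110111), so bit 2 is set, which matches the conclusion above.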