
Re: IOMMU faults after S3


  • To: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 2 Apr 2026 08:55:01 +0200
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 02 Apr 2026 06:55:11 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 01.04.2026 22:30, Marek Marczykowski-Górecki wrote:
> On Wed, Apr 01, 2026 at 10:11:12AM +0200, Jan Beulich wrote:
>> On 01.04.2026 09:20, Andrew Cooper wrote:
>>> On 01/04/2026 9:14 am, Jan Beulich wrote:
>>>> On 27.03.2026 11:19, Marek Marczykowski-Górecki wrote:
>>>>> I noticed that on some systems, there are a lot of IOMMU faults after
>>>>> S3. I can see it on a laptop with MTL, but it also affects the ADL
>>>>> gitlab runner:
>>>>>
>>>>>     https://gitlab.com/xen-project/hardware/xen/-/jobs/13661033722
>>>>>     (XEN) [   37.201160] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
>>>>>     (XEN) [   37.201164] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
>>>>>     (XEN) [   37.202332] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
>>>>>     (XEN) [   37.202339] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
>>>>>
>>>>> Interestingly, the 0000:00:1e.6 device is not even listed by lspci.
>>>>>
>>>>> The issue is present only on staging, not staging-4.21.
>>>>>
>>>>> Bisect says:
>>>>>
>>>>> 5ec93b2f19ff8873fca65d38c1164b0a56d3898b is the first bad commit
>>>>> commit 5ec93b2f19ff8873fca65d38c1164b0a56d3898b
>>>>> Author: Jan Beulich <jbeulich@xxxxxxxx>
>>>>> Date:   Thu Jan 22 14:13:35 2026 +0100
>>>>>
>>>>>     x86/HPET: drop .set_affinity hook
>>>> Looking into this, I find several things I can't quite understand (yet).
>>>> First there is
>>>>
>>>> (XEN) [000000456c0fe39f] Disabling HPET for being unreliable
>>>>
>>>> which looks to only affect clocksource selection, but not its use as a
>>>> broadcast source for CPU-idle management. (This may be an independent
>>>> issue.)
>>>>
>>>> Then there is
>>>>
>>>> (XEN) [    2.760248] HPET: 8 timers usable for broadcast (8 total)
>>>>
>>>> which should only occur on ARAT-incapable systems. That should only be
>>>> older hardware.
>>>
>>> I'm not sure that's a reasonable assertion to draw.  The number of HPET
>>> channels is down to the HPET alone, not anything to do with the CPU
>>> capabilities.
>>
>> My statement was about the mere presence of that message, not the number
>> of channels that are reported.
>>
>>>>  (On my much older Skylake I don't see this line, for
>>>> example.) What does CPUID leaf 6 have on this system? Sadly xen-cpuid
>>>> is purely featureset based, and hence doesn't expose info about that
>>>> leaf.
>>>
>>> xen-cpuid -p
>>>
>>> That will get you leaf 6, but there's no human-readable decode of it.
>>
>> Raw numbers are good enough here. How did I miss that option when looking
>> at --help output? Oh, simply because it isn't shown there.
>>
>> Marek, that'll be better than bare metal kernel data, as it gives us both
>> raw and host policies.
> 
> Here is the output from ADL runner:
> 
> Xen reports there are maximum 120 leaves and 2 MSRs
> Raw policy: 48 leaves, 2 MSRs
>  CPUID:
>   leaf     subleaf  -> eax      ebx      ecx      edx     
>   00000000:ffffffff -> 00000020:756e6547:6c65746e:49656e69
>   00000001:ffffffff -> 00090672:00800800:77fafbff:bfebfbff
>   00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
>   00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
>   00000004:00000001 -> fc004122:01c0003f:0000003f:00000000
>   00000004:00000002 -> fc01c143:0240003f:000007ff:00000000
>   00000004:00000003 -> fc1fc163:0240003f:00007fff:00000004
>   00000005:ffffffff -> 00000040:00000040:00000003:10102020
>   00000006:ffffffff -> 00df8ff7:00000002:00000409:00000003
>   00000007:00000000 -> 00000002:239c27eb:98c027ac:fc1cc410
>   00000007:00000001 -> 00400810:00000000:00000000:00040000
>   00000007:00000002 -> 00000000:00000000:00000000:00000017
>   0000000a:ffffffff -> 07300605:00000000:00000007:00008603
>   0000000b:00000000 -> 00000001:00000002:00000100:00000000
>   0000000b:00000001 -> 00000007:00000010:00000201:00000000
>   0000000d:00000000 -> 00000207:00000000:00000a88:00000000
>   0000000d:00000001 -> 0000000f:00000000:00019900:00000000
>   0000000d:00000002 -> 00000100:00000240:00000000:00000000
>   0000000d:00000008 -> 00000080:00000000:00000001:00000000
>   0000000d:00000009 -> 00000008:00000a80:00000000:00000000
>   0000000d:0000000b -> 00000010:00000000:00000001:00000000
>   0000000d:0000000c -> 00000018:00000000:00000001:00000000
>   0000000d:0000000f -> 00000328:00000000:00000001:00000000
>   0000000d:00000010 -> 00000008:00000000:00000001:00000000
>   80000000:ffffffff -> 80000008:00000000:00000000:00000000
>   80000001:ffffffff -> 00000000:00000000:00000121:2c100800
>   80000002:ffffffff -> 68743231:6e654720:746e4920:52286c65
>   80000003:ffffffff -> 6f432029:54286572:6920294d:32312d35
>   80000004:ffffffff -> 4b303036:00000000:00000000:00000000
>   80000006:ffffffff -> 00000000:00000000:05007040:00000000
>   80000007:ffffffff -> 00000000:00000000:00000000:00000100
>   80000008:ffffffff -> 0000302e:00000000:00000000:00000000
>  MSRs:
>   index    -> value           
>   000000ce -> 0000000080000000
>   0000010a -> 000000001488fd6b
> Host policy: 41 leaves, 2 MSRs
>  CPUID:
>   leaf     subleaf  -> eax      ebx      ecx      edx     
>   00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
>   00000001:ffffffff -> 00090672:00800800:77fafbff:bfebfbff
>   00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
>   00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
>   00000004:00000001 -> fc004122:01c0003f:0000003f:00000000
>   00000004:00000002 -> fc01c143:0240003f:000007ff:00000000
>   00000004:00000003 -> fc1fc163:0240003f:00007fff:00000004
>   00000005:ffffffff -> 00000040:00000040:00000003:10102020
>   00000006:ffffffff -> 00df8ff7:00000002:00000409:00000003

And everything is as expected: the ARAT bit is set.
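
For reference, ARAT is reported in CPUID leaf 6 EAX bit 2 (per the Intel SDM),
so the number above can be checked directly. A minimal stand-alone sketch,
just decoding the dumped value (nothing Xen-specific here):

    #include <stdio.h>

    int main(void)
    {
        /* Leaf 6 EAX as reported in the Raw/Host policy dumps above. */
        unsigned int leaf6_eax = 0x00df8ff7;

        /* ARAT (always-running APIC timer) is CPUID.06H:EAX bit 2. */
        printf("ARAT: %s\n", (leaf6_eax & (1u << 2)) ? "set" : "clear");

        return 0;
    }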

Jan
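
To spell out the ARAT argument in code form: below is a minimal,
self-contained sketch of the gating being discussed, with hypothetical helper
names rather than Xen's actual code path. With ARAT the local APIC timer
keeps running in deep C-states, so no HPET wakeup broadcast needs to be set
up at all, and the "timers usable for broadcast" message would not be
expected.

    #include <stdbool.h>
    #include <stdio.h>

    /* Stand-in for the HPET broadcast setup (the step that logs the
     * "timers usable for broadcast" message quoted above). */
    static void hpet_broadcast_setup_stub(void)
    {
        puts("HPET: timers set up for broadcast");
    }

    /* Hypothetical gating: skip HPET broadcast when the APIC timer is
     * always running (ARAT, CPUID.06H:EAX bit 2). */
    static void idle_broadcast_init(bool arat)
    {
        if ( arat )
            return;

        hpet_broadcast_setup_stub();
    }

    int main(void)
    {
        idle_broadcast_init(true);   /* ADL/MTL case: nothing to set up */
        idle_broadcast_init(false);  /* pre-ARAT hardware: uses the HPET */
        return 0;
    }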



 

