
IOMMU faults after S3


  • To: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
  • Date: Fri, 27 Mar 2026 11:19:04 +0100
  • Cc: Jan Beulich <jbeulich@xxxxxxxx>
  • Delivery-date: Fri, 27 Mar 2026 10:19:13 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi,

I noticed that some systems log a lot of IOMMU faults after resuming
from S3. I see this on a laptop with MTL, and it also affects the ADL
gitlab runner:

    https://gitlab.com/xen-project/hardware/xen/-/jobs/13661033722
    (XEN) [   37.201160] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
    (XEN) [   37.201164] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
    (XEN) [   37.202332] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
    (XEN) [   37.202339] [VT-D]DMAR: reason 02 - Present bit in context entry is clear

Interestingly, the 0000:00:1e.6 device is not even listed by lspci.
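
To check whether other devices are affected in a longer console log, the
faults can be tallied per device. The following is only a rough sketch:
the helper name count_faults and the regular expression are assumptions
derived from the two line shapes quoted above, not anything Xen provides,
and other fault line formats are not handled.

```python
import re
from collections import Counter

# Assumed shape of a VT-d fault line, taken from the log excerpt above.
FAULT_RE = re.compile(
    r"\[VT-D\]DMAR:\[(?P<kind>[^\]]+)\] Request device "
    r"\[(?P<sbdf>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\] "
    r"fault addr (?P<addr>[0-9a-fA-F]+)"
)


def count_faults(log_text):
    """Count fault reports per (device, request type). Hypothetical helper."""
    # Collapse all whitespace runs so that lines wrapped by a mail client
    # or a terminal still match the pattern.
    flat = " ".join(log_text.split())
    return Counter((m["sbdf"], m["kind"]) for m in FAULT_RE.finditer(flat))


# Sample input: the excerpt from this mail, including the line wrapping.
SAMPLE = """
(XEN) [   37.201160] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6]
fault addr 0
(XEN) [   37.201164] [VT-D]DMAR: reason 02 - Present bit in context entry
is clear
(XEN) [   37.202332] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6]
fault addr 0
(XEN) [   37.202339] [VT-D]DMAR: reason 02 - Present bit in context entry
is clear
"""

if __name__ == "__main__":
    for (sbdf, kind), n in count_faults(SAMPLE).most_common():
        print(f"{sbdf} {kind}: {n}")
```

Feeding it the full serial log instead of SAMPLE would show whether
0000:00:1e.6 is the only requester ID that faults.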

The issue is present only on staging, not staging-4.21.

Bisect says:

5ec93b2f19ff8873fca65d38c1164b0a56d3898b is the first bad commit
commit 5ec93b2f19ff8873fca65d38c1164b0a56d3898b
Author: Jan Beulich <jbeulich@xxxxxxxx>
Date:   Thu Jan 22 14:13:35 2026 +0100

    x86/HPET: drop .set_affinity hook
    
    No IRQ balancing is supposed to be happening on the broadcast IRQs. The
    only entity responsible for fiddling with the CPU affinities is
    set_channel_irq_affinity(). They shouldn't even be fiddled with when
    offlining a CPU: A CPU going down can't at the same time be idle. Some
    properties (->arch.cpu_mask in particular) may transiently reference an
    offline CPU, but that'll be adjusted as soon as a channel goes into active
    use again.
    
    Along with adjusting fixup_irqs() (in a more general way, i.e. covering all
    vectors which are marked in use globally), also adjust section placement of
    used_vectors.
    
    Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
    Reviewed-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>

 xen/arch/x86/hpet.c | 17 -----------------
 xen/arch/x86/irq.c  | 12 ++++++++----
 2 files changed, 8 insertions(+), 21 deletions(-)


-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
