
Re: IOMMU faults after S3


  • To: "Marek Marczykowski-Górecki" <marmarek@xxxxxxxxxxxxxxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: "Teddy Astie" <teddy.astie@xxxxxxxxxx>
  • Date: Fri, 27 Mar 2026 10:56:43 +0000
  • Cc: "Jan Beulich" <jbeulich@xxxxxxxx>
  • Delivery-date: Fri, 27 Mar 2026 10:56:53 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 27/03/2026 at 11:19, Marek Marczykowski-Górecki wrote:
> Hi,
>
> I noticed that on some systems, there are a lot of IOMMU faults after
> S3. I can see it also on a laptop with MTL, but it affects also the ADL
> gitlab runner:
>
>      https://gitlab.com/xen-project/hardware/xen/-/jobs/13661033722
>      (XEN) [   37.201160] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
>      (XEN) [   37.201164] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
>      (XEN) [   37.202332] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
>      (XEN) [   37.202339] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
>
> Interestingly, the 0000:00:1e.6 device is not even listed by lspci.
>
> The issue is present only on staging, not staging-4.21.
>

Is there a 1e.0 device? That could be a "phantom" PCI device.
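A quick way to check (a sketch; the device addresses are taken from the fault log above, and sysfs only lists functions the kernel enumerated, so a phantom function can be absent here while still issuing DMA):

```shell
#!/bin/sh
# Check whether function 0 of the reported slot exists while the
# faulting function (.6) does not -- a hint of a phantom function.
for fn in 0 6; do
    dev="/sys/bus/pci/devices/0000:00:1e.$fn"
    if [ -e "$dev" ]; then
        echo "present: 00:1e.$fn"
    else
        echo "absent:  00:1e.$fn"
    fi
done
```

If only 00:1e.0 shows up, Xen's `pci-phantom=` command line option may be relevant for telling the hypervisor about such hidden functions.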

> Bisect says:
>
> 5ec93b2f19ff8873fca65d38c1164b0a56d3898b is the first bad commit
> commit 5ec93b2f19ff8873fca65d38c1164b0a56d3898b
> Author: Jan Beulich <jbeulich@xxxxxxxx>
> Date:   Thu Jan 22 14:13:35 2026 +0100
>
>      x86/HPET: drop .set_affinity hook
>
>      No IRQ balancing is supposed to be happening on the broadcast IRQs. The
>      only entity responsible for fiddling with the CPU affinities is
>      set_channel_irq_affinity(). They shouldn't even be fiddled with when
>      offlining a CPU: A CPU going down can't at the same time be idle. Some
>      properties (->arch.cpu_mask in particular) may transiently reference an
>      offline CPU, but that'll be adjusted as soon as a channel goes into active
>      use again.
>
>      Along with adjusting fixup_irqs() (in a more general way, i.e. covering all
>      vectors which are marked in use globally), also adjust section placement of
>      used_vectors.
>
>      Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>      Reviewed-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>
>   xen/arch/x86/hpet.c | 17 -----------------
>   xen/arch/x86/irq.c  | 12 ++++++++----
>   2 files changed, 8 insertions(+), 21 deletions(-)
>
>

--
Teddy Astie | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech
