
Re: Graphical glitches (not refreshing?) with Linux's xe driver + Xen 4.19


  • To: Matthew Brost <matthew.brost@xxxxxxxxx>
  • From: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
  • Date: Mon, 22 Jun 2026 14:13:27 +0200
  • Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, intel-xe@xxxxxxxxxxxxxxxxxxxxx, jani.nikula@xxxxxxxxx
  • Delivery-date: Mon, 22 Jun 2026 12:13:43 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, Jun 17, 2026 at 04:25:14PM -0700, Matthew Brost wrote:
> On Wed, Jun 17, 2026 at 10:30:08PM +0200, Marek Marczykowski-Górecki wrote:
> > On Mon, Mar 02, 2026 at 12:19:04PM +0100, Marek Marczykowski-Górecki wrote:
> > > On Tue, Feb 24, 2026 at 04:58:25PM +0100, Marek Marczykowski-Górecki wrote:
> > > > On Fri, Feb 13, 2026 at 02:23:06AM +0100, Marek Marczykowski-Górecki wrote:
> > > > > On Thu, Feb 12, 2026 at 04:11:50PM +0100, Roger Pau Monné wrote:
> > > > > > On Tue, Feb 10, 2026 at 07:06:20PM +0100, Marek Marczykowski-Górecki wrote:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > > Recently I started testing compatibility with Intel Lunar Lake. This is
> > > > > > > > the first one that uses "xe" instead of "i915" Linux driver for iGPU.
> > > > > > > > I test it with Qubes OS 4.3, which uses Xen 4.19.4 and PV dom0 running
> > > > > > > > Linux 6.17.9 in this test.
> > > > > > 
> > > > > > > Not sure it's going to help a lot, but does using a PVH dom0 make any
> > > > > > > difference?
> > > > > 
> > > > > Ok, now with the correct Xen version, it's better with PVH dom0. At
> > > > > > least on the login screen and few applications (from both dom0 and domU)
> > > > > > I don't see the glitches anymore. I can't do a full test, because PCI
> > > > > > passthrough doesn't seem to work with PVH dom0 on Xen 4.19 - and I need
> > > > > > it to start most VMs.
> > > > > 
> > > > > So, if the above test is representative, it's only about PV dom0.
> > > > 
> > > > Some further observations:
> > > > 
> > > > 1. My initial impression that Xen 4.17.6 is not affected is false.
> > > > > Apparently I got lucky and didn't wait long enough for glitches to
> > > > appear. Unfortunately this means I have no way to bisect this...
> > > > 
> > > > 1a. Updated test procedure - either:
> > > >   - start Qubes OS in full (including default system domUs) and try to
> > > >     open an app in one of them (for example file manager or pdf viewer)
> > > >   - start Linux up to lightdm login page, log in, log out, click on a
> > > > >     few lightdm menus (session type selector, poweroff menu etc)
> > > > 
> > > > The second version works even if toolstack version in dom0 doesn't match
> > > > Xen version. If no glitches are observed after doing either of those
> > > > procedures, assume it's good.
> > > > 
> > > > 2. Xen staging is affected too. As well as Xen staging-4.19 without
> > > > any qubes patches.
> > > > 
> > > > 3. After enabling CONFIG_DEBUG in Xen, the xe.ko fails to load firmware:
> > > > 
> > > > >     xe 0000:00:02.0: [drm] Tile0: GT0: Using GuC firmware from xe/lnl_guc_70.bin version 70.53.0
> > > > >     xe 0000:00:02.0: [drm] *ERROR* Tile0: GT0: load failed: status = 0x40000056, time = 0ms, freq = 1850MHz (req 1850MHz), done = -1
> > > > >     xe 0000:00:02.0: [drm] *ERROR* Tile0: GT0: load failed: status: Reset = 0, BootROM = 0x2B, UKernel = 0x00, MIA = 0x00, Auth = 0x01
> > > > >     xe 0000:00:02.0: [drm] *ERROR* Tile0: GT0: firmware production part check failure
> > > > >     xe 0000:00:02.0: [drm] *ERROR* Tile0: GT0: Failed to initialize uC (-EPROTO)
> > > >     xe 0000:00:02.0: probe with driver xe failed with error -71
> > > > 
> > > > CONFIG_DEBUG is the only change between "xe.ko loads fine but there are
> > > > glitches later on" and "xe.ko fails to load at all". Full console logs:
> > > > https://gist.github.com/marmarek/47b5e62a2cdbae6678c2aecc5283cd3f, there
> > > > are 3 files:
> > > >   - CONFIG_DEBUG=n
> > > >   - CONFIG_DEBUG=y
> > > >   - CONFIG_DEBUG=y + iommu=debug
> > > > 
> > > > 4. Updating to Linux 7.0-rc1 doesn't help, for example:
> > > > https://openqa.qubes-os.org/tests/168119#step/desktop_linux_manager_create_qube/11
> > > > 
> > > > > Generally, it does feel like a bug in xe.ko, but I can't exclude some issue
> > > > > on Xen side too (especially given point 3 above).
> > > 
> > > > After waiting some time (Linux 6.19.5 this time), Xen CONFIG_DEBUG=n, I get
> > > > some timeout messages:
> > > 
> > >     [    8.122120] xe 0000:00:02.0: [drm] [ENCODER:204:DDI A/PHY A] 
> > > failed to retrieve link info, disabling eDP
> > >     [    8.148476] xe 0000:00:02.0: [drm] Tile0: GT0: Using GuC firmware 
> > > from xe/lnl_guc_70.bin version 70.53.0
> > >     [    8.803845] xe 0000:00:02.0: [drm] Tile0: GT0: ccs1 fused off
> > >     [    8.804208] xe 0000:00:02.0: [drm] Tile0: GT0: ccs2 fused off
> > >     [    8.804556] xe 0000:00:02.0: [drm] Tile0: GT0: ccs3 fused off
> > >     [    8.822426] xe 0000:00:02.0: [drm] Tile0: GT1: Using GuC firmware 
> > > from xe/lnl_guc_70.bin version 70.53.0
> > >     [    8.827140] xe 0000:00:02.0: [drm] Tile0: GT1: Using HuC firmware 
> > > from xe/lnl_huc.bin version 9.4.13
> > >     [    8.829478] xe 0000:00:02.0: [drm] Tile0: GT1: Using GSC firmware 
> > > from xe/lnl_gsc_1.bin version 104.0.5.1429
> > >     [    8.852923] xe 0000:00:02.0: [drm] Tile0: GT1: vcs1 fused off
> > >     [    8.853513] xe 0000:00:02.0: [drm] Tile0: GT1: vcs2 fused off
> > >     [    8.854090] xe 0000:00:02.0: [drm] Tile0: GT1: vcs3 fused off
> > >     [    8.854706] xe 0000:00:02.0: [drm] Tile0: GT1: vcs4 fused off
> > >     [    8.855310] xe 0000:00:02.0: [drm] Tile0: GT1: vcs5 fused off
> > >     [    8.855904] xe 0000:00:02.0: [drm] Tile0: GT1: vcs6 fused off
> > >     [    8.856495] xe 0000:00:02.0: [drm] Tile0: GT1: vcs7 fused off
> > >     [    8.857079] xe 0000:00:02.0: [drm] Tile0: GT1: vecs1 fused off
> > >     [    8.857675] xe 0000:00:02.0: [drm] Tile0: GT1: vecs2 fused off
> > >     [    8.858272] xe 0000:00:02.0: [drm] Tile0: GT1: vecs3 fused off
> > >     [    8.975881] xe 0000:00:02.0: [drm] Registered 3 planes with drm 
> > > panic
> > >     [    8.976586] [drm] Initialized xe 1.1.0 for 0000:00:02.0 on minor 0
> > >     [    8.980882] ACPI: video: Video Device [GFX0] (multi-head: yes  
> > > rom: no  post: no)
> > >     [    9.033754] xe 0000:00:02.0: [drm] Tile0: GT1: found GSC cv104.1.0
> > >     ...
> > >     [ 1218.319232] xe 0000:00:02.0: [drm] Tile0: GT0: Engine reset: 
> > > engine_class=rcs, logical_mask: 0x1, guc_id=3
> > >     [ 1218.319890] xe 0000:00:02.0: [drm] Tile0: GT0: Timedout job: 
> > > seqno=9883, lrc_seqno=9883, guc_id=3, flags=0x0 in Xorg [3245]
> > >     [ 1218.320736] xe 0000:00:02.0: [drm] Xe device coredump has been 
> > > created
> > >     [ 1218.321140] xe 0000:00:02.0: [drm] Check your 
> > > /sys/class/drm/card0/device/devcoredump/data
> > >     [ 1222.285626] xe 0000:00:02.0: [drm] *ERROR* [CRTC:88:pipe A] 
> > > flip_done timed out
> > >     [ 1232.525685] xe 0000:00:02.0: [drm] *ERROR* flip_done timed out
> > >     [ 1232.526280] xe 0000:00:02.0: [drm] *ERROR* [CRTC:88:pipe A] commit 
> > > wait timed out
> > >     [ 1242.765717] xe 0000:00:02.0: [drm] *ERROR* [CRTC:88:pipe A] 
> > > flip_done timed out
> > >     [ 1253.005696] xe 0000:00:02.0: [drm] *ERROR* flip_done timed out
> > >     [ 1253.006248] xe 0000:00:02.0: [drm] *ERROR* [CRTC:88:pipe A] commit 
> > > wait timed out
> > >     [ 1263.245599] xe 0000:00:02.0: [drm] *ERROR* [CRTC:88:pipe A] 
> > > flip_done timed out
> > > 
> > > The glitches appear much earlier, though.
> > > Would content of /sys/class/drm/card0/device/devcoredump/data be useful
> > > for debugging this?
> 
> Yes, it would. Jobs hanging can be a bug anywhere in the stack (e.g.,
> Hardware bug, KMD bug, UMD bug, application bug, etc...) but the
> devcoredump would give us some hints.
> 
> > > 
> > > Full log at https://openqa.qubes-os.org/tests/168813/file/serial0.txt
> > > (warning, almost 200MB of those errors...)
> > 
> > The issue still happens with Linux 7.0.12. Current log (quite similar to
> > the previous one):
> > https://openqa.qubes-os.org/tests/184602/logfile?filename=serial0.txt
> 
> Hmm, the 'not started' messages in the dmesg are a bit concerning as
> this really shouldn't be possible to trigger even if user space is doing
> something wrong.
> 
> Can you file a gitlab issue against Xe here: 
> https://gitlab.freedesktop.org/drm/xe/kernel/issues/new
> 
> TBH, I have no idea if running Xen / Qubes OS + Xe is something anyone
> at Intel has tried out, so please include instructions on how to
> reproduce and we will see if someone on the engineering team can take a look
> at this and, if issues in Xe KMD exist, try to get these fixed.

I've opened https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/8382
including the crashdump and fresh logs.
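
In case it helps anyone going through the serial logs linked above (they
can be large), here is a minimal Python sketch for counting the xe error
messages and noting when each kind first shows up. The `summarize_xe_log`
helper and the pattern names are made up for illustration; the message
fragments are copied from the dmesg excerpts quoted above, and it assumes
each log line still carries the usual `[ seconds.micros]` kernel timestamp.

```python
import re
from collections import Counter

# Message fragments copied from the dmesg excerpts quoted in this thread.
# The dictionary keys are illustrative labels, not kernel identifiers.
PATTERNS = {
    "engine_reset": re.compile(r"Engine reset:"),
    "timedout_job": re.compile(r"Timedout job:"),
    "flip_done_timeout": re.compile(r"flip_done timed out"),
    "commit_wait_timeout": re.compile(r"commit wait timed out"),
    "guc_load_failed": re.compile(r"load failed: status = "),
}

# Assumption: lines carry a printk-style timestamp such as "[ 1218.319232]".
TIMESTAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def summarize_xe_log(lines):
    """Count known xe error messages and record when each first appeared.

    `lines` can be any iterable of strings, including an open file object,
    so a large log is processed one line at a time.
    """
    counts = Counter()
    first_seen = {}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                if name not in first_seen:
                    match = TIMESTAMP.search(line)
                    # None when the line has no recognizable timestamp.
                    first_seen[name] = float(match.group(1)) if match else None
    return counts, first_seen
```

Passing an open file object (opened with errors="replace", since a serial
console log may contain stray bytes) avoids loading the whole file into
memory.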

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab



 

