Re: PVH Dom0 related UART failure



On Sat, May 20, 2023 at 12:28:59PM +0200, Roger Pau Monné wrote:
> On Fri, May 19, 2023 at 05:02:21PM -0700, Stefano Stabellini wrote:
> > On Fri, 19 May 2023, Roger Pau Monné wrote:
> > > On Thu, May 18, 2023 at 06:46:52PM -0700, Stefano Stabellini wrote:
> > > > On Thu, 18 May 2023, Roger Pau Monné wrote:
> > > > > On Wed, May 17, 2023 at 05:59:31PM -0700, Stefano Stabellini wrote:
> > > > > > Hi all,
> > > > > > 
> > > > > > I have run into another PVH Dom0 issue. I am trying to enable a PVH Dom0
> > > > > > test with the brand new gitlab-ci runner offered by Qubes. It is an AMD
> > > > > > Zen3 system and we already have a few successful tests with it, see
> > > > > > automation/gitlab-ci/test.yaml.
> > > > > > 
> > > > > > We managed to narrow down the issue to a console problem. We are
> > > > > > currently using console=com1 com1=115200,8n1,pci,msi as Xen command line
> > > > > > options, it works with PV Dom0 and it is using a PCI UART card.
> > > > > > 
> > > > > > In the case of Dom0 PVH:
> > > > > > - it works without console=com1
> > > > > > - it works with console=com1 and with the patch appended below
> > > > > > - it doesn't work otherwise and crashes with this error:
> > > > > > https://matrix-client.matrix.org/_matrix/media/r0/download/invisiblethingslab.com/uzcmldIqHptFZuxqsJtviLZK
> > > > > 
> > > > > Jan also noticed this, and we have a ticket for it in gitlab:
> > > > > 
> > > > > https://gitlab.com/xen-project/xen/-/issues/85
> > > > > 
> > > > > > What is the right way to fix it?
> > > > > 
> > > > > I think the right fix is to simply prevent hidden devices from being
> > > > > handled by vPCI. In any case such devices won't work properly with
> > > > > vPCI because they are in use by Xen, and so any information cached by
> > > > > vPCI is likely to become stale, as Xen can modify the device without
> > > > > vPCI noticing.
> > > > > 
> > > > > I think the chunk below should help.  It's not clear to me, however, how
> > > > > hidden devices should be handled: is the intention to completely hide
> > > > > such devices from dom0?
> > > > 
> > > > I like the idea but the patch below still failed:
> > > > 
> > > > (XEN) Xen call trace:
> > > > (XEN)    [<ffff82d0402682b0>] R drivers/vpci/header.c#modify_bars+0x2b3/0x44d
> > > > (XEN)    [<ffff82d040268714>] F drivers/vpci/header.c#init_bars+0x2ca/0x372
> > > > (XEN)    [<ffff82d0402673b3>] F vpci_add_handlers+0xd5/0x10a
> > > > (XEN)    [<ffff82d0404408b9>] F drivers/passthrough/pci.c#setup_one_hwdom_device+0x73/0x97
> > > > (XEN)    [<ffff82d0404409b0>] F drivers/passthrough/pci.c#_setup_hwdom_pci_devices+0x63/0x15b
> > > > (XEN)    [<ffff82d04027df08>] F drivers/passthrough/pci.c#pci_segments_iterate+0x43/0x69
> > > > (XEN)    [<ffff82d040440e29>] F setup_hwdom_pci_devices+0x25/0x2c
> > > > (XEN)    [<ffff82d04043cb1a>] F drivers/passthrough/amd/pci_amd_iommu.c#amd_iommu_hwdom_init+0xd4/0xdd
> > > > (XEN)    [<ffff82d0404403c9>] F iommu_hwdom_init+0x49/0x53
> > > > (XEN)    [<ffff82d04045175e>] F dom0_construct_pvh+0x160/0x138d
> > > > (XEN)    [<ffff82d040468914>] F construct_dom0+0x5c/0xb7
> > > > (XEN)    [<ffff82d0404619c1>] F __start_xen+0x2423/0x272d
> > > > (XEN)    [<ffff82d040203344>] F __high_start+0x94/0xa0
> > > > 
> > > > I haven't managed to figure out why yet.
> > > 
> > > Do you have some other patches applied?
> > > 
> > > I've tested this by manually hiding a device on my system and can
> > > confirm that without the fix I hit the ASSERT, but with the patch
> > > applied I no longer hit it.  I have no idea how you can get into
> > > init_bars if the device is hidden and thus belongs to dom_xen.
> > 
> > Unfortunately it doesn't work. Here are the full logs with interesting
> > DEBUG messages (search for "DEBUG"):
> > https://gitlab.com/xen-project/people/sstabellini/xen/-/jobs/4318489116
> > https://gitlab.com/xen-project/people/sstabellini/xen/-/commit/31c400caa7b86d4c14f9553138e02af18d3b3284
> > 
> > [...]
> > (XEN) DEBUG ns16550_init_postirq 432  03:00.0
> > [...]
> > (XEN) DEBUG vpci_add_handlers 75 0000:00:00.0 0
> > (XEN) DEBUG vpci_add_handlers 75 0000:00:00.2 1
> > (XEN) DEBUG vpci_add_handlers 78 0000:00:00.2
> 
> This device is not handled by vPCI either, and is not the console
> device.

That's the IOMMU.

Full lspci:

00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h-19h PCIe Root Complex [1022:14b5] (rev 01)
00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Family 17h-19h IOMMU [1022:14b6]
00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
00:02.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h-19h PCIe GPP Bridge [1022:14ba]
00:02.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h-19h PCIe GPP Bridge [1022:14ba]
00:02.4 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h-19h PCIe GPP Bridge [1022:14ba]
00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h USB4/Thunderbolt PCIe tunnel [1022:14cd]
00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h-19h Internal PCIe GPP Bridge [1022:14b9] (rev 10)
00:08.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h-19h Internal PCIe GPP Bridge [1022:14b9] (rev 10)
00:08.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h-19h Internal PCIe GPP Bridge [1022:14b9] (rev 10)
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 71)
00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Rembrandt Data Fabric: Device 18h; Function 0 [1022:1679]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Rembrandt Data Fabric: Device 18h; Function 1 [1022:167a]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Rembrandt Data Fabric: Device 18h; Function 2 [1022:167b]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Rembrandt Data Fabric: Device 18h; Function 3 [1022:167c]
00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Rembrandt Data Fabric: Device 18h; Function 4 [1022:167d]
00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Rembrandt Data Fabric: Device 18h; Function 5 [1022:167e]
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Rembrandt Data Fabric: Device 18h; Function 6 [1022:167f]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Rembrandt Data Fabric: Device 18h; Function 7 [1022:1680]
01:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller I225-V [8086:15f3] (rev 03)
02:00.0 Network controller [0280]: MEDIATEK Corp. MT7921K (RZ608) Wi-Fi 6E 80MHz [14c3:0608]
03:00.0 Serial controller [0700]: Exar Corp. XR17V3521 Dual PCIe UART [13a8:0352] (rev 03)
34:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt [Radeon 680M] [1002:1681] (rev 0a)
34:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt Radeon High Definition Audio Controller [1002:1640]
34:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] VanGogh PSP/CCP [1022:1649]
34:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Rembrandt USB4 XHCI controller #3 [1022:161d]
34:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Rembrandt USB4 XHCI controller #4 [1022:161e]
34:00.5 Multimedia controller [0480]: Advanced Micro Devices, Inc. [AMD] ACP/ACP3X/ACP6x Audio Coprocessor [1022:15e2] (rev 60)
34:00.6 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h/19h HD Audio Controller [1022:15e3]
35:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev a1)
36:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Rembrandt USB4 XHCI controller #8 [1022:161f]
36:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Rembrandt USB4 XHCI controller #5 [1022:15d6]
36:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Rembrandt USB4 XHCI controller #6 [1022:15d7]
36:00.5 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Rembrandt USB4/Thunderbolt NHI controller #1 [1022:162e]
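
Side note: a minimal Python sketch for matching the SBDFs printed in
the DEBUG lines (e.g. "0000:00:00.2") against a listing in lspci -nn
format. It is not part of Xen or of any patch in this thread; the
helper names parse_lspci and strip_segment are made up for
illustration, and it assumes each device sits on a single line.

```python
import re

# One device per line: "BB:DD.F Class name [cccc]: description".
LSPCI_LINE = re.compile(
    r"^(?P<bdf>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"(?P<cls>.+?) \[(?P<code>[0-9a-f]{4})\]: (?P<desc>.+)$"
)


def parse_lspci(text):
    """Return {bdf: (class_name, class_code, description)}."""
    devices = {}
    for line in text.splitlines():
        m = LSPCI_LINE.match(line.strip())
        if m:
            devices[m["bdf"]] = (m["cls"], m["code"], m["desc"])
    return devices


def strip_segment(sbdf):
    """Drop a leading 'SSSS:' segment: '0000:00:00.2' -> '00:00.2'."""
    return sbdf.split(":", 1)[1] if sbdf.count(":") == 2 else sbdf


# Two entries copied from the listing, joined onto single lines.
SAMPLE = """\
00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Family 17h-19h IOMMU [1022:14b6]
03:00.0 Serial controller [0700]: Exar Corp. XR17V3521 Dual PCIe UART [13a8:0352] (rev 03)
"""

devices = parse_lspci(SAMPLE)
print(devices[strip_segment("0000:00:00.2")][0])  # class of the device from the DEBUG line
print(devices["03:00.0"][0])                      # class of the console UART
```

With the two sample entries, the first lookup resolves to the IOMMU
and the second to the serial controller used as the console.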

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
