Re: PVH Dom0 related UART failure
On Fri, May 19, 2023 at 05:02:21PM -0700, Stefano Stabellini wrote:
> On Fri, 19 May 2023, Roger Pau Monné wrote:
> > On Thu, May 18, 2023 at 06:46:52PM -0700, Stefano Stabellini wrote:
> > > On Thu, 18 May 2023, Roger Pau Monné wrote:
> > > > On Wed, May 17, 2023 at 05:59:31PM -0700, Stefano Stabellini wrote:
> > > > > Hi all,
> > > > >
> > > > > I have run into another PVH Dom0 issue. I am trying to enable a PVH
> > > > > Dom0 test with the brand new gitlab-ci runner offered by Qubes. It is
> > > > > an AMD Zen3 system and we already have a few successful tests with it,
> > > > > see automation/gitlab-ci/test.yaml.
> > > > >
> > > > > We managed to narrow down the issue to a console problem. We are
> > > > > currently using console=com1 com1=115200,8n1,pci,msi as Xen command
> > > > > line options, it works with PV Dom0 and it is using a PCI UART card.
> > > > >
> > > > > In the case of Dom0 PVH:
> > > > > - it works without console=com1
> > > > > - it works with console=com1 and with the patch appended below
> > > > > - it doesn't work otherwise and crashes with this error:
> > > > > https://matrix-client.matrix.org/_matrix/media/r0/download/invisiblethingslab.com/uzcmldIqHptFZuxqsJtviLZK
> > > >
> > > > Jan also noticed this, and we have a ticket for it in gitlab:
> > > >
> > > > https://gitlab.com/xen-project/xen/-/issues/85
> > > >
> > > > > What is the right way to fix it?
> > > >
> > > > I think the right fix is to simply avoid hidden devices from being
> > > > handled by vPCI, in any case such devices won't work properly with
> > > > vPCI because they are in use by Xen, and so any cached information by
> > > > vPCI is likely to become stale as Xen can modify the device without
> > > > vPCI noticing.
> > > >
> > > > I think the chunk below should help. It's not clear to me however how
> > > > hidden devices should be handled, is the intention to completely hide
> > > > such devices from dom0?
> > >
> > > I like the idea but the patch below still failed:
> > >
> > > (XEN) Xen call trace:
> > > (XEN)    [<ffff82d0402682b0>] R drivers/vpci/header.c#modify_bars+0x2b3/0x44d
> > > (XEN)    [<ffff82d040268714>] F drivers/vpci/header.c#init_bars+0x2ca/0x372
> > > (XEN)    [<ffff82d0402673b3>] F vpci_add_handlers+0xd5/0x10a
> > > (XEN)    [<ffff82d0404408b9>] F drivers/passthrough/pci.c#setup_one_hwdom_device+0x73/0x97
> > > (XEN)    [<ffff82d0404409b0>] F drivers/passthrough/pci.c#_setup_hwdom_pci_devices+0x63/0x15b
> > > (XEN)    [<ffff82d04027df08>] F drivers/passthrough/pci.c#pci_segments_iterate+0x43/0x69
> > > (XEN)    [<ffff82d040440e29>] F setup_hwdom_pci_devices+0x25/0x2c
> > > (XEN)    [<ffff82d04043cb1a>] F drivers/passthrough/amd/pci_amd_iommu.c#amd_iommu_hwdom_init+0xd4/0xdd
> > > (XEN)    [<ffff82d0404403c9>] F iommu_hwdom_init+0x49/0x53
> > > (XEN)    [<ffff82d04045175e>] F dom0_construct_pvh+0x160/0x138d
> > > (XEN)    [<ffff82d040468914>] F construct_dom0+0x5c/0xb7
> > > (XEN)    [<ffff82d0404619c1>] F __start_xen+0x2423/0x272d
> > > (XEN)    [<ffff82d040203344>] F __high_start+0x94/0xa0
> > >
> > > I haven't managed to figure out why yet.
> >
> > Do you have some other patches applied?
> >
> > I've tested this by manually hiding a device on my system and can
> > confirm that without the fix I hit the ASSERT, but with the patch
> > applied I no longer hit it. I have no idea how can you get into
> > init_bars if the device is hidden and thus belongs to dom_xen.
>
> Unfortunately it doesn't work. Here are the full logs with interesting
> DEBUG messages (search for "DEBUG"):
> https://gitlab.com/xen-project/people/sstabellini/xen/-/jobs/4318489116
> https://gitlab.com/xen-project/people/sstabellini/xen/-/commit/31c400caa7b86d4c14f9553138e02af18d3b3284
>
> [...]
> (XEN) DEBUG ns16550_init_postirq 432 03:00.0
> [...]
> (XEN) DEBUG vpci_add_handlers 75 0000:00:00.0 0
> (XEN) DEBUG vpci_add_handlers 75 0000:00:00.2 1
> (XEN) DEBUG vpci_add_handlers 78 0000:00:00.2

This device is not handled by vPCI either, and is not the console
device.

> (XEN) DEBUG vpci_add_handlers 75 0000:00:01.0 0
> (XEN) DEBUG vpci_add_handlers 75 0000:00:02.0 0
> (XEN) DEBUG vpci_add_handlers 75 0000:00:02.1 0
>
> Then crash on drivers/vpci/header.c#modify_bars

Interesting. The crash however is a page fault instead of the
previous assert:

(XEN) ----[ Xen-4.18-unstable x86_64 debug=y Tainted: C ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff82d040268312>] drivers/vpci/header.c#modify_bars+0x2b3/0x44d
[...]
(XEN) Xen call trace:
(XEN)    [<ffff82d040268312>] R drivers/vpci/header.c#modify_bars+0x2b3/0x44d
(XEN)    [<ffff82d040268776>] F drivers/vpci/header.c#init_bars+0x2ca/0x372
(XEN)    [<ffff82d040267412>] F vpci_add_handlers+0x134/0x16c
(XEN)    [<ffff82d0404408e5>] F drivers/passthrough/pci.c#setup_one_hwdom_device+0x73/0x97
(XEN)    [<ffff82d0404409dc>] F drivers/passthrough/pci.c#_setup_hwdom_pci_devices+0x63/0x15b
(XEN)    [<ffff82d04027df6a>] F drivers/passthrough/pci.c#pci_segments_iterate+0x43/0x69
(XEN)    [<ffff82d040440e55>] F setup_hwdom_pci_devices+0x25/0x2c
(XEN)    [<ffff82d04043cb46>] F drivers/passthrough/amd/pci_amd_iommu.c#amd_iommu_hwdom_init+0xd4/0xdd
(XEN)    [<ffff82d0404403f5>] F iommu_hwdom_init+0x49/0x53
(XEN)    [<ffff82d04045177e>] F dom0_construct_pvh+0x160/0x138d
(XEN)    [<ffff82d040468934>] F construct_dom0+0x5c/0xb7
(XEN)    [<ffff82d0404619e1>] F __start_xen+0x2423/0x272d
(XEN)    [<ffff82d040203344>] F __high_start+0x94/0xa0
(XEN)
(XEN) Pagetable walk from 000000000000002c:
(XEN)  L4[0x000] = 000000039015b063 ffffffffffffffff
(XEN)  L3[0x000] = 000000039015a063 ffffffffffffffff
(XEN)  L2[0x000] = 0000000390159063 ffffffffffffffff
(XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 000000000000002c
(XEN) ****************************************

Looks like a NULL pointer deref. Using addr2line it points at
xen/drivers/vpci/header.c:314, which is:

    for_each_pdev ( pdev->domain, tmp )
    {
        if ( tmp == pdev )
        {
            /*
             * Need to store the device so it's not constified and defer_map
             * can modify it in case of error.
             */
            dev = tmp;
            if ( !rom_only )
                /*
                 * If memory decoding is toggled avoid checking against the
                 * same device, or else all regions will be removed from the
                 * memory map in the unmap case.
                 */
                continue;
        }

        for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
        {
            const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
            unsigned long start = PFN_DOWN(bar->addr);
            unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);

->          if ( !bar->enabled ||
                 !rangeset_overlaps_range(mem, start, end) ||

So we have a device added to the domain device list that doesn't have
vPCI enabled. I'm unsure how we get into that situation in the
current scenario, but Xen should be capable of coping with a domain
having devices not handled by vPCI.

Can you please try the following combined fix, it should also print
the offending device.

Thanks, Roger.
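As a side note on why a small faulting address such as the one above
suggests a NULL pointer deref: when a member is read through a NULL
base pointer, the address the CPU touches is just the member's offset
from zero. The sketch below illustrates that arithmetic only. The
demo_* structures and faulting_address() are hypothetical stand-ins
with an arbitrary layout, not the real Xen definitions, so the offset
they produce is not expected to match the value in the log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; NOT the real Xen layouts. */
struct demo_bar {
    uint64_t addr;
    uint64_t size;
    bool enabled;
};

struct demo_vpci {
    uint32_t pad[3];            /* arbitrary leading fields */
    struct demo_bar bars[6];
};

/*
 * Address that would be accessed when reading base->bars[idx].enabled,
 * computed without dereferencing anything.
 */
static uintptr_t faulting_address(uintptr_t base, unsigned int idx)
{
    return base + offsetof(struct demo_vpci, bars)
                + idx * sizeof(struct demo_bar)
                + offsetof(struct demo_bar, enabled);
}
```

With base set to 0 the result is a small value well inside the first
page, which is normally left unmapped, hence the page fault rather
than an assertion failure.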
---
diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index ec2e978a4e6b..0ff8e940fa8d 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -289,6 +289,13 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
      */
     for_each_pdev ( pdev->domain, tmp )
     {
+        if ( !tmp->vpci )
+        {
+            printk(XENLOG_G_WARNING "%pp: not handled by vPCI for %pd\n",
+                   &tmp->sbdf, pdev->domain);
+            continue;
+        }
+
         if ( tmp == pdev )
         {
             /*
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 652807a4a454..0baef3a8d3a1 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -72,7 +72,12 @@ int vpci_add_handlers(struct pci_dev *pdev)
     unsigned int i;
     int rc = 0;
 
-    if ( !has_vpci(pdev->domain) ||
+    if ( !has_vpci(pdev->domain) ||
+         /*
+          * Ignore RO and hidden devices, those are in use by Xen and vPCI
+          * won't work on them.
+          */
+         pci_get_pdev(dom_xen, pdev->sbdf) )
         return 0;
 
     /* We should not get here twice for the same device. */
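
For readers who want to see the first hunk's pattern in isolation,
here is a minimal standalone sketch of walking a device list and
skipping entries that carry no vPCI state instead of dereferencing
them. Everything in it (struct demo_pdev, struct demo_vpci,
count_enabled()) is hypothetical and heavily simplified. It is not
Xen code and only mirrors the shape of the NULL check added to
modify_bars().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the Xen structures. */
struct demo_vpci {
    bool bar_enabled;
};

struct demo_pdev {
    const char *name;
    struct demo_vpci *vpci;     /* NULL: device not handled by vPCI */
    struct demo_pdev *next;
};

/*
 * Count devices with an enabled BAR.  Devices without vPCI state are
 * skipped (and tallied in *skipped) rather than dereferenced.
 */
static unsigned int count_enabled(const struct demo_pdev *head,
                                  unsigned int *skipped)
{
    unsigned int n = 0;

    *skipped = 0;
    for ( const struct demo_pdev *tmp = head; tmp; tmp = tmp->next )
    {
        if ( !tmp->vpci )
        {
            /* Same idea as the patch: note the device and move on. */
            ++*skipped;
            continue;
        }

        if ( tmp->vpci->bar_enabled )
            n++;
    }

    return n;
}
```

In the real patch the skipped device is reported with printk() so the
offending entry shows up in the log. The counter here merely stands
in for that.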