Re: PVH Dom0 related UART failure



On Fri, 19 May 2023, Roger Pau Monné wrote:
> On Thu, May 18, 2023 at 06:46:52PM -0700, Stefano Stabellini wrote:
> > On Thu, 18 May 2023, Roger Pau Monné wrote:
> > > On Wed, May 17, 2023 at 05:59:31PM -0700, Stefano Stabellini wrote:
> > > > Hi all,
> > > > 
> > > > I have run into another PVH Dom0 issue. I am trying to enable a PVH Dom0
> > > > test with the brand new gitlab-ci runner offered by Qubes. It is an AMD
> > > > Zen3 system and we already have a few successful tests with it, see
> > > > automation/gitlab-ci/test.yaml.
> > > > 
> > > > We managed to narrow down the issue to a console problem. We are
> > > > currently using console=com1 com1=115200,8n1,pci,msi as Xen command line
> > > > options, it works with PV Dom0 and it is using a PCI UART card.
> > > > 
> > > > In the case of Dom0 PVH:
> > > > - it works without console=com1
> > > > - it works with console=com1 and with the patch appended below
> > > > - it doesn't work otherwise and crashes with this error:
> > > > https://matrix-client.matrix.org/_matrix/media/r0/download/invisiblethingslab.com/uzcmldIqHptFZuxqsJtviLZK
> > > 
> > > Jan also noticed this, and we have a ticket for it in gitlab:
> > > 
> > > https://gitlab.com/xen-project/xen/-/issues/85
> > > 
> > > > What is the right way to fix it?
> > > 
> > > I think the right fix is simply to prevent hidden devices from being
> > > handled by vPCI. In any case such devices won't work properly with
> > > vPCI because they are in use by Xen, so any information cached by
> > > vPCI is likely to become stale, as Xen can modify the device without
> > > vPCI noticing.
> > > 
> > > I think the chunk below should help.  It's not clear to me however how
> > > hidden devices should be handled, is the intention to completely hide
> > > such devices from dom0?
> > 
> > I like the idea but the patch below still failed:
> > 
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82d0402682b0>] R drivers/vpci/header.c#modify_bars+0x2b3/0x44d
> > (XEN)    [<ffff82d040268714>] F drivers/vpci/header.c#init_bars+0x2ca/0x372
> > (XEN)    [<ffff82d0402673b3>] F vpci_add_handlers+0xd5/0x10a
> > (XEN)    [<ffff82d0404408b9>] F drivers/passthrough/pci.c#setup_one_hwdom_device+0x73/0x97
> > (XEN)    [<ffff82d0404409b0>] F drivers/passthrough/pci.c#_setup_hwdom_pci_devices+0x63/0x15b
> > (XEN)    [<ffff82d04027df08>] F drivers/passthrough/pci.c#pci_segments_iterate+0x43/0x69
> > (XEN)    [<ffff82d040440e29>] F setup_hwdom_pci_devices+0x25/0x2c
> > (XEN)    [<ffff82d04043cb1a>] F drivers/passthrough/amd/pci_amd_iommu.c#amd_iommu_hwdom_init+0xd4/0xdd
> > (XEN)    [<ffff82d0404403c9>] F iommu_hwdom_init+0x49/0x53
> > (XEN)    [<ffff82d04045175e>] F dom0_construct_pvh+0x160/0x138d
> > (XEN)    [<ffff82d040468914>] F construct_dom0+0x5c/0xb7
> > (XEN)    [<ffff82d0404619c1>] F __start_xen+0x2423/0x272d
> > (XEN)    [<ffff82d040203344>] F __high_start+0x94/0xa0
> > 
> > I haven't managed to figure out why yet.
> 
> Do you have some other patches applied?
> 
> I've tested this by manually hiding a device on my system and can
> confirm that without the fix I hit the ASSERT, but with the patch
> applied I no longer hit it.  I have no idea how you can get into
> init_bars if the device is hidden and thus belongs to dom_xen.

Unfortunately it doesn't work. Here are the full logs with interesting
DEBUG messages (search for "DEBUG"):
https://gitlab.com/xen-project/people/sstabellini/xen/-/jobs/4318489116
https://gitlab.com/xen-project/people/sstabellini/xen/-/commit/31c400caa7b86d4c14f9553138e02af18d3b3284

[...]
(XEN) DEBUG ns16550_init_postirq 432  03:00.0
[...]
(XEN) DEBUG vpci_add_handlers 75 0000:00:00.0 0
(XEN) DEBUG vpci_add_handlers 75 0000:00:00.2 1
(XEN) DEBUG vpci_add_handlers 78 0000:00:00.2
(XEN) DEBUG vpci_add_handlers 75 0000:00:01.0 0
(XEN) DEBUG vpci_add_handlers 75 0000:00:02.0 0
(XEN) DEBUG vpci_add_handlers 75 0000:00:02.1 0

Then it crashes in drivers/vpci/header.c#modify_bars.

vpci_add_handlers hasn't even been called yet for the interesting device,
which is 03:00.0 (not 00:02.1).

At that point I doubted my previous test, so I went back and re-ran it.
I can confirm that the patch below works when applied instead (instead
of, not in addition to, the previous one):


diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c
index 212a9c49ae..24abfaae30 100644
--- a/xen/drivers/char/ns16550.c
+++ b/xen/drivers/char/ns16550.c
@@ -429,17 +429,6 @@ static void __init cf_check ns16550_init_postirq(struct serial_port *port)
 #ifdef NS16550_PCI
     if ( uart->bar || uart->ps_bdf_enable )
     {
-        if ( uart->param && uart->param->mmio &&
-             rangeset_add_range(mmio_ro_ranges, PFN_DOWN(uart->io_base),
-                                PFN_UP(uart->io_base + uart->io_size) - 1) )
-            printk(XENLOG_INFO "Error while adding MMIO range of device to mmio_ro_ranges\n");
-
-        if ( pci_ro_device(0, uart->ps_bdf[0],
-                           PCI_DEVFN(uart->ps_bdf[1], uart->ps_bdf[2])) )
-            printk(XENLOG_INFO "Could not mark config space of %02x:%02x.%u read-only.\n",
-                   uart->ps_bdf[0], uart->ps_bdf[1],
-                   uart->ps_bdf[2]);
-
         if ( uart->msi )
         {
             struct msi_info msi = {
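
For readers following the removed hunk, here is a minimal standalone
sketch of the arithmetic it relies on. It is not Xen code: it assumes
4 KiB pages, and the helper names (pfn_down, pfn_up, pci_devfn,
mmio_ro_frames) and struct pfn_range are hypothetical stand-ins that
mirror the usual PFN_DOWN, PFN_UP and PCI_DEVFN definitions. It shows
which page frames the rangeset_add_range() call would cover for a given
io_base/io_size, and how a device/function pair is packed into a devfn.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)

/* Hypothetical stand-ins mirroring PFN_DOWN() and PFN_UP(). */
static inline uint64_t pfn_down(uint64_t addr)
{
    return addr >> PAGE_SHIFT;                   /* round down to a frame */
}

static inline uint64_t pfn_up(uint64_t addr)
{
    return (addr + PAGE_SIZE - 1) >> PAGE_SHIFT; /* round up to a frame */
}

/* Hypothetical stand-in mirroring PCI_DEVFN(): 5-bit device, 3-bit function. */
static inline uint8_t pci_devfn(unsigned int dev, unsigned int func)
{
    return (uint8_t)(((dev & 0x1f) << 3) | (func & 0x07));
}

/* Inclusive range of page frames, as passed to a rangeset. */
struct pfn_range {
    uint64_t first;
    uint64_t last;
};

/*
 * Frames covering [io_base, io_base + io_size), computed the same way as
 * the removed hunk: PFN_DOWN(base) .. PFN_UP(base + size) - 1.
 */
static inline struct pfn_range mmio_ro_frames(uint64_t io_base,
                                              uint64_t io_size)
{
    struct pfn_range r;

    assert(io_size > 0);  /* a zero size would yield an inverted range */
    r.first = pfn_down(io_base);
    r.last  = pfn_up(io_base + io_size) - 1;
    return r;
}
```

With a made-up io_base of 0xfe000800 and io_size of 0x1000, the range
spans frames 0xfe000 to 0xfe001, because the region straddles a page
boundary. For the UART at 03:00.0, the devfn for device 0, function 0
is 0.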

 

