Re: [PATCH] libxl: Retry QMP PCI device_add
On Fri, Apr 8, 2022 at 10:56 AM Anthony PERARD <anthony.perard@xxxxxxxxxx> wrote:
>
> On Wed, Mar 30, 2022 at 03:46:56PM -0400, Jason Andryuk wrote:
> > PCI device assignment to an HVM with stubdom is potentially racy. First
> > the PCI device is assigned to the stubdom via the PV PCI protocol. Then
> > QEMU is sent a QMP command to attach the PCI device to QEMU running
> > within the stubdom. However, the sysfs entries within the stubdom may
> > not have appeared by the time QEMU receives the device_add command,
> > resulting in errors like:
> >
> > libxl_qmp.c:1838:qmp_ev_parse_error_messages:Domain 10:Could not open
> > '/sys/bus/pci/devices/0000:00:1f.3/config': No such file or directory
> >
> > This patch retries the device assignment up to 10 times with a 1-second
> > delay between attempts. That roughly matches the overall hotplug timeout.
> >
> > The qmp_ev_parse_error_messages error is still printed since it happens
> > at a lower level than the pci code controlling the retries. Because of
> > that, the "Retrying PCI add %d" message is also printed at ERROR level
> > to clarify what is happening.
> >
> > Signed-off-by: Jason Andryuk <jandryuk@xxxxxxxxx>
> > ---
> > @@ -1252,10 +1258,22 @@ static void pci_add_qmp_device_add_cb(libxl__egc *egc,
> >                                        const libxl__json_object *response,
> >                                        int rc)
> >  {
> > -    EGC_GC;
> >      pci_add_state *pas = CONTAINER_OF(qmp, *pas, qmp);
> > +    STATE_AO_GC(pas->aodev->ao);
>
> I think this change is wrong; what is the reason to use STATE_AO_GC
> instead of EGC_GC?
> I think in this case it is fine to keep using EGC_GC, as there doesn't
> seem to be any allocation that needs to live after this function
> returns. And you could just pass `pas->aodev->ao` to
> libxl__ev_time_register_rel().

Yes, I think you are correct. I think I made this change to use a libxl
macro to declare the local ao variable, but, as you say, that is not
needed.

> > -    if (rc) goto out;
> > +    if (rc) {
> > +        if (pas->retries++ < 10) {
> > +            LOGD(ERROR, qmp->domid, "Retrying PCI add %d", pas->retries);
> > +            rc = libxl__ev_time_register_rel(ao, &pas->timeout_retries,
> > +                                             pci_add_qmp_device_add_retry,
> > +                                             1000);
> > +            if (rc) goto out;
> > +
> > +            return; /* wait for the timeout to then retry */
> > +        } else {
> > +            goto out;
> > +        }
> > +    }
>
> So this retry logic would be done regardless of whether stubdom is in
> use or not. It's not going to be useful in the non-stubdom case, is it?

Yes, it should not be applicable for non-stubdom since the sysfs entries
should already be present.

> When adding a pci device to a domain that has its device model in a
> stubdomain, there's a first call to do_pci_add() which works fine,
> right? Then there's a callback to device_pci_add_stubdom_wait(), which
> is supposed to wait for the stubdom to be ready, but the sysfs entry
> might still be missing at that time, right? Then, there is a second call
> to do_pci_add() for the guest, and it's the one that fails in your case,
> right?

Yes, I think that is all correct.

> If my reasoning is correct, could we enable the retry logic only for the
> second do_pci_add() call? So that guests without stubdom aren't impacted,
> as I don't think retrying in this case would be useful and would just
> delay the error.

That should be possible. Instead of differentiating between the
do_pci_add() invocations, I'm thinking pci_add_qmp_device_add_cb() would
gain a stubdom check:

    if (rc) {
        /* Retry only applicable for HVM with stubdom. */
        if (libxl_get_stubdom_id(CTX, domid) == 0)
            goto out;

        /* retry logic */
    }
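
Putting that together with keeping EGC_GC and passing the ao directly,
as you suggest, the whole error path might end up looking something like
this (untested sketch; I'm writing the full signature, the qmp->domid
argument for libxl_get_stubdom_id(), and the pci_add_dm_done()
completion call from memory, so treat those as placeholders):

    static void pci_add_qmp_device_add_cb(libxl__egc *egc,
                                          libxl__ev_qmp *qmp,
                                          const libxl__json_object *response,
                                          int rc)
    {
        EGC_GC;
        pci_add_state *pas = CONTAINER_OF(qmp, *pas, qmp);

        if (rc) {
            /* Retry only applicable for HVM with stubdom. */
            if (libxl_get_stubdom_id(CTX, qmp->domid) == 0)
                goto out;

            if (pas->retries++ < 10) {
                LOGD(ERROR, qmp->domid, "Retrying PCI add %d", pas->retries);
                /* Pass the ao directly instead of declaring a local. */
                rc = libxl__ev_time_register_rel(pas->aodev->ao,
                                                 &pas->timeout_retries,
                                                 pci_add_qmp_device_add_retry,
                                                 1000);
                if (rc) goto out;

                return; /* wait for the timeout, then retry */
            }
        }

        /* ... existing success handling ... */

    out:
        /* complete as the existing code does */
        pci_add_dm_done(egc, pas, rc);
    }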
I just noticed that pci_add_qmp_device_add() registers the
pci_add_timeout, and pci_add_qmp_device_add_retry() calls back into
pci_add_qmp_device_add(). I expected a single 10-second timeout across
the entire add operation, but the retry logic is re-registering it. The
duplicate registration may also be corrupting lists? I plan to move the
pci_add_timeout registration into do_pci_add(); see the sketch below.

I went with 10 retries of 1 second each since that matches the sleep(1)
loop patch coming out of OpenXT. It seems like it could be optimized,
but trying for longer and succeeding seems better than shorter and
failing.
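For the timeout move, this is roughly what I have in mind (untested
sketch; the do_pci_add() parameter list and the pas->timeout and
pas->callback names are from memory, so take them as placeholders):

    static void do_pci_add(libxl__egc *egc, uint32_t domid,
                           libxl_device_pci *pci, pci_add_state *pas)
    {
        STATE_AO_GC(pas->aodev->ao);
        int rc;

        /* Arm the overall timeout once, up front, so that re-entering
         * pci_add_qmp_device_add() from the retry path no longer
         * re-registers it. */
        rc = libxl__ev_time_register_rel(ao, &pas->timeout,
                                         pci_add_timeout,
                                         10 * 1000 /* ms, as today */);
        if (rc) {
            pas->callback(egc, pas, rc); /* assumed completion hook */
            return;
        }

        /* ... rest of the existing do_pci_add() body unchanged ... */
    }

pci_add_qmp_device_add() would then drop its
libxl__ev_time_register_rel() call entirely, leaving the already-armed
pas->timeout alone across retries.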
Thanks,
Jason