[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3] libxl/PCI: Fix PV hotplug & stubdom coldplug



On Mon, Jan 10, 2022 at 4:09 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>
> (also Cc-ing Paul)

Thanks.

> On 09.01.2022 19:04, Jason Andryuk wrote:
> > commit 0fdb48ffe7a1 "libxl: Make sure devices added by pci-attach are
> > reflected in the config" broken PCI hotplug (xl pci-attach) for PV
> > domains when it moved libxl__create_pci_backend() later in the function.
> >
> > This also broke HVM + stubdom PCI passthrough coldplug.  For that, the
> > PCI devices are hotplugged to a running PV stubdom, and then the QEMU
> > QMP device_add commands are made to QEMU inside the stubdom.
> >
> > Are running PV domain calls libxl__wait_for_backend().
>
> I could only make sense of this when replacing "Are" by "A", but then
> I'm not sure that's what you've meant.

Yes, "A".

> >  With the current
> > placement of libxl__create_pci_backend(), the path does not exist and
> > the call immediately fails:
> > libxl: error: libxl_device.c:1388:libxl__wait_for_backend: Backend 
> > /local/domain/0/backend/pci/43/0 does not exist
> > libxl: error: libxl_pci.c:1764:device_pci_add_done: Domain 
> > 42:libxl__device_pci_add failed for PCI device 0:2:0.0 (rc -3)
> > libxl: error: libxl_create.c:1857:domcreate_attach_devices: Domain 
> > 42:unable to add pci devices
> >
> > The wait is only relevant when the backend is already present.  num_devs
> > is already used to determine if the backend needs to be created.  Re-use
> > num_devs to determine if the backend wait is necessary.  The wait is
> > necessary to avoid racing with another PCI attachment reconfiguring the
> > front/back or changing to some other state like closing. If we are
> > creating the backend, then we don't have to worry about the state since
> > it is being created.
>
> While I follow all of this, what I'm missing here is some form of proof
> why the wait is _not_ needed for a newly created backend.  Isn't it
> necessary for the backend to reach Connected also when putting in place
> the first device, in order for the guest to be able to actually use the
> device?

The backend creation is now done as part of the xenstore transaction,
see "This is the first device, so create the backend" in
libxl__device_pci_add_xenstore().  That is part of Paul's change in
0fdb48ffe7a1.  Before that, it created the backend and waited.

> Is it perhaps assumed that the frontend driver would wait for
> the backend reaching Connected (emphasizing the difference to HVM,
> where no frontend driver exists)? Whatever the answer, it may be
> worthwhile to also add that (in suitably abbreviated form) to the newly
> added code comment.

I think my starting vs. !starting comment below will clarify this.

> > Fixes: 0fdb48ffe7a1 ("libxl: Make sure devices added by pci-attach are
> > reflected in the config")
> >
> > Signed-off-by: Jason Andryuk <jandryuk@xxxxxxxxx>
> > ---
> > Alternative to Jan's patch:
> > https://lore.kernel.org/xen-devel/5114ae87-bc0e-3d58-e16e-6d9d2fee0801@xxxxxxxx/
>
> This answers the question raised in reply to Anthony's comment on my
> patch.
>
> > --- a/tools/libs/light/libxl_pci.c
> > +++ b/tools/libs/light/libxl_pci.c
> > @@ -157,8 +157,10 @@ static int libxl__device_pci_add_xenstore(libxl__gc 
> > *gc,
> >      if (domtype == LIBXL_DOMAIN_TYPE_INVALID)
> >          return ERROR_FAIL;
> >
> > -    if (!starting && domtype == LIBXL_DOMAIN_TYPE_PV) {
> > -        if (libxl__wait_for_backend(gc, be_path, GCSPRINTF("%d", 
> > XenbusStateConnected)) < 0)
> > +    /* wait is only needed if the backend already exists (num_devs != 
> > NULL) */
> > +    if (num_devs && !starting && domtype == LIBXL_DOMAIN_TYPE_PV) {
> > +        if (libxl__wait_for_backend(gc, be_path,
> > +                                    GCSPRINTF("%d", XenbusStateConnected)) 
> > < 0)
> >              return ERROR_FAIL;
> >      }
> >
>
> In how far is the !starting part of the condition then still needed? I
> have to admit that I've been wondering about the applicability of this
> condition anyway, and your patch description actually seems to further
> support my suspicion about this not being quite right (which is
> connected to the fact that devices get added one by one instead of all
> in one go, which would avoid the need for the backend to reconfigure
> at all during domain creation).

In the starting case, libxl can write all the xenstore entries without
waiting.  The frontend is not running, so this function is called
multiple times and the additional subdevices are written.

But you have a point that the backend is running and will see the
xenstores entries.  However, Reconfiguring is only set when !starting.
And since the frontend is not running when starting, then the backend
cannot have transitioned to Connected.  Looking at xen-pciback, it
waits for the frontend to go to Initialized (From Initializing) before
transitioning to Connected.  So the multiple independent xenstore
writes accumulate for the starting case.

In the !starting (running) case, the code adds subdevices one at a
time and waits for each reconfiguration to complete prior to modifying
the xenstore state.

You previously mentioned wanting all the devices at written at once.
libxl is written to handle 1 device at a time.  That works fine for
most PV devices, but PCI with subdevices doesn't fit that paradigm.
While the PV xenstore writes could be done as a single transaction,
HVM still requires individual QMP device_add commands per device.  It
could be made to work, but it seemed like more work than it is worth
and doesn't seem to mesh well with the rest of libxl.

> Two nits:
>
> With tools/libs/light/CODING_STYLE not saying anything about comment
> style, imo ./CODING_STYLE applies, which demands comments to start with
> a capital letter.

Yes, this sounds good.

> Plus, while I'm not sure what the conventions in libxl are, touching this
> code would seem like a great opportunity to me to fold the two if()-s.

I prefer them separate.  There are two logical conditions - "do we
need to wait?" and "did the wait succeed?".  Keeping them separate
maintains that distinction, and it is more readable IMO.

Regards,
Jason



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.