[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2] libxl: Fix stubdom PCI passthrough

On Thu, Aug 12, 2021 at 7:20 AM Ian Jackson <iwj@xxxxxxxxxxxxxx> wrote:
> Jan Beulich writes ("Re: [PATCH v2] libxl: Fix stubdom PCI passthrough"):
> > On 12.08.2021 02:57, Jason Andryuk wrote:
> > > commit 0fdb48ffe7a1 "libxl: Make sure devices added by pci-attach are
> > > reflected in the config" broken stubdom PCI passthrough when it moved
> > > libxl__create_pci_backend later in the function.  xl pci-attach for a
> > > running PV domain may also have been broken, but that was not verified.
> > >
> > > The stubdomain is running (!starting) and PV, so it calls
> > > libxl__wait_for_backend.  With the new placement of
> > > libxl__create_pci_backend, the path does not exist and the call
> > > immediately fails.
> > > libxl: error: libxl_device.c:1388:libxl__wait_for_backend: Backend 
> > > /local/domain/0/backend/pci/43/0 does not exist
> > > libxl: error: libxl_pci.c:1764:device_pci_add_done: Domain 
> > > 42:libxl__device_pci_add failed for PCI device 0:2:0.0 (rc -3)
> > > libxl: error: libxl_create.c:1857:domcreate_attach_devices: Domain 
> > > 42:unable to add pci devices
> > >
> > > The wait is only relevant when the backend is already present.  num_devs
> > > is already used to determine if the backend needs to be created.  Re-use
> > > num_devs to determine if the backend wait is necessary.  The wait is
> > > necessary to avoid racing with another PCI attachment reconfiguring the
> > > front/back. If we are creating the backend, then we don't have to worry
> > > about a racing reconfigure.
> >
> > And why is such a race possible in the first place? If two independent
> > actions are permitted in parallel on a domain, wouldn't there better
> > be synchronization closer to the root of the call tree?

It is possible because pci front/back share the single xenstore state
node but have multiple sub-devices.    "The wait is necessary to avoid
racing with another PCI attachment reconfiguring the front/back" is my
determination after thinking through the code.  xl is poking at the
frontend and backend domains, which are running independently and
their state is not under xl's control.  Connected is the quiescent
state, so it makes sense to wait for it before switching to
Reconfigurating.  That was the old behavior which I am restoring.

> libxl does not have a per-domain lock that would prevent this kind of
> thing.  Individual operations that might malfunction if done
> concurrently are supposed to take appropriate precautions.
> I think that this patch is trying to be those precautions.  It's not
> clear to me whether it's correct, though.  I (or another mailntainer)
> will have to look at it properly...

Please do :)




Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.