[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2] libxl_qmp: wait for completion of device removal



On Mon, Aug 05, 2019 at 03:46:35PM +0100, Ian Jackson wrote:
> Chao Gao writes ("Re: [PATCH v2] libxl_qmp: wait for completion of device 
> removal"):
> > On Wed, Jul 03, 2019 at 05:03:17PM +0100, Anthony PERARD wrote:
> > >On Wed, Jul 03, 2019 at 01:56:13PM +0800, Chao Gao wrote:
> > >> To remove a device from a domain, a qmp command is sent to qemu. But it 
> > >> is
> > >> handled by qemu asychronously. Even the qmp command is claimed to be 
> > >> done,
> > >> the actual handling in qemu side may happen later.
> > >> This behavior brings two questions:
> > >> 1. Attaching a device back to a domain right after detaching the device 
> > >> from
> > >> that domain would fail with error:
> > >> 
> > >> libxl: error: libxl_qmp.c:341:qmp_handle_error_response: Domain 
> > >> 1:received an
> > >> error message from QMP server: Duplicate ID 'pci-pt-60_00.0' for device
> > >> 
> > >> 2. Accesses to PCI configuration space in Qemu may overlap with later 
> > >> device
> > >> reset issued by 'xl' or by pciback.
> > >> 
> > >> In order to avoid mentioned questions, wait for the completion of device
> > >> removal by querying all pci devices using qmp command and ensuring the 
> > >> target
> > >> device isn't listed. Only retry 5 times to avoid 'xl' potentially being 
> > >> blocked
> > >> by qemu.
> ...
> > Could we merge this patch? or need comments from someone else?
> 
> I see this patch was in fact merged.
> 
> However, there is a problem.
> 
> It adds a new call to "qmp_synchronous_send".  We have been trying to
> make libxl not trust qemu and that means abolishing all uses of
> qmp_synchronous_send.  Now I know that currently we haven't done that
> for PCI passthrough but we should not be adding more techncial debt
> unless we have to.
> 
> Unfortunately the place where this patch has to add code already uses
> this synchronous interface and currently it is therefore not easy to
> do what Chao Gao needed to do.  So it was probably right to accept
> this patch.
> 
> Maybe we should introduce a thing which can make a libxl__ev_qmp from
> a libxl__qmp_handler, or make libxl__qmp_handler contain a
> libxl__ev_qmp, or something ?  If we had had that, Chao Gao could have
> provided a patch introducing new calls to libxl__ev_qmp_send etc.

That looks complicated. I'd rather get rid of that qmp_synchronous_send
stuff.

> Additionally: is it really the case that there is no way to have qemu
> send us a signal when the work is done ?  This polling is rather poor.

There is, QEMU sends an event when the device is removed. Maybe we could
design a new API for libxl__ev_qmp to handle events, I think that's for
another day.

Also, see [1] which reimplements that polling with ev_qmp and ev_timer
and have a note about using events instead:
[1] https://lists.xenproject.org/archives/html/xen-devel/2019-08/msg00241.html

Thanks,

-- 
Anthony PERARD

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.