[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 1/2] libxl/devd: fix a race with concurrent device addition/removal



On Wed, May 10, 2017 at 11:47:31AM +0100, Wei Liu wrote:
> On Wed, May 10, 2017 at 11:43:57AM +0100, Roger Pau Monne wrote:
> > On Wed, May 10, 2017 at 11:32:45AM +0100, Ian Jackson wrote:
> > > Roger Pau Monne writes ("[PATCH 1/2] libxl/devd: fix a race with 
> > > concurrent device addition/removal"):
> > > > Current code can free the libxl__device inside of the 
> > > > libxl__ddomain_device
> > > > before the addition has finished if a removal happens while an addition 
> > > > is
> > > > still in process:
> > > > 
> > > >   backend_watch_callback
> > > >             |
> > > >             v
> > > >        add_device
> > > >             |                 backend_watch_callback
> > > >     (async operation)                   |
> > > >             |                           v
> > > >             |                     remove_device
> > > >             |                           |
> > > >             |                           V
> > > >             |                    device_complete
> > > >             |                 (free libxl__device)
> > > >             v
> > > >      device_complete
> > > >   (deref libxl__device)
> > > > 
> > > > Fix this by creating a temporary copy of the libxl__device, that's
> > > > tracked by the GC of the nested async operation. This ensures that
> > > > the libxl__device used by the async operations cannot be freed while
> > > > being used.
> > > 
> > > Doesn't this arrange that the remove hotplug script will be invoked
> > > while the add hotplug script is still running ?
> > 
> > That's indeed possible (either with the current code or with this patch),
> > although unlikely. The async code called by remove_device will wait for the
> > backend to switch to state 6, while the add_device code will wait for state 
> > 2
> > IIRC (one can change those states to make them clash probably).
> > 
> > > Is that really desirable (or allowed!) ?
> > 
> > Hm, no, I don't think it's desirable at all. I still think this is better 
> > that
> > the previous code (at lest it doesn't dereference libxl__device anymore), 
> > but
> > clearly needs further improvements.
> > 
> > Also, it seems to me the same can happen even without driver domains, if a 
> > user
> > executes concurrent block-{attach/detach} operations, but maybe I'm missing
> > something?
> > 
> 
> There is a lot of locking for all the device add / remove code. See
> libxl_internal.h L2588.

It's been a long time since I've played with libxl, last time none of this
existed, sorry. Sadly devd completely bypasses all this, a simple way to fix
this would be to call libxl__lock_domain_userdata from {add/remove}_device
maybe? (and drop the lock at device_complete).

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.