Re: [Xen-devel] Race condition on device add handling in xl devd

On Mon, Dec 17, 2018 at 10:40:59AM +0100, Roger Pau Monné wrote:
> On Sun, Dec 16, 2018 at 02:47:43AM +0100, Marek Marczykowski-Górecki wrote:
> > I've found a race condition in the handling of new devices in a driver
> > domain. xl devd calls the hotplug script when a new device is detected in
> > xenstore. At the same time, asynchronously, the kernel creates the actual
> > backend device (vif in my case). In rare circumstances (especially under
> > high system load) the hotplug script may be executed before the kernel
> > creates the device, and the hotplug script fails. When hotplug scripts
> > were called by udev, that race didn't exist, as udev was informed about
> > the device by the kernel.
> > I'm not sure if the race applies to backends in dom0 - it hasn't happened
> > to me, but that doesn't really prove anything.
> > Can you remind me why xl devd is now used in driver domains, instead of
> > udev?
> udev is Linux specific, while the current code works for Linux, NetBSD
> and FreeBSD.
> > A workaround could be implemented in the hotplug script itself - wait for
> > the device there. I'm not sure what a proper solution would look like. Some
> > synchronization between xl devd and the kernel (like xl devd monitoring
> > uevents)?
> There's already a synchronization mechanism, libxl waits for the
> backend to switch to state 2 (XenbusStateInitWait) before running the
> hotplug scripts [0].
> Maybe netback sets state 2 before creating the backend device?
> It looks to me like the backend needs to be sure everything needed by
> the hotplug script is in place before switching to state 2.

I've done some more tests and I think the cause is something else. I've added
a loop waiting for /sys/class/net/$vif to the hotplug script, but it timed
out after 5s without the device appearing. I don't see _any_ kernel messages
related to the device.
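
For reference, the polling logic of that wait loop can be sketched roughly as
follows. This is only an illustration in Python, not the actual change, which
lives in the shell hotplug script. The helper names (wait_for_path,
vif_sysfs_path) and the 0.1s polling interval are assumptions made up for this
sketch. Only the /sys/class/net/$vif path and the 5s timeout come from the
test described above.

```python
import os
import time


def vif_sysfs_path(vif, sysfs_root="/sys/class/net"):
    """Build the sysfs path for a network interface (hypothetical helper)."""
    return os.path.join(sysfs_root, vif)


def wait_for_path(path, timeout=5.0, interval=0.1):
    """Poll until `path` exists.

    Returns True as soon as the path shows up, or False once `timeout`
    seconds have passed without it appearing. The interval is an assumed
    value, not something taken from the real hotplug script.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With an interface name such as vif1.0 (a made-up example), calling
wait_for_path(vif_sysfs_path("vif1.0")) returns False when the device never
shows up within the timeout, which matches what I observed.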

It may be some bug in nested virtualization in KVM...

Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
