[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] Race condition on device add hanling in xl devd
On Mon, Dec 17, 2018 at 02:05:34PM +0100, Roger Pau Monné wrote: > On Mon, Dec 17, 2018 at 01:23:15PM +0100, Marek Marczykowski-Górecki wrote: > > On Mon, Dec 17, 2018 at 01:18:55PM +0100, Roger Pau Monné wrote: > > > On Mon, Dec 17, 2018 at 01:00:01PM +0100, Marek Marczykowski-Górecki > > > wrote: > > > > On Mon, Dec 17, 2018 at 10:40:59AM +0100, Roger Pau Monné wrote: > > > > > On Sun, Dec 16, 2018 at 02:47:43AM +0100, Marek Marczykowski-Górecki > > > > > wrote: > > > > > > A workaround could be implemented in hotplug script itself - wait > > > > > > for > > > > > > the device there. I'm not sure how proper solution could look like. > > > > > > Some > > > > > > synchronization between xl devd and the kernel (like xl devd > > > > > > monitoring > > > > > > uevents)? > > > > > > > > > > There's already a synchronization mechanism, libxl waits for the > > > > > backend to switch to state 2 (XenbusStateInitWait) before running the > > > > > hotplug scripts [0]. > > > > > > > > > > Maybe netback sets state 2 before creating the backend device? > > > > > > > > > > It looks to me like the backend needs to be sure everything needed by > > > > > the hotplug script is in place before switching to state 2. > > > > > > > > I've done some more tests and I think that's something else. I've added > > > > a loop waiting for /sys/class/net/$vif to a hotplug script, but it timed > > > > out (5s). I don't see _any_ kernel messages related to the device. > > > > > > > > It may be some bug in nested virtualization in KVM... > > > > > > In your message you said you have also observed this behavior when > > > running on bare metal, so it's likely not related to nested > > > virtualization? > > > > Yes, but on bare metal is so hard to reproduce (like 0.1% or even less > > startups), I'm not really sure if that was the same problem, as the > > problem doesn't leave that much logs... > > I'm not very familiar with netback, but I think it's indeed possible > for netback to switch to state 2 without having created the vif. > Netback switching from state 1 -> 2 seems to be solely controlled by > the frontend state (see frontend_changed). Isn't frontend_changed guaranteed to be called after netback_probe? > I think the patch below could solve this issue, but I haven't even > compile tested it, could you give it a spin? > > I would also like to hear the opinion of netback maintainers, since I > might be completely wrong. > > Thanks, Roger. > ---8<--- > diff --git a/drivers/net/xen-netback/xenbus.c > b/drivers/net/xen-netback/xenbus.c > index cd51492ae6c2..791c2c0b788f 100644 > --- a/drivers/net/xen-netback/xenbus.c > +++ b/drivers/net/xen-netback/xenbus.c > @@ -427,6 +427,10 @@ static int netback_probe(struct xenbus_device *dev, > if (err) > goto fail; > > + err = xenbus_switch_state(dev, XenbusStateInitWait); > + if (err) > + goto fail; > + > return 0; > > abort_transaction: > @@ -650,7 +654,10 @@ static void frontend_changed(struct xenbus_device *dev, > > switch (frontend_state) { > case XenbusStateInitialising: > - set_backend_state(be, XenbusStateInitWait); > + if (dev->state == XenbusStateClosed) { > + pr_info("%s: prepare for reconnect\n", dev->nodename); > + set_backend_state(be, XenbusStateInitWait); > + } > break; > > case XenbusStateInitialised: > -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? Attachment:
signature.asc _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |