
Re: [Xen-devel] Race condition on device add handling in xl devd



On Mon, Dec 17, 2018 at 02:05:34PM +0100, Roger Pau Monné wrote:
> On Mon, Dec 17, 2018 at 01:23:15PM +0100, Marek Marczykowski-Górecki wrote:
> > On Mon, Dec 17, 2018 at 01:18:55PM +0100, Roger Pau Monné wrote:
> > > > On Mon, Dec 17, 2018 at 01:00:01PM +0100, Marek Marczykowski-Górecki wrote:
> > > > On Mon, Dec 17, 2018 at 10:40:59AM +0100, Roger Pau Monné wrote:
> > > > > On Sun, Dec 16, 2018 at 02:47:43AM +0100, Marek Marczykowski-Górecki wrote:
> > > > > > A workaround could be implemented in the hotplug script itself -
> > > > > > wait for the device there. I'm not sure what a proper solution
> > > > > > could look like. Some synchronization between xl devd and the
> > > > > > kernel (like xl devd monitoring uevents)?
> > > > > 
> > > > > There's already a synchronization mechanism: libxl waits for the
> > > > > backend to switch to state 2 (XenbusStateInitWait) before running the
> > > > > hotplug scripts [0].
> > > > > 
> > > > > Maybe netback sets state 2 before creating the backend device?
> > > > > 
> > > > > It looks to me like the backend needs to be sure everything needed by
> > > > > the hotplug script is in place before switching to state 2.
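
To make that mechanism concrete: the wait boils down to watching the
backend's xenstore "state" node until it reads "2" (XenbusStateInitWait).
A minimal standalone sketch using plain libxenstore - not libxl's actual
implementation, and with an illustrative vif backend path - could look
like this:

/* Sketch only: wait for a backend to reach XenbusStateInitWait (2).
 * The backend path is illustrative (backend domain 0, frontend domid 1,
 * device 0). */
#include <stdio.h>
#include <stdlib.h>
#include <xenstore.h>

int main(void)
{
    const char *state_path = "/local/domain/0/backend/vif/1/0/state";
    struct xs_handle *xsh = xs_open(0);
    unsigned int len, num;
    char **watch;
    char *val;

    if (!xsh || !xs_watch(xsh, state_path, "state-token"))
        return 1;

    for (;;) {
        /* xenstore fires the watch once on registration, then on every
         * write to the node. */
        watch = xs_read_watch(xsh, &num);
        if (!watch)
            break;
        free(watch);

        val = xs_read(xsh, XBT_NULL, state_path, &len);
        if (val && atoi(val) >= 2) {    /* XenbusStateInitWait */
            free(val);
            break;                      /* run the hotplug script now */
        }
        free(val);
    }

    xs_close(xsh);
    return 0;
}

xl devd goes through this kind of wait before invoking the hotplug script,
so the script itself shouldn't need to wait for xenstore; the open question
is whether the kernel vif device exists by the time state 2 is reached.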
> > > > 
> > > > I've done some more tests and I think that's something else. I've added
> > > > a loop waiting for /sys/class/net/$vif to a hotplug script, but it timed
> > > > out (5s). I don't see _any_ kernel messages related to the device.
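
For reference, the wait above was just a retry loop in the shell hotplug
script; the same idea expressed in C, with a made-up vif name and the 5s
timeout mentioned above, would be roughly:

/* Sketch only: poll until the vif's sysfs node shows up, or give up
 * after 5 seconds. "vif1.0" is a made-up example; the hotplug script
 * gets the real name from its environment. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/sys/class/net/vif1.0";
    int i;

    for (i = 0; i < 50; i++) {          /* 50 * 100ms = 5s */
        if (access(path, F_OK) == 0) {
            printf("%s appeared\n", path);
            return 0;
        }
        usleep(100 * 1000);             /* 100ms */
    }

    fprintf(stderr, "timed out waiting for %s\n", path);
    return 1;
}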
> > > > 
> > > > It may be some bug in nested virtualization in KVM...
> > > 
> > > In your message you said you have also observed this behavior when
> > > running on bare metal, so it's likely not related to nested
> > > virtualization?
> > 
> > Yes, but on bare metal it is so hard to reproduce (something like 0.1%
> > of startups, or even less) that I'm not really sure if it was the same
> > problem, as it doesn't leave that many logs...
> 
> I'm not very familiar with netback, but I think it's indeed possible
> for netback to switch to state 2 without having created the vif.
> Netback switching from state 1 -> 2 seems to be solely controlled by
> the frontend state (see frontend_changed).

Isn't frontend_changed guaranteed to be called after netback_probe?

> I think the patch below could solve this issue, but I haven't even
> compile tested it, could you give it a spin?
> 
> I would also like to hear the opinion of netback maintainers, since I
> might be completely wrong.
> 
> Thanks, Roger.
> ---8<---
> diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
> index cd51492ae6c2..791c2c0b788f 100644
> --- a/drivers/net/xen-netback/xenbus.c
> +++ b/drivers/net/xen-netback/xenbus.c
> @@ -427,6 +427,10 @@ static int netback_probe(struct xenbus_device *dev,
>       if (err)
>               goto fail;
>  
> +     err = xenbus_switch_state(dev, XenbusStateInitWait);
> +     if (err)
> +             goto fail;
> +
>       return 0;
>  
>  abort_transaction:
> @@ -650,7 +654,10 @@ static void frontend_changed(struct xenbus_device *dev,
>  
>       switch (frontend_state) {
>       case XenbusStateInitialising:
> -             set_backend_state(be, XenbusStateInitWait);
> +             if (dev->state == XenbusStateClosed) {
> +                     pr_info("%s: prepare for reconnect\n", dev->nodename);
> +                     set_backend_state(be, XenbusStateInitWait);
> +             }
>               break;
>  
>       case XenbusStateInitialised:
> 

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

