[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Race condition on device add hanling in xl devd



> -----Original Message-----
> From: Roger Pau Monne
> Sent: 17 December 2018 14:32
> To: Paul Durrant <Paul.Durrant@xxxxxxxxxx>
> Cc: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>; xen-
> devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>; Wei Liu <wei.liu2@xxxxxxxxxx>
> Subject: Re: [Xen-devel] Race condition on device add hanling in xl devd
> 
> On Mon, Dec 17, 2018 at 01:11:11PM +0000, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Roger Pau Monne
> > > Sent: 17 December 2018 13:06
> > > To: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> > > Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>; Wei Liu
> > > <wei.liu2@xxxxxxxxxx>; Paul Durrant <Paul.Durrant@xxxxxxxxxx>
> > > Subject: Re: [Xen-devel] Race condition on device add hanling in xl
> devd
> > >
> > > On Mon, Dec 17, 2018 at 01:23:15PM +0100, Marek Marczykowski-Górecki
> > > wrote:
> > > > On Mon, Dec 17, 2018 at 01:18:55PM +0100, Roger Pau Monné wrote:
> > > > > On Mon, Dec 17, 2018 at 01:00:01PM +0100, Marek Marczykowski-
> Górecki
> > > wrote:
> > > > > > On Mon, Dec 17, 2018 at 10:40:59AM +0100, Roger Pau Monné wrote:
> > > > > > > On Sun, Dec 16, 2018 at 02:47:43AM +0100, Marek Marczykowski-
> > > Górecki wrote:
> > > > > > > > A workaround could be implemented in hotplug script itself -
> > > wait for
> > > > > > > > the device there. I'm not sure how proper solution could
> look
> > > like. Some
> > > > > > > > synchronization between xl devd and the kernel (like xl devd
> > > monitoring
> > > > > > > > uevents)?
> > > > > > >
> > > > > > > There's already a synchronization mechanism, libxl waits for
> the
> > > > > > > backend to switch to state 2 (XenbusStateInitWait) before
> running
> > > the
> > > > > > > hotplug scripts [0].
> > > > > > >
> > > > > > > Maybe netback sets state 2 before creating the backend device?
> > > > > > >
> > > > > > > It looks to me like the backend needs to be sure everything
> needed
> > > by
> > > > > > > the hotplug script is in place before switching to state 2.
> > > > > >
> > > > > > I've done some more tests and I think that's something else.
> I've
> > > added
> > > > > > a loop waiting for /sys/class/net/$vif to a hotplug script, but
> it
> > > timed
> > > > > > out (5s). I don't see _any_ kernel messages related to the
> device.
> > > > > >
> > > > > > It may be some bug in nested virtualization in KVM...
> > > > >
> > > > > In your message you said you have also observed this behavior when
> > > > > running on bare metal, so it's likely not related to nested
> > > > > virtualization?
> > > >
> > > > Yes, but on bare metal is so hard to reproduce (like 0.1% or even
> less
> > > > startups), I'm not really sure if that was the same problem, as the
> > > > problem doesn't leave that much logs...
> > >
> > > I'm not very familiar with netback, but I think it's indeed possible
> > > for netback to switch to state 2 without having created the vif.
> > > Netback switching from state 1 -> 2 seems to be solely controlled by
> > > the frontend state (see frontend_changed).
> > >
> > > I think the patch below could solve this issue, but I haven't even
> > > compile tested it, could you give it a spin?
> > >
> > > I would also like to hear the opinion of netback maintainers, since I
> > > might be completely wrong.
> >
> > IIRC there is a good reason why netback doesn't want the hotplug script
> to run before moving into state 2... the script adds the vif to the bridge
> and, if this is done on the 1 -> 2 transition then you may end up with a
> load of vifs sat on the bridge for which there is no frontend (at least
> yet, but maybe never)... so the bridge wastes time in every packet sent to
> such a vif.
> 
> I don't think netback has ever waited for a frontend before running
> hotplug scripts.
> 
> In the udev times the hotplug script would be run upon vif creation,
> which happens in netback_probe, and when launching hotplug scripts
> from libxl the script is executed when the backend changes to state 2,
> which happens almost immediately because netback switches to state 2
> when the frotnend is in state 1 which is the initial frontend state.
> 

I suspect I must be remembering a XenServer-specific hack^Wpatch then. I'd have 
to dig... it's been a while since I messed with the netif state model, which is 
of course different the blkif state model.

  Paul

> Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.