Re: [Xen-devel] Netback vif reference count mismatching in latest 3.11 kernels
On Wed, Nov 27, 2013 at 03:07:09PM +0100, Tomasz Wroblewski wrote:
> Hi,
>
> After update of our network backend vm kernel to 3.11.9 I'm seeing
> trouble with netback vif close which seem related to the recent
> changes which separated vif disconnect and free; It seems that now
> multiple disconnect/connect cycles can happen without freeing and
> reallocing the netdev in the processes, which confuses the vif
> refcount.
>
> vif refcount is initialized to 1 in xenvif_alloc. Then first
> xenvif_disconnect brings it back to 0, instead of 1 which would seem
> more reasonable (since its initialized to 1 in xenvif_alloc i would
> expect it to not be dropped to 0 until xenvif_free). Second
> xenvif_disconnect brings it to -1 and hangs. For us (xenclient XT)
> this happens when we hibernate linux guest, since linux hibernate is
> a complex beast which transitions the drivers to between
> close/connected states multiple times (i.e. first it suspends/closes
> the drivers to take memory snapshot, then resumes/reconnects the
> drivers to the actual writing of hibernate image to disk, then
> finally it closes them again to shutdown the system)
>

Can you illustrate the whole process with a graph? I'm not very clear
on the whole cycle.

There's a xenvif_get in xenvif_connect, which increases refcnt by 1;
that should correspond to the atomic_dec in xenvif_disconnect, right?

> I've hacked the attached patch which fixes it (for us), is the approach taken
> there correct/upstreamable/reasonable? It does the following
>
> * reset tx_irq to 0 after unbinding the irqs on disconnect -
>   because xenvif_connect tests for it being 0 and will not reconnect
>   if it's not reset
> * reacquire one reference to vif in disconnect(). This is because the
>   reference vif should be 1, as initialized in xenvif_alloc(), until
>   the vif is freed. Otherwise multiple disconne
>   and cause a hang. I imagine alternate way of fixing this could be to
>   use "0" as the default refcnt in xenvif_alloc()
>

You mean the numbers of connects and disconnects don't match? Even
after you reset tx_irq to 0?

Wei.

> I believe we didn't experience this issue on previous kernel because
> vif disconnect was also freeing the vif and netdev, hence it was not
> possible to get xenvif_connect/xenvif_disconnect called multiple
> times between vif alloc/free.
>

> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
> index 68d5102..ccb46c4 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -420,6 +420,8 @@ void xenvif_disconnect(struct xenvif *vif)
>
>  	atomic_dec(&vif->refcnt);
>  	wait_event(vif->waiting_to_free, atomic_read(&vif->refcnt) == 0);
> +	/* reacquire reference since it should be 1 until freed */
> +	xenvif_get(vif);
>
>  	if (vif->tx_irq) {
>  		if (vif->tx_irq == vif->rx_irq)
> @@ -428,6 +430,7 @@ void xenvif_disconnect(struct xenvif *vif)
>  			unbind_from_irqhandler(vif->tx_irq, vif);
>  			unbind_from_irqhandler(vif->rx_irq, vif);
>  		}
> +		vif->tx_irq = 0;
>  	}
>
>  	xen_netbk_unmap_frontend_rings(vif);

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
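
To make the counting in this thread easier to follow, here is a minimal user-space sketch in C of the refcount across repeated connect/disconnect cycles. It is not the netback code. The struct vif_model type, the model_* functions, and simulate_cycles are hypothetical names made up for illustration. The thread only states that the count starts at 1, that xenvif_connect takes a reference and skips reconnecting when tx_irq is non-zero, and that the second disconnect reaches -1. The carrier_ok flag and the place where the connect-time reference is dropped are assumptions added so that the model reproduces those numbers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; NOT the kernel's struct xenvif. */
struct vif_model {
    int refcnt;       /* models vif->refcnt */
    int tx_irq;       /* non-zero means "already connected" */
    bool carrier_ok;  /* ASSUMPTION: set on connect, cleared on disconnect */
};

/* Models xenvif_alloc(): the count starts at 1, as stated in the thread. */
void model_alloc(struct vif_model *v)
{
    v->refcnt = 1;
    v->tx_irq = 0;
    v->carrier_ok = false;
}

/* Models xenvif_connect(): no-op if tx_irq was never reset to 0. */
void model_connect(struct vif_model *v)
{
    if (v->tx_irq)
        return;            /* treated as already connected */
    v->refcnt++;           /* the xenvif_get() Wei refers to */
    v->tx_irq = 1;         /* arbitrary non-zero placeholder */
    v->carrier_ok = true;
}

/*
 * Models xenvif_disconnect(). Returns true if the wait for
 * refcnt == 0 would be satisfied, false if it would hang.
 */
bool model_disconnect(struct vif_model *v, bool patched)
{
    if (v->carrier_ok) {
        /* ASSUMPTION: the connect-time reference is dropped here. */
        v->carrier_ok = false;
        v->refcnt--;
    }
    v->refcnt--;               /* the atomic_dec() in disconnect */
    if (v->refcnt != 0)
        return false;          /* wait_event() would never return */
    if (patched) {
        v->refcnt++;           /* patch: reacquire the alloc-time reference */
        v->tx_irq = 0;         /* patch: allow the next connect to proceed */
    }
    return true;
}

/* Returns the 1-based cycle whose disconnect would hang, or 0 if none. */
int simulate_cycles(int cycles, bool patched)
{
    struct vif_model v;

    assert(cycles > 0);
    model_alloc(&v);
    for (int i = 1; i <= cycles; i++) {
        model_connect(&v);
        if (!model_disconnect(&v, patched))
            return i;
    }
    return 0;
}
```

Under these assumptions, simulate_cycles(2, false) reports a hang on the second cycle: the second connect is skipped because tx_irq is still set, so the second disconnect takes the count from 0 to -1. With patched set to true, each cycle ends with the count back at 1 and no hang is reported.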