[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] Netback vif reference count mismatching in latest 3.11 kernels
On Wed, Nov 27, 2013 at 05:21:56PM +0100, Tomasz Wroblewski wrote: > On 11/27/2013 03:41 PM, Wei Liu wrote: > >On Wed, Nov 27, 2013 at 03:07:09PM +0100, Tomasz Wroblewski wrote: > >>Hi, > >> > >>After update of our network backend vm kernel to 3.11.9 I'm seeing > >>trouble with netback vif close which seem related to the recent > >>changes which separated vif disconnect and free; It seems that now > >>multiple disconnect/connect cycles can happen without freeing and > >>reallocing the netdev in the processes, which confuses the vif > >>refcount. > >> > >>vif refcount is initialized to 1 in xenvif_alloc. Then first > >>xenvif_disconnect brings it back to 0, instead of 1 which would seem > >>more reasonable (since its initialized to 1 in xenvif_alloc i would > >>expect it to not be dropped to 0 until xenvif_free). Second > >>xenvif_disconnect brings it to -1 and hangs. For us (xenclient XT) > >>this happens when we hibernate linux guest, since linux hibernate is > >>a complex beast which transitions the drivers to between > >>close/connected states multiple times (i.e. first it suspends/closes > >>the drivers to take memory snapshot, then resumes/reconnects the > >>drivers to the actual writing of hibernate image to disk, then > >>finally it closes them again to shutdown the system) > >> > > > >Can you illustrate a graph of the whole process? I'm not very clear of > >the whole cycle. > > > >There's a xenvif_get in xenvif_connect, which increases refcnt by 1, > >that should corresponds to the atomic_dec in xenvif_disconnect, right? > > > > Not if I'm reading the code right. I think the atomic_dec in > xenvif_disconnect corresponds to atomic_set(..., 1) in xenvif_alloc, > and xenvif_get() in xenvif_connect corresponds to xenvif_put in > xenvif_carrier_off() (which is called at top of xenvif_disconnect). > > Therefore xenvif_disconnect() decrements the refcnt twice whilst > xenvif_connect() increments it once, which results in negative > refcnt after one cycle. The graph I'm seeing looks like this: > OK, I get what you mean. The separation of vif disconnect and free is indeed causing problem. I think the right fix would be to make things symmetric again. Does the following *untested* patch work for you? If so I will submit a proper patch to netdev. Wei. ---8<--- diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c index d28324a..342d4e5 100644 --- a/drivers/net/xen-netback/interface.c +++ b/drivers/net/xen-netback/interface.c @@ -418,9 +418,6 @@ void xenvif_disconnect(struct xenvif *vif) if (netif_carrier_ok(vif->dev)) xenvif_carrier_off(vif); - atomic_dec(&vif->refcnt); - wait_event(vif->waiting_to_free, atomic_read(&vif->refcnt) == 0); - if (vif->tx_irq) { if (vif->tx_irq == vif->rx_irq) unbind_from_irqhandler(vif->tx_irq, vif); @@ -428,6 +425,7 @@ void xenvif_disconnect(struct xenvif *vif) unbind_from_irqhandler(vif->tx_irq, vif); unbind_from_irqhandler(vif->rx_irq, vif); } + vif->tx_irq = 0; } xen_netbk_unmap_frontend_rings(vif); @@ -435,6 +433,9 @@ void xenvif_disconnect(struct xenvif *vif) void xenvif_free(struct xenvif *vif) { + atomic_dec(&vif->refcnt); + wait_event(vif->waiting_to_free, atomic_read(&vif->refcnt) == 0); + unregister_netdev(vif->dev); free_netdev(vif->dev); _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |