[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Netback vif reference count mismatching in latest 3.11 kernels



On Wed, Nov 27, 2013 at 03:07:09PM +0100, Tomasz Wroblewski wrote:
> Hi,
> 
> After update of our network backend vm kernel to 3.11.9 I'm seeing
> trouble with netback vif close which seem related to the recent
> changes which separated vif disconnect and free; It seems that now
> multiple disconnect/connect cycles can happen without freeing and
> reallocing the netdev in the processes, which confuses the vif
> refcount.
> 
> vif refcount is initialized to 1 in xenvif_alloc. Then first
> xenvif_disconnect brings it back to 0, instead of 1 which would seem
> more reasonable (since its initialized to 1 in xenvif_alloc i would
> expect it to not be dropped to 0 until xenvif_free). Second
> xenvif_disconnect brings it to -1 and hangs. For us (xenclient XT)
> this happens when we hibernate linux guest, since linux hibernate is
> a complex beast which transitions the drivers to between
> close/connected states multiple times (i.e. first it suspends/closes
> the drivers to take memory snapshot, then resumes/reconnects the
> drivers to the actual writing of hibernate image to disk, then
> finally it closes them again to shutdown the system)
> 

Can you illustrate a graph of the whole process? I'm not very clear of
the whole cycle.

There's a xenvif_get in xenvif_connect, which increases refcnt by 1,
that should corresponds to the atomic_dec in xenvif_disconnect, right?

> I've hacked the attached patch which fixes it (for us), is the approach taken 
> there correct/upstreamable/reasonable? It does the following
> 
>     * reset tx_irq to 0 after unbinding the irqs on disconnect -
> because xenvif_connect tests for it being 0 and will not reconnect
> if it's not reset
>     * reacquire one reference to vif in disconnect(). This is because the 
> reference
>       vif should be 1, as initialized in xenvif_alloc(), until the vif is 
> freed. Otherwise multiple disconne
>       and cause a hang. I imagine alternate way of fixing this could be to 
> use "0" as the default
>       refcnt in xenvif_alloc()
> 

You mean the numbers of connect's and disconnect's don't match? Even
after you reset tx_irq to 0?

Wei.

> I believe we didn't experience this issue on previous kernel because
> vif disconnect was also freeing the vif and netdev, hence it was not
> possible to get xenvif_connect/xenvif_disconnect called multiple
> times between vif alloc/free.
> 
> 
> 
> 
> 
> 

> diff --git a/drivers/net/xen-netback/interface.c 
> b/drivers/net/xen-netback/interface.c
> index 68d5102..ccb46c4 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -420,6 +420,8 @@ void xenvif_disconnect(struct xenvif *vif)
>  
>       atomic_dec(&vif->refcnt);
>       wait_event(vif->waiting_to_free, atomic_read(&vif->refcnt) == 0);
> +     /* reacquire reference since it should be 1 until freed */
> +     xenvif_get(vif);
>  
>       if (vif->tx_irq) {
>               if (vif->tx_irq == vif->rx_irq)
> @@ -428,6 +430,7 @@ void xenvif_disconnect(struct xenvif *vif)
>                       unbind_from_irqhandler(vif->tx_irq, vif);
>                       unbind_from_irqhandler(vif->rx_irq, vif);
>               }
> +             vif->tx_irq = 0;
>       }
>  
>       xen_netbk_unmap_frontend_rings(vif);


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.