Re: [Xen-devel] Netback vif reference count mismatching in latest 3.11 kernels

On Wed, Nov 27, 2013 at 03:07:09PM +0100, Tomasz Wroblewski wrote:
> Hi,
> After updating our network backend VM kernel to 3.11.9 I'm seeing
> trouble with netback vif close, which seems related to the recent
> changes that separated vif disconnect and free. It seems that now
> multiple disconnect/connect cycles can happen without freeing and
> reallocating the netdev in the process, which confuses the vif
> refcount.
> The vif refcount is initialized to 1 in xenvif_alloc. The first
> xenvif_disconnect then drops it to 0 rather than leaving it at 1,
> which would seem more reasonable (since it's initialized to 1 in
> xenvif_alloc, I would expect it not to drop to 0 until xenvif_free).
> The second xenvif_disconnect brings it to -1 and hangs. For us
> (XenClient XT) this happens when we hibernate a Linux guest, since
> Linux hibernate is a complex beast which transitions the drivers
> between closed/connected states multiple times (i.e. first it
> suspends/closes the drivers to take a memory snapshot, then
> resumes/reconnects the drivers to do the actual writing of the
> hibernate image to disk, then finally closes them again to shut
> down the system).
> I've hacked up the attached patch, which fixes it (for us). Is the
> approach taken there correct/upstreamable/reasonable? It does the
> following:
>     * reset tx_irq to 0 after unbinding the irqs on disconnect,
>       because xenvif_connect tests for it being 0 and will not
>       reconnect if it's not reset
>     * reacquire one reference to the vif in disconnect(). This is
>       because the vif reference count should be 1, as initialized in
>       xenvif_alloc(), until the vif is freed. Otherwise multiple
>       disconne[...] and cause a hang. I imagine an alternate way of
>       fixing this could be to use "0" as the default refcnt in
>       xenvif_alloc()
> I believe we didn't experience this issue on the previous kernel
> because vif disconnect was also freeing the vif and netdev, hence it
> was not possible to get xenvif_connect/xenvif_disconnect called
> multiple times between vif alloc/free.

For the record, this bug only manifests in stable trees prior to 3.12.
A patch for the 1:1 model went into 3.12 and removed a bunch of ref
counting stuff. On the other hand, stable trees picked up Paul's state
machine patch, which separated the disconnect and free functions
without handling those ref counting bits correctly.
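
For anyone following along, the sequence Tomasz describes can be
modelled in user space roughly as below. This is only a sketch, not
the netback code: struct vif_model, the model_* functions and the
apply_fix flag are names made up for illustration. A plain int stands
in for the atomic refcount, and "would hang" is reported as a false
return value instead of blocking. It assumes only what the report
states: the refcount starts at 1 in alloc, disconnect drops one
reference, connect does nothing while tx_irq is non-zero, and the
proposed fix resets tx_irq and retakes one reference in disconnect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the vif; not struct xenvif. */
struct vif_model {
    int refcnt;  /* models the vif reference count */
    int tx_irq;  /* 0 means "not connected" */
};

/* Models xenvif_alloc as described: refcount starts at 1. */
void model_alloc(struct vif_model *v)
{
    assert(v != NULL);
    v->refcnt = 1;
    v->tx_irq = 0;
}

/* Models the connect-side check: a non-zero tx_irq is treated as
 * "already connected", so nothing is set up again. Returns true
 * only if a new connection was made. */
bool model_connect(struct vif_model *v, int irq)
{
    if (v->tx_irq != 0)
        return false;
    v->tx_irq = irq;
    return true;
}

/* Models disconnect: drop one reference, then require the count to
 * be exactly 0. Returns false where the reported hang would occur.
 * With apply_fix set, tx_irq is reset and one reference is retaken,
 * mirroring the two changes listed in the proposed patch. */
bool model_disconnect(struct vif_model *v, bool apply_fix)
{
    v->refcnt--;
    if (v->refcnt != 0)
        return false;      /* count is not 0: modelled as a hang */
    if (apply_fix) {
        v->tx_irq = 0;     /* allow a later reconnect */
        v->refcnt++;       /* back to 1 until the vif is freed */
    }
    return true;
}
```

Without apply_fix, the first model_disconnect leaves the count at 0
and the second one takes it to -1 and returns false, which matches
the hang reported on 3.11.9. With apply_fix, the count is back at 1
after every disconnect, so repeated connect/disconnect cycles (as
during hibernate) complete.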

