
Re: [Xen-devel] netback Oops then xenwatch stuck in D state



On Wed, 2013-02-13 at 19:20 +0000, David Vrabel wrote:
> On 13/02/13 18:37, Wei Liu wrote:
> > A slightly upgraded version of the *UNTESTED* patch.
> > 
> > 
> > Wei.
> > 
> > ----8<----
> > commit df4c929d034cec7043fbd96ba89833eb639c336e
> > Author: Wei Liu <wei.liu2@xxxxxxxxxx>
> > Date:   Wed Feb 13 18:17:01 2013 +0000
> > 
> >     netback: fix netbk_count_requests
> >     
> >     Two paths in the original code, a) the test against work_to_do and
> >     b) the test against first->size, could return 0 even when an error
> >     happens.
> >     
> >     Simply returning -1 in the error paths should work. Modify all error
> >     paths to return -1 to be consistent.
> 
> You also need to remove the netbk_tx_err() after checking the result of
> netbk_count_requests().  Otherwise you will have a double xenvif_put(),
> which will screw up ref counting.
> 

Yes, I saw that as well. I suspected it was done on purpose. I
didn't write this patch anyway. I thought that Ian had at least smoke-tested
it with vif creation / teardown, so I just left it like that.

> I would also suggest returning -EINVAL from netbk_count_requests().
> It is not clear to me how this will fix the original oops though.
> 

My analysis:

netbk_count_requests returns 0 when an error occurs in the first
iteration (frags = 0, so -frags = 0). The caller gets 0 and doesn't notice
that this vif has been disconnected. The subsequent comparison
txreq.size < ETH_HLEN is then true for some reason: the frontend messes up
the txreq (which could also be the reason netbk_count_requests fails in
the first iteration), and the resulting call to netbk_tx_err triggers the
bug.


Wei.

> David
> 
> >     
> >     Signed-off-by: Wei Liu <wei.liu2@xxxxxxxxxx>
> > 
> > diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> > index 103294d..0e0162e 100644
> > --- a/drivers/net/xen-netback/netback.c
> > +++ b/drivers/net/xen-netback/netback.c
> > @@ -913,13 +913,13 @@ static int netbk_count_requests(struct xenvif *vif,
> >             if (frags >= work_to_do) {
> >                     netdev_err(vif->dev, "Need more frags\n");
> >                     netbk_fatal_tx_err(vif);
> > -                   return -frags;
> > +                   return -1;
> >             }
> >  
> >             if (unlikely(frags >= MAX_SKB_FRAGS)) {
> >                     netdev_err(vif->dev, "Too many frags\n");
> >                     netbk_fatal_tx_err(vif);
> > -                   return -frags;
> > +                   return -1;
> >             }
> >  
> >             memcpy(txp, RING_GET_REQUEST(&vif->tx, cons + frags),
> > @@ -927,7 +927,7 @@ static int netbk_count_requests(struct xenvif *vif,
> >             if (txp->size > first->size) {
> >                     netdev_err(vif->dev, "Frag is bigger than frame.\n");
> >                     netbk_fatal_tx_err(vif);
> > -                   return -frags;
> > +                   return -1;
> >             }
> >  
> >             first->size -= txp->size;
> > @@ -937,7 +937,7 @@ static int netbk_count_requests(struct xenvif *vif,
> >                     netdev_err(vif->dev, "txp->offset: %x, size: %u\n",
> >                              txp->offset, txp->size);
> >                     netbk_fatal_tx_err(vif);
> > -                   return -frags;
> > +                   return -1;
> >             }
> >     } while ((txp++)->flags & XEN_NETTXF_more_data);
> >     return frags;
> > 
> > 
> > 
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@xxxxxxxxxxxxx
> > http://lists.xen.org/xen-devel
> 


