[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] netback Oops then xenwatch stuck in D state



>>> On 14.02.13 at 12:20, Wei Liu <wei.liu2@xxxxxxxxxx> wrote:
> On Thu, 2013-02-14 at 09:11 +0000, Jan Beulich wrote:
>> >>> On 13.02.13 at 21:17, Wei Liu <wei.liu2@xxxxxxxxxx> wrote:
>> > On Wed, Feb 13, 2013 at 07:20:32PM +0000, David Vrabel wrote:
>> >> On 13/02/13 18:37, Wei Liu wrote:
>> >> > A slightly upgraded version of the *UNTESTED* patch.
>> >> > 
>> >> > 
>> >> > Wei.
>> >> > 
>> >> > ----8<----
>> >> > commit df4c929d034cec7043fbd96ba89833eb639c336e
>> >> > Author: Wei Liu <wei.liu2@xxxxxxxxxx>
>> >> > Date:   Wed Feb 13 18:17:01 2013 +0000
>> >> > 
>> >> >     netback: fix netbk_count_requests
>> >> >     
>> >> >     There are two paths in the original code, a) test against 
>> >> > work_to_do, 
>> > b) test
>> >> >     against first->size, could return 0 even when error happens.
>> >> >     
>> >> >     Simply return -1 in error paths should work. Modify all error paths 
>> >> > to 
> 
>> > return
>> >> >     -1 to be consistent.
>> >> 
>> >> You also need to remove the netbk_tx_err() after checking the result of
>> >> netbk_count_requests().  Otherwise you will have a double xenvif_put(),
>> >> which will screw up ref counting.
>> > 
>> > I just realized that we were talking about different code path when I
>> > walked home.
>> > 
>> > The path you mentioned is correct. As excution flow should never reach
>> > there if there is error in netbk_count_requests.
>> > 
>> > The path I'm not sure is that in the netbk_fatal_tx_err, it calls
>> > xenvif_carrier_off which calls xenvif_put, and then it calls xenvif_put,
>> > which is likely to mess up the refcount.
>> 
>> I first thought so too when looking at this again yesterday, but
>> then convinced myself that this double put _here_ is correct. And
>> Ian's patch specifically removed to call to netbk_tx_err() after the
>> caller netbk_count_requests() recognized the error, knowing that
>> the latter function now itself issues a call to netbk_fatal_tx_err()
>> (whether that wouldn't better have replaced the caller's call to
>> netbk_tx_err() is a different question).
> 
> I'm not convinced. netbk_tx_err only has one xenvif_put, however
> netbk_fatal_tx_err has two puts.

Sure - one for the current transfer, and one for the carrier being
on. That second one would otherwise be put when the interface
gets brought down "normally".

> If this is a bug, and, if my previous patch fixes Christopher's OOPS, he
> will hit this bug soon when shutting down DomU.

I don't think this patch will fix his problems, which - as described
yesterday - I'm relatively certain result from the harsh action
netbk_fatal_tx_err() does.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.