
Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"



On Sat, Mar 22, 2014 at 07:28:34PM +0100, Sander Eikelenboom wrote:
[...]
> > Yes, there is only one frag, but it seems to be much larger than 
> > PAGE_SIZE, and xenvif_gop_frag_copy breaks that frag down into smaller 
> > bits; hence the calculation in xenvif_rx_action determining the slots 
> > needed by doing:
> 
> >                 for (i = 0; i < nr_frags; i++) {
> >                         unsigned int size;
> >                         size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
> >                         max_slots_needed += DIV_ROUND_UP(size, PAGE_SIZE);
> >                 }
> 
> > But the code in xenvif_gop_frag_copy seems to need one more slot (from 
> > the empirical test), and calling "get_next_rx_buffer" seems to be 
> > involved in that.
> 
> Hmm, I looked again, and it seems this is it: when your frags are large 
> enough you have a chance of running into this.
> 

get_next_rx_buffer is guarded by start_new_rx_buffer. Do you see any
problem with that implementation?
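
For reference, this is roughly what that guard does in the 3.14-era
drivers/net/xen-netback/netback.c (a simplified sketch, details may
differ from the exact source): a new ring buffer is started only when
the current one is completely full, or when the chunk would overflow
the current buffer but fits entirely in a fresh one.

    static bool start_new_rx_buffer(int offset, unsigned long size, int head)
    {
            /* The current buffer is completely full: start a new one. */
            if (offset == MAX_BUFFER_OFFSET)
                    return true;

            /* Start a fresh buffer if this chunk would overflow the
             * current buffer, fits completely in the next one, the
             * current buffer already holds some data, and this is not
             * the head buffer (netfront requires the head buffer to be
             * non-empty).
             */
            if ((offset + size > MAX_BUFFER_OFFSET) &&
                (size <= MAX_BUFFER_OFFSET) && offset && !head)
                    return true;

            return false;
    }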

> Problem is, I don't see an easy fix, and the "one more slot" from the 
> empirical test doesn't seem to be the worst case either (I think):
> 
> - In my case the packets that hit this have only 1 frag, but there could 
>   have been more frags. I also don't think you can rule out hitting 
>   "get_next_rx_buffer" for multiple subsequent frags of one packet in the 
>   worst case (and perhaps even multiple times for a single frag, since it 
>   is looped over in what seem to be PAGE_SIZE pieces).
>   - So an exact upfront calculation in "xenvif_rx_action" of how many 
>     slots we are going to need for hitting this "get_next_rx_buffer" 
>     seems unfeasible.
>   - A worst-case estimate doesn't seem feasible either: with multiple 
>     frags each potentially taking the "get_next_rx_buffer" path multiple 
>     times, you would probably be back at just setting the needed slots to 
>     MAX_SKB_FRAGS.
> 
> - The other option would be checking for the available slots before doing 
>   the "get_next_rx_buffer"; however, we don't account for how many slots 
>   we still need just to process the remaining frags.
> 

We've already done a worst-case estimation for the whole SKB (linear area +
all frags) at the very beginning. That's what max_slots_needed is for.
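
To spell that out: the estimate covers the linear area plus every frag,
each rounded up to PAGE_SIZE, plus an extra slot when GSO extra info is
needed. Roughly, as a sketch of the 3.14-rc logic in xenvif_rx_action
(simplified; the GSO condition in the real code is more specific):

    max_slots_needed = DIV_ROUND_UP(offset_in_page(skb->data) +
                                    skb_headlen(skb),
                                    PAGE_SIZE);
    for (i = 0; i < nr_frags; i++) {
            unsigned int size;

            size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
            max_slots_needed += DIV_ROUND_UP(size, PAGE_SIZE);
    }
    if (skb_is_gso(skb))
            max_slots_needed++;     /* slot for the GSO extra info */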

> - Just removing the whole "get_next_rx_buffer" doesn't work either; I 
>   tested it, and "get_next_rx_buffer" seems to be necessary: when I make 
>   start_new_rx_buffer always return false, I hit the bug below :S
> 
> So the question is: is the "get_next_rx_buffer" part really necessary?
>    - If not, what is the benefit of "get_next_rx_buffer", and does it 
>      outweigh the additional cost of accounting for the needed slots of 
>      the frags that have yet to be processed?
>    - If yes, what would be the best acceptable solution? Returning to 
>      using MAX_SKB_FRAGS?
> 

I think you need to answer several questions first:
1. is max_slots_needed actually large enough to cover the whole SKB?
2. is the test of ring slot availability accurate?
3. is the number of ring slots consumed larger than max_slots_needed? (I
   guess the answer is yes; see the sketch after this list)
4. which step in the break-down procedure causes the backend to overrun
   the ring?
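
One way to get at 3. is to compare, per SKB, the estimate against what
xenvif_gop_skb actually produced. Something along these lines in
xenvif_rx_action could confirm it (a hypothetical diagnostic only;
variable names are taken from the 3.14-era code and may need adjusting):

    sco->meta_slots_used = xenvif_gop_skb(skb, &npo);

    /* Hypothetical instrumentation: flag any SKB whose break-down
     * consumed more slots than we reserved for it up front.
     */
    if (sco->meta_slots_used > max_slots_needed)
            netdev_warn(vif->dev,
                        "SKB used %d slots, estimate was %u\n",
                        sco->meta_slots_used, max_slots_needed);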

It doesn't matter how many frags your SKB has or how big each frag is. If
every step is correct then you're fine. The code you're looking at
(start_new_rx_buffer / get_next_rx_buffer and friends) needs to be
carefully examined.
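
In particular the inner loop of xenvif_gop_frag_copy is where the extra
slot can be taken. Roughly (again a simplified sketch, not the exact
source): each frag is chopped into chunks of at most PAGE_SIZE, and a
chunk that must not go into the current ring buffer pulls in a fresh one
via get_next_rx_buffer, i.e. consumes an additional slot.

    while (size > 0) {
            bytes = PAGE_SIZE - offset;
            if (bytes > size)
                    bytes = size;

            if (start_new_rx_buffer(npo->copy_off, bytes, *head)) {
                    /* Netfront requires some data in the head buffer. */
                    BUG_ON(*head);
                    meta = get_next_rx_buffer(vif, npo);
            }

            if (npo->copy_off + bytes > MAX_BUFFER_OFFSET)
                    bytes = MAX_BUFFER_OFFSET - npo->copy_off;

            /* ... queue the GNTTABOP_copy for 'bytes' bytes ... */

            npo->copy_off += bytes;
            offset += bytes;
            size -= bytes;

            /* ... handle crossing into the next source page and the
             * head/meta bookkeeping ...
             */
    }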

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

