[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"



[...]
> > Those are different estimations. max_slots_needed estimates the worst
> > case while min_slots_needed estimates the best case. When
> > min_slots_needed is met, netback puts the SKB into its internal queue.
> > xenvif_rx_action then will dequeue that packet, check max_slots_needed,
> > if met, break SKB down.
> 
> I realised that, though if you don't count the cost of calculation .. the 
> best would be
> to have min_slots and max_slots the same .. and being correct. But cost of 
> the calculation count ..
> on the other hand min_slots seems to be far too low in this case. Is the 
> nr_frags the best thing to check for ?
> Wouldn't something like skbsize / PAGE_SIZE be a better estimate ?
> 

It doesn't matter. It won't cause breakage at that point.

> 
> > What I would expect is, even if they yield different values, checking
> > the ring availability is enough before breaking SKB down.
> 
> > In your above case, max_slots_needed was met and SKB broken down. And as
> > you said in your empirical test, bumping max_slots_needed by 1 prevented
> > issues, then we might have problem calculating max_slots_needed. If you
> > can work out what went wrong that can be very useful.
> 
> OK ran the whole night with "max_slots_needed+1" and i saw no triggering of 
> both:
>     - the non-fatal oddity (only a temporary cons > prod ..)
>     - the fatal case (leading to the grantref error)
> 
> So it seems we are off by one .. somewhere
> 
> 
> At first the non-fatal oddity .. this one i really can't grasp:
> 
> [  567.880282] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue !  
> min_slots_needed:4 vif->rx.sring->req_prod:562972 vif->rx.req_cons:562969 
> vif->rx.sring->req_event:562973
> [  569.893801] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue !  
> min_slots_needed:4 vif->rx.sring->req_prod:571188 vif->rx.req_cons:571187 
> vif->rx.sring->req_event:571189
> [  571.892419] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue !  
> min_slots_needed:4 vif->rx.sring->req_prod:577141 vif->rx.req_cons:577138 
> vif->rx.sring->req_event:577142
> [  575.451383] vif vif-7-0 vif7.0: ?!? get_next_rx_buffer before req 
> npo->meta_prod:6 vif->rx.sring->req_prod:590603 vif->rx.req_cons:590603 
> vif->rx.sring->req_event:589615, reserved_slots_left:0
> [  575.487721] vif vif-7-0 vif7.0: ?!? get_next_rx_buffer after req 
> npo->meta_prod:6 vif->rx.sring->req_prod:590660 vif->rx.req_cons:590604 
> req->gref:569 req->id:11 vif->rx.sring->req_event:589615 
> reserved_slots_left:-1
> [  575.524427] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue !  
> min_slots_needed:4 vif->rx.sring->req_prod:590660 vif->rx.req_cons:590660 
> vif->rx.sring->req_event:590661
> [  576.508063] vif vif-7-0 vif7.0: ?!? get_next_rx_buffer before req 
> npo->meta_prod:16 vif->rx.sring->req_prod:594343 vif->rx.req_cons:594343 
> vif->rx.sring->req_event:591619, reserved_slots_left:0
> [  576.544708] vif vif-7-0 vif7.0: ?!? get_next_rx_buffer after req 
> npo->meta_prod:16 vif->rx.sring->req_prod:594389 vif->rx.req_cons:594344 
> req->gref:668 req->id:167 vif->rx.sring->req_event:591619 
> reserved_slots_left:-1
> [  576.581595] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue !  
> min_slots_needed:4 vif->rx.sring->req_prod:594389 vif->rx.req_cons:594389 
> vif->rx.sring->req_event:594390
> [  577.325826] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue !  
> min_slots_needed:2 vif->rx.sring->req_prod:597248 vif->rx.req_cons:597247 
> vif->rx.sring->req_event:597249
> [  577.576973] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue !  
> min_slots_needed:4 vif->rx.sring->req_prod:598146 vif->rx.req_cons:598143 
> vif->rx.sring->req_event:598147
> 
> What i don't get is:
> - i do warn just before the RING_GET_REQUEST on 
> vif->rx.sring->req_prod:590603 vif->rx.req_cons:590603 .. so cons == prod
> - i warn because  req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++); is 
> going to add one to cons .. leading to a situation that should not exist ..
> - however in the warn just after this RING_GET_REQUEST .. prod is somehow 
> increased a lot .. probably the "next_rx_buffer" aspect of things .. so cons 
> is much smaller than prod again ..

Frontend is also accessing the ring. So it's normal that some index
changed.

> - How everi don't see anything in the RING_GET_REQUEST macro fiddling with 
> vif->rx.sring->req_prod ..
> - so who has been fiddling with vif->rx.sring->req_prod in the mean time .. 
> my very limited c skills fail to see how this works :-)
> 
> 
> Then the fatal case ...
> 
> [ 1108.566486] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue !  
> min_slots_needed:1 vif->rx.sring->req_prod:1843278 vif->rx.req_cons:1843278 
> vif->rx.sring->req_event:1843279
> [ 1109.775757] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue !  
> min_slots_needed:4 vif->rx.sring->req_prod:1847779 vif->rx.req_cons:1847776 
> vif->rx.sring->req_event:1847780
> [ 1110.041661] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue !  
> min_slots_needed:4 vif->rx.sring->req_prod:1848795 vif->rx.req_cons:1848794 
> vif->rx.sring->req_event:1848796
> [ 1110.301778] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue !  
> min_slots_needed:4 vif->rx.sring->req_prod:1849755 vif->rx.req_cons:1849754 
> vif->rx.sring->req_event:1849756
> [ 1111.565412] vif vif-7-0 vif7.0: ?!? get_next_rx_buffer before req 
> npo->meta_prod:35 vif->rx.sring->req_prod:1854291 vif->rx.req_cons:1854291 
> vif->rx.sring->req_event:1853482, reserved_slots_left:0
> [ 1111.565973] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue !  
> min_slots_needed:1 vif->rx.sring->req_prod:1854291 vif->rx.req_cons:1854291 
> vif->rx.sring->req_event:1854292
> [ 1111.638425] vif vif-7-0 vif7.0: ?!? get_next_rx_buffer after req 
> npo->meta_prod:35 vif->rx.sring->req_prod:1854291 vif->rx.req_cons:1854292 
> req->gref:4325377 req->id:83 vif->rx.sring->req_event:1854292 
> reserved_slots_left:-1
> [ 1111.675042] vif vif-7-0 vif7.0: ?!? xenvif_gop_frag_copy Me here 2  
> npo->meta_prod:36 vif->rx.sring->req_prod:1854291 vif->rx.req_cons:1854292 
> npo->copy_gref:4325377  npo->copy_off:0  MAX_BUFFER_OFFSET:4096 bytes:1168 
> size:1168 i:5 vif->rx.sring->req_event:1854292 estimated_slots_needed:7 
> reserved_slots_left:-1
> [ 1111.711833] vif vif-7-0 vif7.0: ?!? xenvif_gop_frag_copy Me here end 
> npo->meta_prod:36 vif->rx.sring->req_prod:1854291 vif->rx.req_cons:1854292 
> npo->copy_gref:4325377 npo->copy_off:1168  MAX_BUFFER_OFFSET:4096 bytes:1168 
> size:0 i:6 vif->rx.sring->req_event:1854292 gso_gaps:0 
> estimated_slots_needed:7 reserved_slots_left:-1
> [ 1111.766836] vif vif-7-0 vif7.0: ?!? xenvif_gop_skb Me here 4 after 
> npo->meta_prod:36 old_meta_prod:29 vif->rx.sring->req_prod:1854291 
> vif->rx.req_cons:1854292 meta->gso_type:1 meta->gso_size:1448 nr_frags:1 
> req->gref:674 req->id:76 estimated_slots_needed:7 i(frag):0 j(data):1 
> reserved_slots_left:-1
> [ 1111.803702] vif vif-7-0 vif7.0: ?!? xenvif_gop_skb Me here 5 
> npo->meta_prod:36 old_meta_prod:29 vif->rx.sring->req_prod:1854291 
> vif->rx.req_cons:1854292 meta->gso_type:1 meta->gso_size:1448 nr_frags:1 
> req->gref:674 req->id:76 estimated_slots_needed:7 j(data):1 
> reserved_slots_left:-1    used in funcstart: 0 + 1 .. used_dataloop:1 .. 
> used_fragloop:6
> [ 1111.858602] vif vif-7-0 vif7.0: ?!? xenvif_rx_action me here 2 ..  
> vif->rx.sring->req_prod:1854291 vif->rx.req_cons:1854292 
> sco->meta_slots_used:7 max_upped_gso:1 skb_is_gso(skb):1 max_slots_needed:7 
> j:3 is_gso:1 nr_frags:1 firstpart:1 secondpart:5 reserved_slots_left:-1
> 
> Ok so if i try to pull this apart (from bottom to top):
> 
> - it starts out with "xenvif_rx_action" and calculated we would need:
>       1 slot for the  DIV_ROUND_UP(offset_in_page(skb->data) + 
> skb_headlen(skb), PAGE_SIZE);
>       5 slots for 1 frag
>       1 slot for GSO
>       So 7 slots in total .. 
> 
> - it checks  if there are enough slots available if 
> (!xenvif_rx_ring_slots_available(vif, max_slots_needed)) and aparently there 
> are .. so it continues
> 
> - We end up in "xenvif_gop_skb" processing .. and everything seems to go well 
> ..
>       - In the start of this function we do one unconditional 
> RING_GET_REQUEST .. and potentially one conditional (if gso)
>           - Question: in "xenvif_rx_action" we only reserved ONE slot for GSO 
> stuff .. so this could cost TWO slots .. is that right ?

The slot reserved for GSO is for meta data, so 1 is enough.

>           - But in this case we only use one ... see "used in funcstart: 0 + 
> 1"
> 
>       - Now we loop 1 time through the  "while (data < 
> skb_tail_pointer(skb))" loop ... since j(data)=1
>           - and call "xenvif_gop_frag_copy" from inside this loop here .. 
> which potentially calls .. get_next_rx_buffer
>           - If i understand it correctly the number of slots this loop should 
> take at a maximum should be what we calculated before with the
>             DIV_ROUND_UP(offset_in_page(skb->data) + skb_headlen(skb), 
> PAGE_SIZE) ? (1 slot in this case)
>           - Nothing is logged yet because cons is still smaller than prod at 
> this time, but from the aftermath we see this was "used_dataloop:1"
>           - So that seems correct .. 
> 
>       - Now we are at the loop through "nr_frags"  and loop the 1 frag we 
> have .. 
>                   - This all goes well until the sixth iteration (i from 
> 0..5) .. and call "xenvif_gop_frag_copy" from inside this loop ...

Isn't there only one frag?

>                   - "xenvif_gop_frag_copy" calls "get_next_rx_buffer" .. cons 
> becomes larger than prod ..and the "miracle" from the non-fatal case of prod 
> suddenly becoming larger does not happen
> 
>       - So the calculation of how many slots are required for the frags in 
> "xenvif_rx_action" does not match wath the code in "xenvif_gop_frag_copy" 
> seems to need,
>         I don't see why .. it's rounding up .. should behave correct if the 
> left over happens to be 0 size ..
> 
> 
> 
> > Wei.
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.