[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"
[...] > > Those are different estimations. max_slots_needed estimates the worst > > case while min_slots_needed estimates the best case. When > > min_slots_needed is met, netback puts the SKB into its internal queue. > > xenvif_rx_action then will dequeue that packet, check max_slots_needed, > > if met, break SKB down. > > I realised that, though if you don't count the cost of calculation .. the > best would be > to have min_slots and max_slots the same .. and being correct. But cost of > the calculation count .. > on the other hand min_slots seems to be far too low in this case. Is the > nr_frags the best thing to check for ? > Wouldn't something like skbsize / PAGE_SIZE be a better estimate ? > It doesn't matter. It won't cause breakage at that point. > > > What I would expect is, even if they yield different values, checking > > the ring availability is enough before breaking SKB down. > > > In your above case, max_slots_needed was met and SKB broken down. And as > > you said in your empirical test, bumping max_slots_needed by 1 prevented > > issues, then we might have problem calculating max_slots_needed. If you > > can work out what went wrong that can be very useful. > > OK ran the whole night with "max_slots_needed+1" and i saw no triggering of > both: > - the non-fatal oddity (only a temporary cons > prod ..) > - the fatal case (leading to the grantref error) > > So it seems we are off by one .. somewhere > > > At first the non-fatal oddity .. this one i really can't grasp: > > [ 567.880282] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue ! > min_slots_needed:4 vif->rx.sring->req_prod:562972 vif->rx.req_cons:562969 > vif->rx.sring->req_event:562973 > [ 569.893801] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue ! > min_slots_needed:4 vif->rx.sring->req_prod:571188 vif->rx.req_cons:571187 > vif->rx.sring->req_event:571189 > [ 571.892419] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue ! > min_slots_needed:4 vif->rx.sring->req_prod:577141 vif->rx.req_cons:577138 > vif->rx.sring->req_event:577142 > [ 575.451383] vif vif-7-0 vif7.0: ?!? get_next_rx_buffer before req > npo->meta_prod:6 vif->rx.sring->req_prod:590603 vif->rx.req_cons:590603 > vif->rx.sring->req_event:589615, reserved_slots_left:0 > [ 575.487721] vif vif-7-0 vif7.0: ?!? get_next_rx_buffer after req > npo->meta_prod:6 vif->rx.sring->req_prod:590660 vif->rx.req_cons:590604 > req->gref:569 req->id:11 vif->rx.sring->req_event:589615 > reserved_slots_left:-1 > [ 575.524427] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue ! > min_slots_needed:4 vif->rx.sring->req_prod:590660 vif->rx.req_cons:590660 > vif->rx.sring->req_event:590661 > [ 576.508063] vif vif-7-0 vif7.0: ?!? get_next_rx_buffer before req > npo->meta_prod:16 vif->rx.sring->req_prod:594343 vif->rx.req_cons:594343 > vif->rx.sring->req_event:591619, reserved_slots_left:0 > [ 576.544708] vif vif-7-0 vif7.0: ?!? get_next_rx_buffer after req > npo->meta_prod:16 vif->rx.sring->req_prod:594389 vif->rx.req_cons:594344 > req->gref:668 req->id:167 vif->rx.sring->req_event:591619 > reserved_slots_left:-1 > [ 576.581595] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue ! > min_slots_needed:4 vif->rx.sring->req_prod:594389 vif->rx.req_cons:594389 > vif->rx.sring->req_event:594390 > [ 577.325826] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue ! > min_slots_needed:2 vif->rx.sring->req_prod:597248 vif->rx.req_cons:597247 > vif->rx.sring->req_event:597249 > [ 577.576973] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue ! > min_slots_needed:4 vif->rx.sring->req_prod:598146 vif->rx.req_cons:598143 > vif->rx.sring->req_event:598147 > > What i don't get is: > - i do warn just before the RING_GET_REQUEST on > vif->rx.sring->req_prod:590603 vif->rx.req_cons:590603 .. so cons == prod > - i warn because req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++); is > going to add one to cons .. leading to a situation that should not exist .. > - however in the warn just after this RING_GET_REQUEST .. prod is somehow > increased a lot .. probably the "next_rx_buffer" aspect of things .. so cons > is much smaller than prod again .. Frontend is also accessing the ring. So it's normal that some index changed. > - How everi don't see anything in the RING_GET_REQUEST macro fiddling with > vif->rx.sring->req_prod .. > - so who has been fiddling with vif->rx.sring->req_prod in the mean time .. > my very limited c skills fail to see how this works :-) > > > Then the fatal case ... > > [ 1108.566486] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue ! > min_slots_needed:1 vif->rx.sring->req_prod:1843278 vif->rx.req_cons:1843278 > vif->rx.sring->req_event:1843279 > [ 1109.775757] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue ! > min_slots_needed:4 vif->rx.sring->req_prod:1847779 vif->rx.req_cons:1847776 > vif->rx.sring->req_event:1847780 > [ 1110.041661] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue ! > min_slots_needed:4 vif->rx.sring->req_prod:1848795 vif->rx.req_cons:1848794 > vif->rx.sring->req_event:1848796 > [ 1110.301778] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue ! > min_slots_needed:4 vif->rx.sring->req_prod:1849755 vif->rx.req_cons:1849754 > vif->rx.sring->req_event:1849756 > [ 1111.565412] vif vif-7-0 vif7.0: ?!? get_next_rx_buffer before req > npo->meta_prod:35 vif->rx.sring->req_prod:1854291 vif->rx.req_cons:1854291 > vif->rx.sring->req_event:1853482, reserved_slots_left:0 > [ 1111.565973] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue ! > min_slots_needed:1 vif->rx.sring->req_prod:1854291 vif->rx.req_cons:1854291 > vif->rx.sring->req_event:1854292 > [ 1111.638425] vif vif-7-0 vif7.0: ?!? get_next_rx_buffer after req > npo->meta_prod:35 vif->rx.sring->req_prod:1854291 vif->rx.req_cons:1854292 > req->gref:4325377 req->id:83 vif->rx.sring->req_event:1854292 > reserved_slots_left:-1 > [ 1111.675042] vif vif-7-0 vif7.0: ?!? xenvif_gop_frag_copy Me here 2 > npo->meta_prod:36 vif->rx.sring->req_prod:1854291 vif->rx.req_cons:1854292 > npo->copy_gref:4325377 npo->copy_off:0 MAX_BUFFER_OFFSET:4096 bytes:1168 > size:1168 i:5 vif->rx.sring->req_event:1854292 estimated_slots_needed:7 > reserved_slots_left:-1 > [ 1111.711833] vif vif-7-0 vif7.0: ?!? xenvif_gop_frag_copy Me here end > npo->meta_prod:36 vif->rx.sring->req_prod:1854291 vif->rx.req_cons:1854292 > npo->copy_gref:4325377 npo->copy_off:1168 MAX_BUFFER_OFFSET:4096 bytes:1168 > size:0 i:6 vif->rx.sring->req_event:1854292 gso_gaps:0 > estimated_slots_needed:7 reserved_slots_left:-1 > [ 1111.766836] vif vif-7-0 vif7.0: ?!? xenvif_gop_skb Me here 4 after > npo->meta_prod:36 old_meta_prod:29 vif->rx.sring->req_prod:1854291 > vif->rx.req_cons:1854292 meta->gso_type:1 meta->gso_size:1448 nr_frags:1 > req->gref:674 req->id:76 estimated_slots_needed:7 i(frag):0 j(data):1 > reserved_slots_left:-1 > [ 1111.803702] vif vif-7-0 vif7.0: ?!? xenvif_gop_skb Me here 5 > npo->meta_prod:36 old_meta_prod:29 vif->rx.sring->req_prod:1854291 > vif->rx.req_cons:1854292 meta->gso_type:1 meta->gso_size:1448 nr_frags:1 > req->gref:674 req->id:76 estimated_slots_needed:7 j(data):1 > reserved_slots_left:-1 used in funcstart: 0 + 1 .. used_dataloop:1 .. > used_fragloop:6 > [ 1111.858602] vif vif-7-0 vif7.0: ?!? xenvif_rx_action me here 2 .. > vif->rx.sring->req_prod:1854291 vif->rx.req_cons:1854292 > sco->meta_slots_used:7 max_upped_gso:1 skb_is_gso(skb):1 max_slots_needed:7 > j:3 is_gso:1 nr_frags:1 firstpart:1 secondpart:5 reserved_slots_left:-1 > > Ok so if i try to pull this apart (from bottom to top): > > - it starts out with "xenvif_rx_action" and calculated we would need: > 1 slot for the DIV_ROUND_UP(offset_in_page(skb->data) + > skb_headlen(skb), PAGE_SIZE); > 5 slots for 1 frag > 1 slot for GSO > So 7 slots in total .. > > - it checks if there are enough slots available if > (!xenvif_rx_ring_slots_available(vif, max_slots_needed)) and aparently there > are .. so it continues > > - We end up in "xenvif_gop_skb" processing .. and everything seems to go well > .. > - In the start of this function we do one unconditional > RING_GET_REQUEST .. and potentially one conditional (if gso) > - Question: in "xenvif_rx_action" we only reserved ONE slot for GSO > stuff .. so this could cost TWO slots .. is that right ? The slot reserved for GSO is for meta data, so 1 is enough. > - But in this case we only use one ... see "used in funcstart: 0 + > 1" > > - Now we loop 1 time through the "while (data < > skb_tail_pointer(skb))" loop ... since j(data)=1 > - and call "xenvif_gop_frag_copy" from inside this loop here .. > which potentially calls .. get_next_rx_buffer > - If i understand it correctly the number of slots this loop should > take at a maximum should be what we calculated before with the > DIV_ROUND_UP(offset_in_page(skb->data) + skb_headlen(skb), > PAGE_SIZE) ? (1 slot in this case) > - Nothing is logged yet because cons is still smaller than prod at > this time, but from the aftermath we see this was "used_dataloop:1" > - So that seems correct .. > > - Now we are at the loop through "nr_frags" and loop the 1 frag we > have .. > - This all goes well until the sixth iteration (i from > 0..5) .. and call "xenvif_gop_frag_copy" from inside this loop ... Isn't there only one frag? > - "xenvif_gop_frag_copy" calls "get_next_rx_buffer" .. cons > becomes larger than prod ..and the "miracle" from the non-fatal case of prod > suddenly becoming larger does not happen > > - So the calculation of how many slots are required for the frags in > "xenvif_rx_action" does not match wath the code in "xenvif_gop_frag_copy" > seems to need, > I don't see why .. it's rounding up .. should behave correct if the > left over happens to be 0 size .. > > > > > Wei. > _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |