Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"
On Wed, Mar 19, 2014 at 12:11:04AM +0100, Sander Eikelenboom wrote:
[...]
> >>> Before Paul's change, we always reserve very large room for an incoming
> >>> SKB. After Paul's change, we only reserve just enough room. Probably
> >>> some extra room prevents this bug being triggered.
>
> >> [ 599.970745] vif vif-7-0 vif7.0: ?!? get_next_rx_buffer before req npo->meta_prod:37 vif->rx.sring->req_prod:506387 vif->rx.req_cons:506387 vif->rx.sring->req_event:504174
> >> [ 599.972487] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue ! min_slots_needed:1 vif->rx.sring->req_prod:506387 vif->rx.req_cons:506387 vif->rx.sring->req_event:506388
> >> [ 600.044322] vif vif-7-0 vif7.0: ?!? get_next_rx_buffer after req npo->meta_prod:37 vif->rx.sring->req_prod:506387 vif->rx.req_cons:506388 req->gref:165543936 req->id:19 vif->rx.sring->req_event:506388
> >> [ 600.081167] vif vif-7-0 vif7.0: ?!? xenvif_gop_frag_copy Me here 2 npo->meta_prod:38 vif->rx.sring->req_prod:506387 vif->rx.req_cons:506388 npo->copy_gref:165543936 npo->copy_off:0 MAX_BUFFER_OFFSET:4096 bytes:1168 size:1168 i:6 vif->rx.sring->req_event:506388 estimated_slots_needed:8
> >> [ 600.118268] vif vif-7-0 vif7.0: ?!? xenvif_gop_frag_copy Me here end npo->meta_prod:38 vif->rx.sring->req_prod:506387 vif->rx.req_cons:506388 npo->copy_gref:165543936 npo->copy_off:1168 MAX_BUFFER_OFFSET:4096 bytes:1168 size:0 i:7 vif->rx.sring->req_event:506388 estimated_slots_needed:8
> >> [ 600.155378] vif vif-7-0 vif7.0: ?!? xenvif_gop_skb Me here 4 npo->meta_prod:38 old_meta_prod:30 vif->rx.sring->req_prod:506387 vif->rx.req_cons:506388 gso_type:1 gso_size:1448 nr_frags:1 req->gref:570 req->id:11 estimated_slots_needed:8 i(frag): 0
> >> [ 600.192438] vif vif-7-0 vif7.0: ?!? xenvif_gop_skb Me here 5 npo->meta_prod:38 old_meta_prod:30 vif->rx.sring->req_prod:506387 vif->rx.req_cons:506388 gso_type:1 gso_size:1448 nr_frags:1 req->gref:570 req->id:11 estimated_slots_needed:8
> >> [ 600.229395] vif vif-7-0 vif7.0: ?!? xenvif_rx_action me here 2 .. vif->rx.sring->req_prod:506387 vif->rx.req_cons:506388 sco->meta_slots_used:8 max_upped_gso:1 skb_is_gso(skb):1 max_slots_needed:8 j:3 is_gso:1 nr_frags:1 firstpart:1 secondpart:6 min_slots_needed:3
> >> [ 600.266518] vif vif-7-0 vif7.0: ?!? xenvif_rx_action me here 1 .. vif->rx.sring->req_prod:506387 vif->rx.req_cons:506388 max_upped_gso:1 skb_is_gso(skb):0 max_slots_needed:1 j:4 is_gso:0 nr_frags:0 firstpart:1 secondpart:0 min_slots_needed:1
>
> >> It seems to estimate 8 slots and need 8 slots ... however .. shouldn't the
> >> queue have been stopped before getting here ..
>
> > *hrmm please just ignore* .. got to get some sleep i guess ..
>
> Just went the empirical way .. and unconditionally upped the calculated
> "max_slots_needed" by one in "xenvif_rx_action" just before checking
> "xenvif_rx_ring_slots_available". This has prevented all non-fatal and fatal
> occurrences of "cons > prod" so far. I will leave my tests running overnight,
> see if it survives the pounding.
>
> From other net drivers i see different calculations .. it seems to be kind of
> voodoo to determine the value .. one of them
> (drivers/net/ethernet/marvell/sky2.c) seems to unconditionally reserve a slot
> for both GSO and CKSUM. Don't know if that makes any sense though:
>
> /* This is the worst case number of transmit list elements for a single skb:
>    VLAN:GSO + CKSUM + Data + skb_frags * DMA */
> #define MAX_SKB_TX_LE  (2 + (sizeof(dma_addr_t)/sizeof(u32))*(MAX_SKB_FRAGS+1))

Xen network "wire" format doesn't have a CKSUM metadata type though.

> >>> Wei.
>
> >>>> >> The second time it does get to the code after the RING_GET_REQUEST in
> >>>> >> 'get_next_rx_buffer' and that leads to mayhem ...
> >>>> >>
> >>>> >> So added a netdev_warn to xenvif_start_xmit for the "stop queue" case
> >>>> >> .. unfortunately it now triggers net_ratelimit at the end:
> >>>> >>
> >>>> >> [ 402.909693] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue ! min_slots_needed:7 vif->rx.sring->req_prod:189228 vif->rx.req_cons:189222
> >>>>
> >>>> > I think xenvif_rx_ring_slots_available is doing its job. If req_prod -
> >>>> > req_cons < needed, the queue is stopped.
>
> >> So it seems .. most of the time .. but if i look at the calculation of
> >> "min_slots_needed" in this function it seems completely different from the
> >> one in xenvif_rx_action for max_slots_needed .. though both seem to be used
> >> for the same thing .. to calculate how many slots the broken-down SKB would
> >> need to fit in ..
> >> So i added the calculation method from xenvif_start_xmit to
> >> xenvif_rx_action .. in the error case you see min_slots_needed was 3 ..
> >> but max_slots_needed and max_slots_used both were 8.

Those are different estimations: max_slots_needed estimates the worst case,
while min_slots_needed estimates the best case. When min_slots_needed is met,
netback puts the SKB into its internal queue. xenvif_rx_action then dequeues
that packet, checks max_slots_needed, and if that is met, breaks the SKB down.
What I would expect is that, even if they yield different values, checking the
ring availability before breaking the SKB down is enough. In your case above,
max_slots_needed was met and the SKB was broken down. And since, as you said,
bumping max_slots_needed by 1 in your empirical test prevented the issues, we
might have a problem calculating max_slots_needed. If you can work out what
went wrong, that would be very useful.

Wei.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel