Re: [Xen-devel] [PATCH net-next] xen-netback: improve guest-receive-side flow control
On Thu, Nov 28, 2013 at 01:11:12PM +0000, Paul Durrant wrote:
> The flow control code relies on a double pass of tke skb, firstly to count
> the number of ring slots needed to supply sufficient grant references, and
> another to actually consume those references and build the grant copy
> operations. It transpires that, in some scenarios, the initial count and the
> number of references consumed differs and, after this happens a number of
> times, flow control is completely broken.

Please add blank lines between paragraphs in later versions.

> This patch simplifies things by making a very quick static analysis of the
> minimal number or ring slots needed for an skb and then only stopping the
> queue if that minimal number of slots is not available. The worker thread
> does a worst-case analysis and only processes what will definitely fit,
> leaving remaining skbs on the internal queue. This patch also gets rid of the
> drop that occurs if an skb won't fit (because that doesn't matter - we always
> internally queue *at least* what we can put in the shared ring) and adds
> some hysteresis to waking the external queue. (It only gets woken when at
> least half the shared ring is available).

I think this is the right direction to go. However we've tripped over this
several times so is it possible to test it with XenRT?

> Without this patch I can trivially stall netback permanently by just doing
> a large guest to guest file copy between two Windows Server 2008R2 VMs on a
> single host.
>
> Signed-off-by: Paul Durrant <paul.durrant@xxxxxxxxxx>
> Cc: Wei Liu <wei.liu2@xxxxxxxxxx>
> Cc: Ian Campbell <ian.campbell@xxxxxxxxxx>
> Cc: David Vrabel <david.vrabel@xxxxxxxxxx>
> ---
>  drivers/net/xen-netback/common.h    |   26 ++---
>  drivers/net/xen-netback/interface.c |   45 ++++----
>  drivers/net/xen-netback/netback.c   |  202 ++++++++++-------------------------
>  3 files changed, 87 insertions(+), 186 deletions(-)
>
> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> index 08ae01b..d7888db 100644
> --- a/drivers/net/xen-netback/common.h
> +++ b/drivers/net/xen-netback/common.h
> @@ -136,12 +136,7 @@ struct xenvif {
>  	char rx_irq_name[IFNAMSIZ+4]; /* DEVNAME-rx */
>  	struct xen_netif_rx_back_ring rx;
>  	struct sk_buff_head rx_queue;
> -
> -	/* Allow xenvif_start_xmit() to peek ahead in the rx request
> -	 * ring. This is a prediction of what rx_req_cons will be
> -	 * once all queued skbs are put on the ring.
> -	 */
> -	RING_IDX rx_req_cons_peek;
> +	bool rx_event;
>

What is this for? Reading through the code my impression is that this is a
flag indicating RX interrupt is triggered. Why is it useful? Why is
skb_queue_empty not sufficient?

>  	/* Given MAX_BUFFER_OFFSET of 4096 the worst case is that each
>  	 * head/fragment page uses 2 copy operations because it
> @@ -198,8 +193,6 @@ void xenvif_xenbus_fini(void);
> [...]
>
> -void xenvif_notify_tx_completion(struct xenvif *vif)
> -{
> -	if (netif_queue_stopped(vif->dev) && xenvif_rx_schedulable(vif))
> -		netif_wake_queue(vif->dev);
> -}
> -
>  static struct net_device_stats *xenvif_get_stats(struct net_device *dev)
>  {
>  	struct xenvif *vif = netdev_priv(dev);
> @@ -378,6 +378,8 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
>  	if (err < 0)
>  		goto err;
>
> +	init_waitqueue_head(&vif->wq);
> +

I guess the reason behind this code movement is that, if we don't do this,
netback will crash if we get an interrupt before the waitqueue is set up?
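If that is the reason, the ordering constraint could be illustrated with a
minimal sketch like the one below. To be clear, this is not code from the
patch: the handler name and its body are my assumptions about how the rx
event ends up waking the kthread. The point is only that once the event
channel is bound the interrupt may fire at any time, so vif->wq must be
initialised before bind_interdomain_evtchn_to_irqhandler() is called.

	/* Hypothetical sketch, not taken from the patch. */
	static irqreturn_t xenvif_rx_interrupt(int irq, void *dev_id)
	{
		struct xenvif *vif = dev_id;

		vif->rx_event = true;
		wake_up(&vif->wq);	/* touches vif->wq: it must already
					 * have been initialised */
		return IRQ_HANDLED;
	}

	/* hence, in xenvif_connect(): */
	init_waitqueue_head(&vif->wq);			/* first ... */
	err = bind_interdomain_evtchn_to_irqhandler(
		vif->domid, rx_evtchn, xenvif_rx_interrupt, 0,
		vif->rx_irq_name, vif);			/* ... then irqs can fire */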
>  	if (tx_evtchn == rx_evtchn) {
>  		/* feature-split-event-channels == 0 */
>  		err = bind_interdomain_evtchn_to_irqhandler(
[...]
> @@ -583,23 +477,36 @@ void xenvif_rx_action(struct xenvif *vif)
>
>  	skb_queue_head_init(&rxq);
>
> -	count = 0;
> -
>  	while ((skb = skb_dequeue(&vif->rx_queue)) != NULL) {
> -		vif = netdev_priv(skb->dev);
> -		nr_frags = skb_shinfo(skb)->nr_frags;
> +		int max_slots_needed;
> +		int i;
>
> -		sco = (struct skb_cb_overlay *)skb->cb;
> -		sco->meta_slots_used = xenvif_gop_skb(skb, &npo);
> +		/* We need a cheap worse case estimate for the number of
> +		 * slots we'll use.
> +		 */
>
> -		count += nr_frags + 1;
> +		max_slots_needed = 2;

I'm afraid this starting point is not correct. Consider you have an SKB with
a very large linear buffer, you might need more than 2 slots to fit that in,
right? (A sketch of an alternative starting point follows at the end of this
mail.)

> +		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
> +			struct page *page;
>
> -		__skb_queue_tail(&rxq, skb);
> +			page = skb_frag_page(&skb_shinfo(skb)->frags[i]);
> +			max_slots_needed += 1 << compound_order(page);

This estimation might reserve more slots than necessary.

The removal of xenvif_count_skb_slots() is OK because you're now estimating
minimum slots, but the actual number of slots consumed needs to be accurate.
Any reason you don't want to use xenvif_gop_skb()?

> +		}
> +		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4 ||
> +		    skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
> +			max_slots_needed++;
>
> -		/* Filled the batch queue? */
> -		/* XXX FIXME: RX path dependent on MAX_SKB_FRAGS */
> -		if (count + MAX_SKB_FRAGS >= XEN_NETIF_RX_RING_SIZE)
> +		/* If the skb may not fit then bail out now */
> +		if (!xenvif_rx_ring_slots_available(vif, max_slots_needed)) {
> +			skb_queue_head(&vif->rx_queue, skb);
>  			break;
> +		}
> +
> +		sco = (struct skb_cb_overlay *)skb->cb;
> +		sco->meta_slots_used = xenvif_gop_skb(skb, &npo);
> +		BUG_ON(sco->meta_slots_used > max_slots_needed);
> +
> +		__skb_queue_tail(&rxq, skb);
>  	}
[...]
> +
>  int xenvif_kthread(void *data)
>  {
>  	struct xenvif *vif = data;
> +	const int threshold = XEN_NETIF_RX_RING_SIZE / 2;
>
>  	while (!kthread_should_stop()) {
> +

Stray blank line.

Wei.

>  		wait_event_interruptible(vif->wq,
>  					 rx_work_todo(vif) ||
>  					 kthread_should_stop());
>  		if (kthread_should_stop())
>  			break;
>
> -		if (rx_work_todo(vif))
> +		if (!skb_queue_empty(&vif->rx_queue))
>  			xenvif_rx_action(vif);
>
> +		vif->rx_event = false;
> +		if (xenvif_rx_ring_slots_available(vif, threshold))
> +			xenvif_start_queue(vif);
> +
>  		cond_resched();
>  	}
>
> --
> 1.7.10.4

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
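Returning to the max_slots_needed starting point questioned above: a sketch
of one possible alternative is below. It is untested and not from the patch;
seeding the estimate with DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE) and
bounding each frag by its data length are my assumptions about what a cheap
worst-case estimate could look like.

	/* Hypothetical sketch, not taken from the patch. */
	max_slots_needed = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE);

	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
		unsigned int size = skb_frag_size(&skb_shinfo(skb)->frags[i]);

		/* bound by the data actually carried in the frag rather
		 * than by the compound order of the backing page */
		max_slots_needed += DIV_ROUND_UP(size, PAGE_SIZE);
	}

	if (skb_is_gso(skb) &&
	    (skb_shinfo(skb)->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6)))
		max_slots_needed++;

Note this can still under-count by one slot when the head or a frag does not
start on a page boundary and therefore straddles an extra page, so an
explicit safety margin (or an exact count such as xenvif_count_skb_slots())
may still be needed.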