
Re: [Xen-devel] [PATCH net-next] xen-netback: improve guest-receive-side flow control



> -----Original Message-----
> From: Wei Liu [mailto:wei.liu2@xxxxxxxxxx]
> Sent: 28 November 2013 15:51
> To: Paul Durrant
> Cc: xen-devel@xxxxxxxxxxxxx; Wei Liu; Ian Campbell; David Vrabel
> Subject: Re: [PATCH net-next] xen-netback: improve guest-receive-side flow control
> 
> On Thu, Nov 28, 2013 at 01:11:12PM +0000, Paul Durrant wrote:
> > The flow control code relies on a double pass of the skb, firstly to count
> > the number of ring slots needed to supply sufficient grant references, and
> > another to actually consume those references and build the grant copy
> > operations. It transpires that, in some scenarios, the initial count and the
> > number of references consumed differ and, after this happens a number of
> > times, flow control is completely broken.
> 
> Please add blank lines between paragraphs in later versions.
> 
> > This patch simplifies things by making a very quick static analysis of the
> > minimal number of ring slots needed for an skb and then only stopping the
> > queue if that minimal number of slots is not available. The worker thread
> > does a worst-case analysis and only processes what will definitely fit,
> > leaving remaining skbs on the internal queue. This patch also gets rid of
> > the drop that occurs if an skb won't fit (because that doesn't matter - we
> > always internally queue *at least* what we can put in the shared ring) and
> > adds some hysteresis to waking the external queue. (It only gets woken when
> > at least half the shared ring is available.)
> 
> I think this is the right direction to go. However, we've tripped over
> this several times, so is it possible to test it with XenRT?
> 

That's on its way. It'll be a day or so to get results back though.

> > Without this patch I can trivially stall netback permanently by just doing
> > a large guest-to-guest file copy between two Windows Server 2008R2 VMs on
> > a single host.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@xxxxxxxxxx>
> > Cc: Wei Liu <wei.liu2@xxxxxxxxxx>
> > Cc: Ian Campbell <ian.campbell@xxxxxxxxxx>
> > Cc: David Vrabel <david.vrabel@xxxxxxxxxx>
> > ---
> >  drivers/net/xen-netback/common.h    |   26 ++---
> >  drivers/net/xen-netback/interface.c |   45 ++++----
> >  drivers/net/xen-netback/netback.c   |  202 ++++++++++-------------------------
> >  3 files changed, 87 insertions(+), 186 deletions(-)
> >
> > diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> > index 08ae01b..d7888db 100644
> > --- a/drivers/net/xen-netback/common.h
> > +++ b/drivers/net/xen-netback/common.h
> > @@ -136,12 +136,7 @@ struct xenvif {
> >     char rx_irq_name[IFNAMSIZ+4]; /* DEVNAME-rx */
> >     struct xen_netif_rx_back_ring rx;
> >     struct sk_buff_head rx_queue;
> > -
> > -   /* Allow xenvif_start_xmit() to peek ahead in the rx request
> > -    * ring.  This is a prediction of what rx_req_cons will be
> > -    * once all queued skbs are put on the ring.
> > -    */
> > -   RING_IDX rx_req_cons_peek;
> > +   bool rx_event;
> >
> 
> What is this for? Reading through the code, my impression is that this is
> a flag indicating that the RX interrupt has been triggered. Why is it
> useful? Why is skb_queue_empty not sufficient?
> 

The thread is kicked in two cases now: when a new packet is added, and when the
frontend sends an event (e.g. when new requests are posted). This is an
explicit flag for the latter. I didn't want to risk missing a queue wake in the
case where the internal queue is fully drained but the shared ring is left more
than half full, which I think is what would happen without a flag such as this
to control the thread wake.
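
Roughly speaking (this is a sketch of the intent rather than the exact code in
the patch), the thread's work check becomes:

    static inline bool rx_work_todo(struct xenvif *vif)
    {
            /* Wake for either a newly queued skb or a frontend event;
             * the latter may mean ring space has opened up and the
             * external queue needs waking even if rx_queue is empty.
             */
            return !skb_queue_empty(&vif->rx_queue) || vif->rx_event;
    }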

> >     /* Given MAX_BUFFER_OFFSET of 4096 the worst case is that each
> >      * head/fragment page uses 2 copy operations because it
> > @@ -198,8 +193,6 @@ void xenvif_xenbus_fini(void);
> >
> [...]
> >
> > -void xenvif_notify_tx_completion(struct xenvif *vif)
> > -{
> > -   if (netif_queue_stopped(vif->dev) && xenvif_rx_schedulable(vif))
> > -           netif_wake_queue(vif->dev);
> > -}
> > -
> >  static struct net_device_stats *xenvif_get_stats(struct net_device *dev)
> >  {
> >     struct xenvif *vif = netdev_priv(dev);
> > @@ -378,6 +378,8 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
> >     if (err < 0)
> >             goto err;
> >
> > +   init_waitqueue_head(&vif->wq);
> > +
> 
> I guess the reason behind this code movement is that, if we don't do this,
> netback will crash if we get an interrupt before the waitqueue is set up?
> 

Yep.
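
The rx interrupt handler wakes the kthread (and now also sets the rx_event
flag), so vif->wq has to be initialised before the event channel is bound.
As a rough sketch (the handler in the patch may differ slightly in detail):

    static irqreturn_t xenvif_rx_interrupt(int irq, void *dev_id)
    {
            struct xenvif *vif = dev_id;

            /* The frontend has posted requests or sent an event; note
             * it and kick the kthread, which sleeps on vif->wq.
             */
            vif->rx_event = true;
            wake_up(&vif->wq);

            return IRQ_HANDLED;
    }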

> >     if (tx_evtchn == rx_evtchn) {
> >             /* feature-split-event-channels == 0 */
> >             err = bind_interdomain_evtchn_to_irqhandler(
> [...]
> > @@ -583,23 +477,36 @@ void xenvif_rx_action(struct xenvif *vif)
> >
> >     skb_queue_head_init(&rxq);
> >
> > -   count = 0;
> > -
> >     while ((skb = skb_dequeue(&vif->rx_queue)) != NULL) {
> > -           vif = netdev_priv(skb->dev);
> > -           nr_frags = skb_shinfo(skb)->nr_frags;
> > +           int max_slots_needed;
> > +           int i;
> >
> > -           sco = (struct skb_cb_overlay *)skb->cb;
> > -           sco->meta_slots_used = xenvif_gop_skb(skb, &npo);
> > +           /* We need a cheap worse case estimate for the number of
> > +            * slots we'll use.
> > +            */
> >
> > -           count += nr_frags + 1;
> > +           max_slots_needed = 2;
> 
> I'm afraid this starting point is not correct. Consider an skb with a
> very large linear buffer; you might need more than 2 slots to fit
> that in, right?
> 

I was wondering about that. Would rounding the MTU up to the nearest 4k
boundary be sufficient?
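
Something along these lines, say (hypothetical, not what's in the posted
patch):

    /* Hypothetical: round the MTU up to a page boundary and count one
     * slot per page, instead of starting from a fixed 2. Basing this
     * on skb_headlen() instead may be safer for GSO skbs whose linear
     * area exceeds the MTU.
     */
    max_slots_needed = DIV_ROUND_UP(vif->dev->mtu, PAGE_SIZE);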

> > +           for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
> > +                   struct page *page;
> >
> > -           __skb_queue_tail(&rxq, skb);
> > +                   page = skb_frag_page(&skb_shinfo(skb)->frags[i]);
> > +                   max_slots_needed += 1 << compound_order(page);
> 
> This estimate might reserve more slots than necessary.
> 

Yes, it needs to be a worst case - but it shouldn't be too far over.
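
For illustration, a tighter (but still worst-case) per-frag bound would count
only the pages actually spanned by the fragment's data rather than the whole
compound page backing it - a hypothetical sketch, not the code in this patch:

    for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
            skb_frag_t *frag = &skb_shinfo(skb)->frags[i];

            /* Pages spanned by offset + size within the frag's page. */
            max_slots_needed += DIV_ROUND_UP(frag->page_offset +
                                             skb_frag_size(frag),
                                             PAGE_SIZE);
    }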

> The removal of xenvif_count_skb_slots() is OK because you're now
> estimating minimum slots, but the actual number of slots consumed needs
> to be accurate. Any reason you don't want to use xenvif_gop_skb()?
> 

I can't. xenvif_gop_skb() pulls requests off the ring and, if there aren't
enough there, it will crash and burn as far as I can tell. So I need an upper
bound on the number of slots before I call it.
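
The check ahead of xenvif_gop_skb() is something like this (simplified sketch;
the real function may also set req_event and re-check so that a frontend
notification isn't missed):

    static bool xenvif_rx_ring_slots_available(struct xenvif *vif,
                                               int needed)
    {
            /* Unconsumed requests the frontend has posted so far. */
            RING_IDX prod = vif->rx.sring->req_prod;
            RING_IDX cons = vif->rx.req_cons;

            return prod - cons >= needed;
    }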

> > +           }
> > +           if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4 ||
> > +               skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
> > +                   max_slots_needed++;
> >
> > -           /* Filled the batch queue? */
> > -           /* XXX FIXME: RX path dependent on MAX_SKB_FRAGS */
> > -           if (count + MAX_SKB_FRAGS >= XEN_NETIF_RX_RING_SIZE)
> > +           /* If the skb may not fit then bail out now */
> > +           if (!xenvif_rx_ring_slots_available(vif, max_slots_needed)) {
> > +                   skb_queue_head(&vif->rx_queue, skb);
> >                     break;
> > +           }
> > +
> > +           sco = (struct skb_cb_overlay *)skb->cb;
> > +           sco->meta_slots_used = xenvif_gop_skb(skb, &npo);
> > +           BUG_ON(sco->meta_slots_used > max_slots_needed);
> > +
> > +           __skb_queue_tail(&rxq, skb);
> >     }
> >
> [...]
> > +
> >  int xenvif_kthread(void *data)
> >  {
> >     struct xenvif *vif = data;
> > +   const int threshold = XEN_NETIF_RX_RING_SIZE / 2;
> >
> >     while (!kthread_should_stop()) {
> > +
> 
> Stray blank line.

Yep - will ditch that.

Thanks,

  Paul

> 
> 
> Wei.
> 
> >             wait_event_interruptible(vif->wq,
> >                                      rx_work_todo(vif) ||
> >                                      kthread_should_stop());
> >             if (kthread_should_stop())
> >                     break;
> >
> > -           if (rx_work_todo(vif))
> > +           if (!skb_queue_empty(&vif->rx_queue))
> >                     xenvif_rx_action(vif);
> >
> > +           vif->rx_event = false;
> > +           if (xenvif_rx_ring_slots_available(vif, threshold))
> > +                   xenvif_start_queue(vif);
> > +
> >             cond_resched();
> >     }
> >
> > --
> > 1.7.10.4

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

