
Re: [Xen-devel] [PATCH net-next] xen-netback: improve guest-receive-side flow control



> -----Original Message-----
> From: Wei Liu [mailto:wei.liu2@xxxxxxxxxx]
> Sent: 28 November 2013 15:51
> To: Paul Durrant
> Cc: xen-devel@xxxxxxxxxxxxx; Wei Liu; Ian Campbell; David Vrabel
> Subject: Re: [PATCH net-next] xen-netback: improve guest-receive-side flow control
> 
> On Thu, Nov 28, 2013 at 01:11:12PM +0000, Paul Durrant wrote:
> > The flow control code relies on a double pass of the skb, firstly to count
> > the number of ring slots needed to supply sufficient grant references, and
> > another to actually consume those references and build the grant copy
> > operations. It transpires that, in some scenarios, the initial count and the
> > number of references consumed differ and, after this happens a number of
> > times, flow control is completely broken.
> 
> Please add blank lines between paragraphs in later versions.
> 
> > This patch simplifies things by making a very quick static analysis of the
> > minimal number of ring slots needed for an skb and then only stopping the
> > queue if that minimal number of slots is not available. The worker thread
> > does a worst-case analysis and only processes what will definitely fit,
> > leaving remaining skbs on the internal queue. This patch also gets rid of
> > the drop that occurs if an skb won't fit (because that doesn't matter - we
> > always internally queue *at least* what we can put in the shared ring) and
> > adds some hysteresis to waking the external queue. (It only gets woken when
> > at least half the shared ring is available.)
> 
> I think this is the right direction to go. However, we've tripped over
> this several times, so is it possible to test it with XenRT?
> 

That's on its way. It'll be a day or so to get results back though.

> > Without this patch I can trivially stall netback permanently by just doing
> > a large guest-to-guest file copy between two Windows Server 2008R2 VMs on
> > a single host.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@xxxxxxxxxx>
> > Cc: Wei Liu <wei.liu2@xxxxxxxxxx>
> > Cc: Ian Campbell <ian.campbell@xxxxxxxxxx>
> > Cc: David Vrabel <david.vrabel@xxxxxxxxxx>
> > ---
> >  drivers/net/xen-netback/common.h    |   26 ++---
> >  drivers/net/xen-netback/interface.c |   45 ++++----
> >  drivers/net/xen-netback/netback.c   |  202 ++++++++++-------------------------
> >  3 files changed, 87 insertions(+), 186 deletions(-)
> >
> > diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> > index 08ae01b..d7888db 100644
> > --- a/drivers/net/xen-netback/common.h
> > +++ b/drivers/net/xen-netback/common.h
> > @@ -136,12 +136,7 @@ struct xenvif {
> >     char rx_irq_name[IFNAMSIZ+4]; /* DEVNAME-rx */
> >     struct xen_netif_rx_back_ring rx;
> >     struct sk_buff_head rx_queue;
> > -
> > -   /* Allow xenvif_start_xmit() to peek ahead in the rx request
> > -    * ring.  This is a prediction of what rx_req_cons will be
> > -    * once all queued skbs are put on the ring.
> > -    */
> > -   RING_IDX rx_req_cons_peek;
> > +   bool rx_event;
> >
> 
> What is this for? Reading through the code, my impression is that this is
> a flag indicating that the RX interrupt has been triggered. Why is it
> useful? Why is skb_queue_empty not sufficient?
> 

The thread is kicked in two cases now: when a new packet is added, and when the
frontend sends an event (e.g. when new requests are posted). This is an
explicit flag for the latter. I didn't want to risk missing a queue wake in the
case where the internal queue is fully drained but the shared ring is left more
than half full, which I think is what would happen without a flag such as this
to control the thread wake.
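
Roughly speaking (this is a sketch of the intent rather than the exact code in
the patch), the thread's work check becomes:

    static inline bool rx_work_todo(struct xenvif *vif)
    {
            /* Wake for either a newly queued skb or a frontend event;
             * the latter may mean ring space has opened up and the
             * external queue needs waking even if rx_queue is empty.
             */
            return !skb_queue_empty(&vif->rx_queue) || vif->rx_event;
    }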

> >     /* Given MAX_BUFFER_OFFSET of 4096 the worst case is that each
> >      * head/fragment page uses 2 copy operations because it
> > @@ -198,8 +193,6 @@ void xenvif_xenbus_fini(void);
> >
> [...]
> >
> > -void xenvif_notify_tx_completion(struct xenvif *vif)
> > -{
> > -   if (netif_queue_stopped(vif->dev) && xenvif_rx_schedulable(vif))
> > -           netif_wake_queue(vif->dev);
> > -}
> > -
> >  static struct net_device_stats *xenvif_get_stats(struct net_device *dev)
> >  {
> >     struct xenvif *vif = netdev_priv(dev);
> > @@ -378,6 +378,8 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
> >     if (err < 0)
> >             goto err;
> >
> > +   init_waitqueue_head(&vif->wq);
> > +
> 
> I guess the reason behind this code movement is that, if we don't do this,
> netback will crash if we get an interrupt before the waitqueue is set up?
> 

Yep.
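
The rx interrupt handler wakes the kthread (and now also sets the rx_event
flag), so vif->wq has to be initialised before the event channel is bound.
As a rough sketch (the handler in the patch may differ slightly in detail):

    static irqreturn_t xenvif_rx_interrupt(int irq, void *dev_id)
    {
            struct xenvif *vif = dev_id;

            /* The frontend has posted requests or sent an event; note
             * it and kick the kthread, which sleeps on vif->wq.
             */
            vif->rx_event = true;
            wake_up(&vif->wq);

            return IRQ_HANDLED;
    }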

> >     if (tx_evtchn == rx_evtchn) {
> >             /* feature-split-event-channels == 0 */
> >             err = bind_interdomain_evtchn_to_irqhandler(
> [...]
> > @@ -583,23 +477,36 @@ void xenvif_rx_action(struct xenvif *vif)
> >
> >     skb_queue_head_init(&rxq);
> >
> > -   count = 0;
> > -
> >     while ((skb = skb_dequeue(&vif->rx_queue)) != NULL) {
> > -           vif = netdev_priv(skb->dev);
> > -           nr_frags = skb_shinfo(skb)->nr_frags;
> > +           int max_slots_needed;
> > +           int i;
> >
> > -           sco = (struct skb_cb_overlay *)skb->cb;
> > -           sco->meta_slots_used = xenvif_gop_skb(skb, &npo);
> > +           /* We need a cheap worse case estimate for the number of
> > +            * slots we'll use.
> > +            */
> >
> > -           count += nr_frags + 1;
> > +           max_slots_needed = 2;
> 
> I'm afraid this starting point is not correct. Consider an skb with a
> very large linear buffer; you might need more than 2 slots to fit
> that in, right?
> 

I was wondering about that. Would rounding the MTU up to the nearest 4k
boundary be sufficient?
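
Something along these lines, say (hypothetical, not what's in the posted
patch):

    /* Hypothetical: round the MTU up to a page boundary and count one
     * slot per page, instead of starting from a fixed 2. Basing this
     * on skb_headlen() instead may be safer for GSO skbs whose linear
     * area exceeds the MTU.
     */
    max_slots_needed = DIV_ROUND_UP(vif->dev->mtu, PAGE_SIZE);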

> > +           for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
> > +                   struct page *page;
> >
> > -           __skb_queue_tail(&rxq, skb);
> > +                   page = skb_frag_page(&skb_shinfo(skb)->frags[i]);
> > +                   max_slots_needed += 1 << compound_order(page);
> 
> This estimate might reserve more slots than necessary.
> 

Yes, it needs to be a worst case - but it shouldn't be too far over.
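
For illustration, a tighter (but still worst-case) per-frag bound would count
only the pages actually spanned by the fragment's data rather than the whole
compound page backing it - a hypothetical sketch, not the code in this patch:

    for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
            skb_frag_t *frag = &skb_shinfo(skb)->frags[i];

            /* Pages spanned by offset + size within the frag's page. */
            max_slots_needed += DIV_ROUND_UP(frag->page_offset +
                                             skb_frag_size(frag),
                                             PAGE_SIZE);
    }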

> The removal of xenvif_count_skb_slots() is OK because you're now
> estimating minimum slots, but the actual number of slots consumed needs
> to be accurate. Any reason you don't want to use xenvif_gop_skb()?
> 

I can't. xenvif_gop_skb() pulls requests off the ring and, if there aren't
enough there, it will crash and burn as far as I can tell. So I need an upper
bound on the number of slots before I call it.
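
The check ahead of xenvif_gop_skb() is something like this (simplified sketch;
the real function may also set req_event and re-check so that a frontend
notification isn't missed):

    static bool xenvif_rx_ring_slots_available(struct xenvif *vif,
                                               int needed)
    {
            /* Unconsumed requests the frontend has posted so far. */
            RING_IDX prod = vif->rx.sring->req_prod;
            RING_IDX cons = vif->rx.req_cons;

            return prod - cons >= needed;
    }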

> > +           }
> > +           if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4 ||
> > +               skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
> > +                   max_slots_needed++;
> >
> > -           /* Filled the batch queue? */
> > -           /* XXX FIXME: RX path dependent on MAX_SKB_FRAGS */
> > -           if (count + MAX_SKB_FRAGS >= XEN_NETIF_RX_RING_SIZE)
> > +           /* If the skb may not fit then bail out now */
> > +           if (!xenvif_rx_ring_slots_available(vif, max_slots_needed)) {
> > +                   skb_queue_head(&vif->rx_queue, skb);
> >                     break;
> > +           }
> > +
> > +           sco = (struct skb_cb_overlay *)skb->cb;
> > +           sco->meta_slots_used = xenvif_gop_skb(skb, &npo);
> > +           BUG_ON(sco->meta_slots_used > max_slots_needed);
> > +
> > +           __skb_queue_tail(&rxq, skb);
> >     }
> >
> [...]
> > +
> >  int xenvif_kthread(void *data)
> >  {
> >     struct xenvif *vif = data;
> > +   const int threshold = XEN_NETIF_RX_RING_SIZE / 2;
> >
> >     while (!kthread_should_stop()) {
> > +
> 
> Stray blank line.

Yep - will ditch that.

Thanks,

  Paul

> 
> 
> Wei.
> 
> >             wait_event_interruptible(vif->wq,
> >                                      rx_work_todo(vif) ||
> >                                      kthread_should_stop());
> >             if (kthread_should_stop())
> >                     break;
> >
> > -           if (rx_work_todo(vif))
> > +           if (!skb_queue_empty(&vif->rx_queue))
> >                     xenvif_rx_action(vif);
> >
> > +           vif->rx_event = false;
> > +           if (xenvif_rx_ring_slots_available(vif, threshold))
> > +                   xenvif_start_queue(vif);
> > +
> >             cond_resched();
> >     }
> >
> > --
> > 1.7.10.4

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

