Re: [Xen-devel] [PATCH net-next] xen-netfront: try linearizing SKB if it occupies too many slots
On Fri, May 16, 2014 at 05:29:05PM +0100, Zoltan Kiss wrote:
> On 16/05/14 16:34, Wei Liu wrote:
> > On Fri, May 16, 2014 at 08:22:19AM -0700, Eric Dumazet wrote:
> >> On Fri, 2014-05-16 at 15:36 +0100, Wei Liu wrote:
> >>> On Fri, May 16, 2014 at 07:21:08AM -0700, Eric Dumazet wrote:
> >>>> On Fri, 2014-05-16 at 14:11 +0100, Wei Liu wrote:
> >>>>
> >>>>> It's not that common to trigger this, I only saw a few reports. In fact
> >>>>> Stefan's report is the first one that comes with a method to reproduce
> >>>>> it.
> >>>>>
> >>>>> I tested with redis-benchmark on a guest with 256MB RAM and only saw a
> >>>>> few "failed to linearize", never saw a single one with a 1GB guest.
> >>>>
> >>>> Well, I am just saying. This is asking for order-5 allocations, and yes,
> >>>> this is going to fail after a few days of uptime, no matter what you try.
> >>>>
> >>>
> >>> Hmm... I see what you mean -- memory fragmentation leads to allocation
> >>> failure. Thanks.
> >>
> >> In the mean time, have you tried to lower gso_max_size?
> >>
> >> Setting it with netif_set_gso_max_size() to something like 56000 might
> >> avoid the problem.
> >>
> >> (Not sure if it is applicable in your case)
> >>
> >
> > It works, at least in this Redis testcase. Could you explain a bit where
> > this 56000 magic number comes from? :-)
> >
> > Presumably I can derive it from some constant in core network code?
>
> I guess it just makes packets with a problematic layout less likely. But
> the following packet would still fail:
> linear buffer: 80 bytes, spread over 2 pages
> 17 frags, 80 bytes each, each spanning a page boundary.
>

Presumably max GSO size affects packet layout; that's what I was trying to
figure out.
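For illustration only, here is a minimal sketch of where Eric's suggestion could be wired up in netfront. The helper name and hook point (somewhere in the device setup path) are made up for this example, and 56000 is simply the value floated in this thread, not a constant derived from core network code:

#include <linux/netdevice.h>

/* Sketch only: cap the GSO size so that a full-sized GSO skb is less
 * likely to need more than MAX_SKB_FRAGS + 1 slots.  The helper name and
 * call site are hypothetical; 56000 is the value suggested in this
 * thread, not something derived from core network code. */
static void xennet_cap_gso(struct net_device *netdev)
{
        netif_set_gso_max_size(netdev, 56000);
}

As noted above, a packet with an unfortunate frag layout can still exceed the slot limit, so this only makes the problem less frequent rather than impossible.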
> I just had an idea: a modified version of the xenvif_handle_frag_list
> function from netback would be useful for us here. It recreates the frags
> array on fully utilized 4k pages. Plus we can use pskb_expand_head to
> reduce the number of pages backing the linear buffer (although it might
> not work, see my comment in the patch).
> The worst case linear buffer then spans N+1 pages and has N*PAGE_SIZE+1
> bytes. The frags after this coalescing should then have at most
> 16*PAGE_SIZE - (N*PAGE_SIZE+2) bytes, which is 16-N pages. Altogether
> that's 16+1 pages, which should definitely fit!
> This is what I mean:
>
> 8<--------------
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 158b5e6..b1133d6 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -544,6 +544,73 @@ static int xennet_count_skb_frag_slots(struct sk_buff *skb)
>          return pages;
>  }
>
> +int xenvif_reduce_pages(struct sk_buff *skb, int target)
> +{
> +        unsigned int offset = skb_headlen(skb);
> +        skb_frag_t frags[MAX_SKB_FRAGS];
> +        int newfrags, oldfrags;
> +        unsigned int pages, optimal;
> +
> +        BUG_ON(!target);
> +
> +        pages = DIV_ROUND_UP(offset_in_page(skb->data) + skb_headlen(skb), PAGE_SIZE);
> +        optimal = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE);
> +        if (pages - optimal) {
> +                int err;
> +                /* FIXME: we should check if pskb_expand_head really allocates on a page
> +                 * boundary, otherwise we can still have a suboptimal page layout */
> +                if (unlikely(err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC)))

I'm a bit lost. What do you expect from the call to pskb_expand_head? I'm
sorry, I cannot see the immediate effect from your comment about
pskb_expand_head. If you call it with nhead and ntail equal to 0 it creates
an identical copy, but I don't see any guarantee of page alignment. Did I
miss something?

> +                        return err;
> +                target -= pages - optimal;
> +                if (!target)
> +                        return 0;
> +        }
> +
> +        /* Subtract frags size, we will correct it later */
> +        skb->truesize -= skb->data_len;
> +
> +        /* Create a brand new frags array and coalesce there */
> +        for (newfrags = 0; offset < skb->len; newfrags++) {
> +                struct page *page;
> +                unsigned int len;
> +
> +                BUG_ON(newfrags >= MAX_SKB_FRAGS);
> +                page = alloc_page(GFP_ATOMIC);

And the amount of memory allocation is a bit overkill, I think (though it's
still better than the order-5 allocation in skb_linearize). Can you not just
memmove all the paged data into the first few frags and release the other
frags?

Anyway, this method might still work, just a bit overkill IMHO.

Wei.

> +                if (!page) {
> +                        int j;
> +                        skb->truesize += skb->data_len;
> +                        for (j = 0; j < newfrags; j++)
> +                                put_page(frags[j].page.p);
> +                        return -ENOMEM;
> +                }
> +
> +                if (offset + PAGE_SIZE < skb->len)
> +                        len = PAGE_SIZE;
> +                else
> +                        len = skb->len - offset;
> +                if (skb_copy_bits(skb, offset, page_address(page), len))
> +                        BUG();
> +
> +                offset += len;
> +                frags[newfrags].page.p = page;
> +                frags[newfrags].page_offset = 0;
> +                skb_frag_size_set(&frags[newfrags], len);
> +        }
> +
> +        /* Drop the original buffers */
> +        for (oldfrags = 0; oldfrags < skb_shinfo(skb)->nr_frags; oldfrags++)
> +                skb_frag_unref(skb, oldfrags);
> +
> +        /* Swap the new frags array with the old one */
> +        memcpy(skb_shinfo(skb)->frags,
> +               frags,
> +               newfrags * sizeof(skb_frag_t));
> +        skb_shinfo(skb)->nr_frags = newfrags;
> +        /* Correct truesize */
> +        skb->truesize += newfrags * PAGE_SIZE;
> +        return 0;
> +}
> +
>  static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  {
>          unsigned short id;
> @@ -573,11 +640,21 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>          slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
>                  xennet_count_skb_frag_slots(skb);
>          if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> -                net_alert_ratelimited(
> -                        "xennet: skb rides the rocket: %d slots\n", slots);
> -                goto drop;
> +                if (unlikely(xenvif_reduce_pages(skb, slots - MAX_SKB_FRAGS - 1))) {
> +                        net_alert_ratelimited(
> +                                "xennet: couldn't reduce slot number from %d\n", slots);
> +                        goto drop;
> +                }
> +                slots = DIV_ROUND_UP(offset_in_page(data) + len, PAGE_SIZE) +
> +                        xennet_count_skb_frag_slots(skb);
> +                if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> +                        net_alert_ratelimited(
> +                                "xennet: slot reduction doesn't work, slots: %d\n", slots);
> +                        goto drop;
> +                }
>          }
>
> +
>          spin_lock_irqsave(&np->tx_lock, flags);
>
>          if (unlikely(!netif_carrier_ok(dev) ||
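To make the "just memmove the paged data" alternative above concrete, here is a rough, untested sketch of such in-place compaction. The helper name is made up, and it assumes every frag carries at most PAGE_SIZE bytes, sits in lowmem, and references a page that is exclusively owned and writable by this skb -- assumptions netfront generally cannot make, which is presumably why the patch above copies into freshly allocated pages instead. Adjusting skb->truesize for the released pages is omitted for brevity.

#include <linux/skbuff.h>

/* Sketch only: compact the paged data leftwards so that every resulting
 * frag except possibly the last covers a full page starting at offset 0.
 * Because no frag is larger than PAGE_SIZE, the write position never runs
 * ahead of unread source data, so memmove is safe even when source and
 * destination share a page. */
static void xennet_compact_frags(struct sk_buff *skb)
{
        struct skb_shared_info *shinfo = skb_shinfo(skb);
        unsigned int dst = 0, dst_off = 0, new_nr;
        int i;

        for (i = 0; i < shinfo->nr_frags; i++) {
                u8 *data = skb_frag_address(&shinfo->frags[i]);
                unsigned int len = skb_frag_size(&shinfo->frags[i]);

                while (len) {
                        u8 *dpage = page_address(skb_frag_page(&shinfo->frags[dst]));
                        unsigned int chunk = min_t(unsigned int, len,
                                                   PAGE_SIZE - dst_off);

                        memmove(dpage + dst_off, data, chunk);
                        data += chunk;
                        len -= chunk;
                        dst_off += chunk;
                        if (dst_off == PAGE_SIZE) {
                                dst++;
                                dst_off = 0;
                        }
                }
        }

        new_nr = dst + (dst_off ? 1 : 0);

        /* Pages beyond the compacted area are no longer referenced. */
        for (i = new_nr; i < shinfo->nr_frags; i++)
                skb_frag_unref(skb, i);

        /* Rewrite the frag descriptors for the page-aligned layout. */
        for (i = 0; i < new_nr; i++) {
                shinfo->frags[i].page_offset = 0;
                skb_frag_size_set(&shinfo->frags[i],
                                  (i == new_nr - 1 && dst_off) ? dst_off : PAGE_SIZE);
        }
        shinfo->nr_frags = new_nr;
}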