
Re: [Xen-devel] virtio_net: Fix napi poll list corruption

To: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>, David Vrabel <david.vrabel@xxxxxxxxxx>
From: Marcelo Ricardo Leitner <mleitner@xxxxxxxxxx>
Date: Mon, 22 Dec 2014 14:19:12 -0200
Cc: netdev@xxxxxxxxxxxxxxx, edumazet@xxxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx, boris.ostrovsky@xxxxxxxxxx, "David S. Miller" <davem@xxxxxxxxxxxxx>
Delivery-date: Wed, 24 Dec 2014 14:37:00 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 19-12-2014 22:23, Herbert Xu wrote:

David Vrabel <david.vrabel@xxxxxxxxxx> wrote:

After d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less interrupt
masking in NAPI) the napi instance is removed from the per-cpu list
prior to calling the n->poll(), and is only requeued if all of the
budget was used.  This inadvertently broke netfront because netfront
does not use NAPI correctly.


A similar bug exists in virtio_net.

-- >8 --
The commit d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less
interrupt masking in NAPI) breaks virtio_net in an insidious way.

It is now required that if the entire budget is consumed when poll
returns, the napi poll_list must remain empty.  However, like some
other drivers virtio_net tries to do a last-ditch check and if
there is more work it will call napi_schedule and then immediately
process some of this new work.  Should the entire budget be consumed
while processing such new work then we will violate the new caller
contract.

This patch fixes this by not touching any work when we reschedule
in virtio_net.

The worst part of this bug is that the list corruption causes other
napi users to be moved off-list.  In my case I was chasing a stall
in IPsec (IPsec uses netif_rx) and I only belatedly realised that it
was virtio_net which caused the stall even though the virtio_net
poll was still functioning perfectly after IPsec stalled.
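To make the caller contract and the fix concrete, here is a toy userspace model in C. It is a sketch under stated assumptions, not kernel code and not the actual virtio_net patch. The names struct fake_napi, process(), poll_buggy(), poll_fixed() and run_poll() are hypothetical. The on_poll_list flag only stands in for membership of the per-cpu poll list, and the late parameter simulates packets that arrive just as the driver finishes its last-ditch check. run_poll() assumes the caller behaviour described for d75b1ade: the instance is taken off the list before poll and requeued by the caller only if the whole budget was used.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a NAPI instance; NOT the kernel's struct napi_struct. */
struct fake_napi {
    bool on_poll_list; /* models membership of the per-cpu poll list */
    int pending;       /* packets waiting in the simulated receive queue */
};

typedef int (*poll_fn)(struct fake_napi *n, int budget, int late);

/* Consume up to 'quota' packets from the simulated queue. */
static int process(struct fake_napi *n, int quota)
{
    assert(quota >= 0);
    int done = n->pending < quota ? n->pending : quota;
    n->pending -= done;
    return done;
}

/* Pre-fix pattern: after rescheduling, keep processing the new work. */
int poll_buggy(struct fake_napi *n, int budget, int late)
{
    int received = 0;
    bool raced = false;
again:
    received += process(n, budget - received);
    if (received < budget) {
        n->on_poll_list = false;      /* models napi_complete() */
        if (!raced) {                 /* 'late' packets show up once, right */
            n->pending += late;       /* after the queue looked empty */
            raced = true;
        }
        if (n->pending > 0) {         /* last-ditch check finds more work */
            n->on_poll_list = true;   /* models napi_schedule() */
            goto again;               /* ...and processes it immediately */
        }
    }
    return received;
}

/* Post-fix pattern: reschedule, but leave the new work for the next poll. */
int poll_fixed(struct fake_napi *n, int budget, int late)
{
    int received = process(n, budget);
    if (received < budget) {
        n->on_poll_list = false;      /* models napi_complete() */
        n->pending += late;           /* same race as in poll_buggy() */
        if (n->pending > 0)
            n->on_poll_list = true;   /* reschedule only; touch no work */
    }
    return received;
}

/* Simulated caller: remove the instance before poll, requeue it only if the
 * whole budget was consumed. Returns false where the real list would be
 * corrupted by a double add. */
bool run_poll(poll_fn fn, struct fake_napi *n, int budget, int late)
{
    n->on_poll_list = false;          /* caller takes it off the list first */
    int work = fn(n, budget, late);
    if (work == budget) {
        if (n->on_poll_list)
            return false;             /* driver already re-added it: double add */
        n->on_poll_list = true;       /* caller requeues it */
    }
    return true;
}
```

With a budget of 64, 10 queued packets and 100 late arrivals, run_poll(poll_buggy, ...) reports a double add: the second pass consumes the remaining 54 slots, so poll returns the full budget while the instance is already back on the list. poll_fixed() returns 10 and leaves all 100 late packets for the next poll, which then uses the full budget with the instance off the list, as the caller expects.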

Thanks for finding and fixing this, Herbert. I was debugging this one too; in my case, a vxlan interface was getting stuck.


  Marcelo


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

References:
- [Xen-devel] virtio_net: Fix napi poll list corruption
  - From: Herbert Xu
