 Re: [Xen-devel] [PATCH] xen-netback: fix race between napi_complete() and interrupt handler
 My idea was that the current code can't race with an interrupt running on a 
different CPU: if the interrupt had been moved since the last 
napi_schedule (which scheduled NAPI on the same CPU as the interrupt), 
the kernel would make sure that the NAPI instance is moved along with 
it. I haven't found any trace of this in the kernel so far, but 
the current code does work for me, even when I used a bash script 
to aggressively move the interrupts around while it was running.
I've added David and Eric to the thread; maybe they can quickly shed 
some light on this: how does the kernel make sure that if the interrupt 
is moved away from a CPU (e.g. by irqbalance), the NAPI instance already 
scheduled there won't race with it?
 
Zoli
On 25/03/14 14:08, David Vrabel wrote:
 
When the NAPI budget was not all used, xenvif_poll() would call
napi_complete() /after/ enabling the interrupt.  This resulted in a
race between the napi_complete() and the napi_schedule() in the
interrupt handler.  The use of local_irq_save/restore() avoided the
race iff the handler was running on the same CPU, but not if it was
running on a different CPU.
Fix this properly by calling napi_complete() before reenabling
interrupts (in the xenvif_check_rx_xenvif() call).
Signed-off-by: David Vrabel <david.vrabel@xxxxxxxxxx>
---
  drivers/net/xen-netback/interface.c |   28 ++--------------------------
  1 files changed, 2 insertions(+), 26 deletions(-)
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 7669d49..ee322d9 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -65,32 +65,8 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
        work_done = xenvif_tx_action(vif, budget);
        if (work_done < budget) {
-               int more_to_do = 0;
-               unsigned long flags;
-
-               /* It is necessary to disable IRQ before calling
-                * RING_HAS_UNCONSUMED_REQUESTS. Otherwise we might
-                * lose event from the frontend.
-                *
-                * Consider:
-                *   RING_HAS_UNCONSUMED_REQUESTS
-                *   <frontend generates event to trigger napi_schedule>
-                *   __napi_complete
-                *
-                * This handler is still in scheduled state so the
-                * event has no effect at all. After __napi_complete
-                * this handler is descheduled and cannot get
-                * scheduled again. We lose event in this case and the ring
-                * will be completely stalled.
-                */
-
-               local_irq_save(flags);
-
-               RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do);
-               if (!more_to_do)
-                       __napi_complete(napi);
-
-               local_irq_restore(flags);
+               napi_complete(napi);
+               xenvif_check_rx_xenvif(vif);
        }
        return work_done;
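
To make the ordering argument concrete, here is a minimal userspace sketch of the two orderings. It is not kernel code: struct model_vif and every model_* function are hypothetical stand-ins (an atomic flag for the NAPI scheduled bit, a counter for the TX ring), and it assumes that the re-check done after completing schedules NAPI again when it finds requests. It only models the cross-CPU case, where local_irq_save() on the polling CPU cannot hold off the handler.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_vif {
    atomic_bool scheduled;   /* stands in for the NAPI "scheduled" bit */
    int unconsumed;          /* requests waiting on the TX ring */
};

/* Models napi_schedule(): only the caller that sets the bit queues a poll. */
static bool model_schedule(struct model_vif *v)
{
    return !atomic_exchange(&v->scheduled, true);
}

/* Models napi_complete(): clear the bit so a later schedule can succeed. */
static void model_complete(struct model_vif *v)
{
    atomic_store(&v->scheduled, false);
}

/* Models the frontend posting a request and the event firing on another CPU. */
static bool model_frontend_event(struct model_vif *v)
{
    v->unconsumed++;
    return model_schedule(v);
}

/* Old ordering: final check, remote event, then complete.
 * Returns true if work is pending but no poll was ever queued. */
static bool old_order_stalls(void)
{
    struct model_vif v = { .unconsumed = 0 };
    atomic_init(&v.scheduled, true);          /* the poll is running */

    bool more_to_do = v.unconsumed > 0;       /* final check sees an empty ring */
    bool queued = model_frontend_event(&v);   /* no effect: bit is still set */
    if (!more_to_do)
        model_complete(&v);                   /* bit cleared, event already lost */

    return v.unconsumed > 0 && !queued && !atomic_load(&v.scheduled);
}

/* New ordering: complete first, then re-check the ring.
 * event_before_check selects where the remote event lands. */
static bool new_order_stalls(bool event_before_check)
{
    struct model_vif v = { .unconsumed = 0 };
    bool queued = false;
    atomic_init(&v.scheduled, true);          /* the poll is running */

    model_complete(&v);
    if (event_before_check)
        queued |= model_frontend_event(&v);   /* bit is clear, so this queues */
    if (v.unconsumed > 0)                     /* re-check after completing */
        queued |= model_schedule(&v);
    if (!event_before_check)
        queued |= model_frontend_event(&v);   /* bit is clear, so this queues */

    return v.unconsumed > 0 && !queued;
}
```

In this model old_order_stalls() reports a stall, while new_order_stalls() does not for either placement of the event: once the bit has been cleared, whichever side notices the request first is able to queue the poll.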
 
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
 