Re: [Xen-devel] [PATCH] xen-netback: fix race between napi_complete() and interrupt handler
You forgot to target this patch at the "net" tree in the subject line.

On Tue, Mar 25, 2014 at 02:08:25PM +0000, David Vrabel wrote:
> When the NAPI budget was not all used, xenvif_poll() would call
> napi_complete() /after/ enabling the interrupt.  This resulted in a
> race between the napi_complete() and the napi_schedule() in the
> interrupt handler.  The use of local_irq_save/restore() avoided this
> race iff the handler is running on the same CPU but not if it was
> running on a different CPU.
>

OK, I understand this issue now. You mentioned it in the other email,
which made me a bit confused.

Just curious, how do you trigger this? By re-binding the interrupt to
another CPU while xenvif_poll is running? I used to run irqbalance (the
one that works with Xen virtual interrupts) but could not trigger a
race. Probably the race window is too small to trigger?

> Fix this properly by calling napi_complete() before reenabling
> interrupts (in the xenvif_check_rx_xenvif() call).
>
> Signed-off-by: David Vrabel <david.vrabel@xxxxxxxxxx>
> ---
>  drivers/net/xen-netback/interface.c |   28 ++--------------------------
>  1 files changed, 2 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
> index 7669d49..ee322d9 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -65,32 +65,8 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
>  	work_done = xenvif_tx_action(vif, budget);
>
>  	if (work_done < budget) {
> -		int more_to_do = 0;
> -		unsigned long flags;
> -
> -		/* It is necessary to disable IRQ before calling
> -		 * RING_HAS_UNCONSUMED_REQUESTS. Otherwise we might
> -		 * lose event from the frontend.
> -		 *
> -		 * Consider:
> -		 *   RING_HAS_UNCONSUMED_REQUESTS
> -		 *   <frontend generates event to trigger napi_schedule>
> -		 *   __napi_complete
> -		 *
> -		 * This handler is still in scheduled state so the
> -		 * event has no effect at all. After __napi_complete
> -		 * this handler is descheduled and cannot get
> -		 * scheduled again. We lose event in this case and the ring
> -		 * will be completely stalled.
> -		 */
> -
> -		local_irq_save(flags);
> -
> -		RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do);
> -		if (!more_to_do)
> -			__napi_complete(napi);
> -
> -		local_irq_restore(flags);
> +		napi_complete(napi);

You need to add a comment here saying that the interrupt is in fact
"disabled" before this point and "enabled" by xenvif_check_rx_xenvif().

> +		xenvif_check_rx_xenvif(vif);

To be honest, the side effect of this function call is not immediately
obvious. I don't mind if you copy the code from that function here.

Wei.

>  	}
>
>  	return work_done;
> -- 
> 1.7.2.5

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
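
The ordering argument in this thread can be illustrated with a small user-space model. This is only a sketch, not driver code: struct model and every model_* function below are hypothetical stand-ins, the interleavings are replayed deterministically on one thread rather than raced on two CPUs, and it assumes that xenvif_check_rx_xenvif() does the final ring check (which re-enables frontend events) and reschedules NAPI if requests remain, as the patch description suggests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state involved in the race. */
struct model {
	bool sched;     /* stands in for the NAPI "scheduled" state */
	bool events_on; /* frontend will raise an event for a new request */
	int pending;    /* unconsumed requests on the ring */
};

/* Like napi_schedule(): has no effect if already scheduled. */
static bool model_napi_schedule(struct model *m)
{
	if (m->sched)
		return false;
	m->sched = true;
	return true;
}

/* Like napi_complete(): leaves the scheduled state. */
static void model_napi_complete(struct model *m)
{
	assert(m->sched);
	m->sched = false;
}

/* Simplified final check: re-enable events, report pending work. */
static bool model_final_check(struct model *m)
{
	m->events_on = true;
	return m->pending > 0;
}

/* Frontend queues a request; the handler runs on "another CPU". */
static void model_frontend_request(struct model *m)
{
	m->pending++;
	if (m->events_on) {
		m->events_on = false;   /* one event per re-arm */
		model_napi_schedule(m); /* interrupt handler */
	}
}

/* Old order: final check, event arrives, then complete. */
bool old_order_stalls(void)
{
	struct model m = { .sched = true };
	bool more = model_final_check(&m);

	model_frontend_request(&m); /* schedule is a no-op here */
	if (!more)
		model_napi_complete(&m);
	return m.pending > 0 && !m.sched;
}

/* New order: complete first, then final check plus reschedule.
 * when: 0 = request before complete, 1 = between complete and
 * the final check, 2 = after the final check.
 */
bool new_order_stalls(int when)
{
	struct model m = { .sched = true };

	if (when == 0)
		model_frontend_request(&m);
	model_napi_complete(&m);
	if (when == 1)
		model_frontend_request(&m);
	if (model_final_check(&m))
		model_napi_schedule(&m);
	if (when == 2)
		model_frontend_request(&m);
	return m.pending > 0 && !m.sched;
}
```

In this model old_order_stalls() reports a stall, because the event arrives while the poll handler still counts as scheduled and is then descheduled with work outstanding. new_order_stalls() reports no stall for any of the three injection points, since an event can only fire once the scheduled state has already been cleared.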