
Re: [Xen-devel] NAPI rescheduling and the delay caused by it

On 04/12/13 20:41, Eric Dumazet wrote:
On Wed, 2013-12-04 at 18:55 +0000, Zoltan Kiss wrote:

So, my questions are:
- why is NAPI rescheduled on an another CPU?
- why does it cause a 3-4 millisecond delay?

NAPI can not be scheduled on another cpu.

But at the time of the napi_schedule() call, the napi_struct can already be
scheduled by another cpu.

( NAPI_STATE_SCHED bit already set)
So I would say something kept the 'other' cpu from responding fast enough
to softirq events that were ready for service.

(Another wakeup happened 3-4 millisec later)
Oh, thanks! I forgot to mention that I have my grant mapping patches applied. The callback that runs when the previous packet is sent to the other vif schedules the NAPI instance on that other CPU. But it's still not clear why it takes so long to serve that softirq!

Really, I suspect your usage of netif_wake_queue() is simply wrong.

Check why we have netif_rx() and netif_rx_ni() variants.

And ask yourself if xenvif_notify_tx_completion() is correct, being
called from process context.
So, at the moment we use netif_wake_queue to notify the stack that it can call xenvif_start_xmit, i.e. that the thread is ready to accept new packets for transmission. It is called when we get an interrupt from the frontend (which signals that it made room in the ring), and from xenvif_notify_tx_completion at the end of the thread. The latter checks whether queueing was stopped in the meantime and whether the guest made space after our recent transmission. I see that netif_rx_ni makes sure the softirq is executed, but I'm not sure I understand how that relates to wake_queue. Can you explain a bit more?



Xen-devel mailing list


