Re: [Xen-devel] rcu_sched self-detect stall when disable vif device
On 28/01/15 17:27, Julien Grall wrote:
> On 28/01/15 17:06, David Vrabel wrote:
>> On 28/01/15 16:45, Julien Grall wrote:
>>> On 27/01/15 16:53, Wei Liu wrote:
>>>> On Tue, Jan 27, 2015 at 04:47:45PM +0000, Julien Grall wrote:
>>>>> On 27/01/15 16:45, Wei Liu wrote:
>>>>>> On Tue, Jan 27, 2015 at 04:03:52PM +0000, Julien Grall wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> While I'm working on support for 64K pages in netfront, I got
>>>>>>> an rcu_sched self-detected stall message. It happens when netback is
>>>>>>> disabling the vif device due to an error.
>>>>>>>
>>>>>>> I'm using Linux 3.19-rc5 on seattle (ARM64). Any idea why
>>>>>>> the processor is stuck in xenvif_rx_queue_purge?
>>>>>>>
>>>>>>
>>>>>> When you try to release an SKB, the core network driver needs to enter
>>>>>> an RCU critical section to clean up. dst_release, for one, calls call_rcu.
>>>>>
>>>>> But this message shouldn't happen under normal conditions or because of
>>>>> netfront, right?
>>>>>
>>>>
>>>> I have never seen a report like this before, even when netfront is
>>>> buggy.
>>>
>>> This only happens when preemption is not enabled (i.e.
>>> CONFIG_PREEMPT_NONE in the config file) in the backend kernel.
>>>
>>> When the vif is disabled, the loop in xenvif_kthread_guest_rx turns
>>> into an infinite loop. In my case, the code executed looks like:
>>>
>>>
>>> 1.  for (;;) {
>>> 2.      xenvif_wait_for_rx_work(queue);
>>> 3.
>>> 4.      if (kthread_should_stop())
>>> 5.          break;
>>> 6.
>>> 7.      if (unlikely(vif->disabled && queue->id == 0)) {
>>> 8.          xenvif_carrier_off(vif);
>>> 9.          xenvif_rx_queue_purge(queue);
>>> 10.         continue;
>>> 11.     }
>>> 12. }
>>>
>>> The wait on line 2 returns immediately because the vif is disabled
>>> (see xenvif_have_rx_work).
>>>
>>> We are on queue 0, so the condition on line 7 is true. Therefore we
>>> hit the continue on line 10 and go around the loop again, and so on.
>>>
>>> On a platform where preemption is not enabled, this thread will never
>>> yield to another thread (unless the domain is destroyed).
>>
>> I'm not sure why we have a continue in the vif->disabled case and not
>> just a break. Can you try that?
>
> So I applied this small patch:
>
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 908e65e..9448c6c 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -2110,7 +2110,7 @@ int xenvif_kthread_guest_rx(void *data)
>  		if (unlikely(vif->disabled && queue->id == 0)) {
>  			xenvif_carrier_off(vif);
>  			xenvif_rx_queue_purge(queue);
> -			continue;
> +			break;
>  		}
>
>  		if (!skb_queue_empty(&queue->rx_queue))
How about this?
8<------------------------------------------
xen-netback: stop the guest rx thread after a fatal error

After commit e9d8b2c2968499c1f96563e6522c56958d5a1d0d (xen-netback:
disable rogue vif in kthread context), a fatal (protocol) error would
leave the guest Rx thread spinning, wasting CPU time. Commit
ecf08d2dbb96d5a4b4bcc53a39e8d29cc8fef02e (xen-netback: reintroduce
guest Rx stall detection) made this even worse by removing a
cond_resched() from this path.

A fatal error is non-recoverable so just allow the guest Rx thread to
exit. This requires taking additional refs to the task so the thread
exiting early is handled safely.

Signed-off-by: David Vrabel <david.vrabel@xxxxxxxxxx>
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 9259a73..037f74f 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -578,6 +578,7 @@ int xenvif_connect(struct xenvif_queue *queue, unsigned long tx_ring_ref,
 		goto err_rx_unbind;
 	}
 	queue->task = task;
+	get_task_struct(task);
 	task = kthread_create(xenvif_dealloc_kthread,
 			      (void *)queue, "%s-dealloc", queue->name);
@@ -634,6 +635,7 @@ void xenvif_disconnect(struct xenvif *vif)
 		if (queue->task) {
 			kthread_stop(queue->task);
+			put_task_struct(queue->task);
 			queue->task = NULL;
 		}
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 908e65e..c8ce701 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -2109,8 +2109,7 @@ int xenvif_kthread_guest_rx(void *data)
 		 */
 		if (unlikely(vif->disabled && queue->id == 0)) {
 			xenvif_carrier_off(vif);
-			xenvif_rx_queue_purge(queue);
-			continue;
+			break;
 		}
 		if (!skb_queue_empty(&queue->rx_queue))
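
For anyone following the thread, here is a rough userspace sketch (not
kernel code) of why the "continue" spins and the "break" does not. It
assumes, as described earlier in the thread, that the wait returns
immediately once the vif is disabled, so nothing in the loop body ever
sleeps. struct fake_queue and simulate_rx_loop() are made-up names for
illustration only; the iteration budget stands in for "forever".

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state the rx thread looks at. */
struct fake_queue {
    bool vif_disabled;   /* models vif->disabled */
    unsigned int id;     /* models queue->id */
};

/*
 * Model the guest rx loop for at most max_iters iterations.
 * Returns how many times the loop body ran before the thread exited
 * or the iteration budget was used up.
 */
static unsigned long simulate_rx_loop(const struct fake_queue *q,
                                      bool exit_on_disabled,
                                      unsigned long max_iters)
{
    unsigned long iters = 0;

    while (iters < max_iters) {
        iters++;

        /*
         * Assumption: the wait returns at once when the vif is
         * disabled, so there is no sleep or yield before this check.
         */
        if (q->vif_disabled && q->id == 0) {
            if (exit_on_disabled)
                break;      /* patched behaviour: thread exits */
            continue;       /* old behaviour: spin without yielding */
        }

        /*
         * Normal path: the real thread would process work and then
         * sleep in the wait; the model simply stops here.
         */
        break;
    }

    return iters;
}
```

With a disabled vif on queue 0, calling simulate_rx_loop() with
exit_on_disabled set to false uses the whole iteration budget, while
setting it to true leaves the loop after a single pass. On a
CONFIG_PREEMPT_NONE kernel the first case corresponds to the thread
never giving up the CPU.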