
[Xen-devel] ARM: DOMU: Deadlock in xen-netfront and patch to solve it.



Hi,

I am using a uEVM5432 (OMAP5432) board with Xen 4.5:
(XEN) Xen version 4.5-unstable (dpiotrov@)
(arm-linux-gnueabihf-gcc (Ubuntu/Linaro 4.8.2-16ubuntu4) 4.8.2) debug=y

Linux version 3.16.2 (dpiotrov@ubuntu)
(gcc version 4.8.2 (Ubuntu/Linaro 4.8.2-16ubuntu4) )

Dom0 works with no issues.
DomU starts fine, but when I try to use networking - ping, for example -
I get the following message (ping itself continues to work):

~ #ping 192.168.0.3
PING 192.168.0.3 (192.168.0.3): 56 data bytes

=========================================================
[ INFO: possible irq lock inversion dependency detected ]
3.16.2 #16 Not tainted
---------------------------------------------------------
swapper/0/0 just changed the state of lock:
 (&(&queue->tx_lock)->rlock){-.....}, at: [<c03adec8>] xennet_tx_interrupt+0x14/0x34
but this lock took another, HARDIRQ-unsafe lock in the past:
 (&stat->syncp.seq#2){+.-...}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
 Possible interrupt unsafe locking scenario:

       CPU0                CPU1
       ----                ----
  lock(&stat->syncp.seq#2);
                           local_irq_disable();
                           lock(&(&queue->tx_lock)->rlock);
                           lock(&stat->syncp.seq#2);
  <Interrupt>
   lock(&(&queue->tx_lock)->rlock);
 *** DEADLOCK ***

... see the full log at the end of this email ...

In my opinion, this probably happens because the timing of a domU differs
from that of dom0 or a non-virtualized Linux machine, which changes the
outcome of races between IRQ handlers and other lock holders.

I made the following changes in drivers/net/xen-netfront.c, which solve the problem:

--- a/drivers/net/xen-netfront.c 2014-09-28 14:29:07.000000000 -0700
+++ b/drivers/net/xen-netfront.c 2014-10-01 00:41:19.629581110 -0700
@@ -959,6 +959,7 @@
 	struct netfront_stats *stats = this_cpu_ptr(queue->info->stats);
 	int packets_dropped = 0;
 	struct sk_buff *skb;
+	unsigned long flags;
 
 	while ((skb = __skb_dequeue(rxq)) != NULL) {
 		int pull_to = NETFRONT_SKB_CB(skb)->pull_to;
@@ -977,10 +978,12 @@
 			continue;
 		}
 
+		local_irq_save(flags);
 		u64_stats_update_begin(&stats->syncp);
 		stats->rx_packets++;
 		stats->rx_bytes += skb->len;
 		u64_stats_update_end(&stats->syncp);
+		local_irq_restore(flags);
 
 		/* Pass it up. */
 		napi_gro_receive(&queue->napi, skb);
@@ -1119,15 +1122,18 @@
 		struct netfront_stats *stats = per_cpu_ptr(np->stats, cpu);
 		u64 rx_packets, rx_bytes, tx_packets, tx_bytes;
 		unsigned int start;
+		unsigned long flags;
 
+		local_irq_save(flags);
 		do {
-			start = u64_stats_fetch_begin_irq(&stats->syncp);
+			start = u64_stats_fetch_begin(&stats->syncp);
 
 			rx_packets = stats->rx_packets;
 			tx_packets = stats->tx_packets;
 			rx_bytes = stats->rx_bytes;
 			tx_bytes = stats->tx_bytes;
-		} while (u64_stats_fetch_retry_irq(&stats->syncp, start));
+		} while (u64_stats_fetch_retry(&stats->syncp, start));
+		local_irq_restore(flags);
 
 		tot->rx_packets += rx_packets;
 		tot->tx_packets += tx_packets;

Can you please review my patch?
If the patch solves an actual problem, please help me to deliver it;
I am not so familiar with this process.

Thanks,
Dmitry.
--------------------------------------------------------
The full log of the deadlock:

~ #ping 192.168.0.3
PING 192.168.0.3 (192.168.0.3): 56 data bytes

=========================================================
[ INFO: possible irq lock inversion dependency detected ]
3.16.2 #16 Not tainted
---------------------------------------------------------
swapper/0/0 just changed the state of lock:
 (&(&queue->tx_lock)->rlock){-.....}, at: [<c03adec8>] xennet_tx_interrupt+0x14/0x34
but this lock took another, HARDIRQ-unsafe lock in the past:
 (&stat->syncp.seq#2){+.-...}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
 Possible interrupt unsafe locking scenario:

       CPU0                CPU1
       ----                ----
  lock(&stat->syncp.seq#2);
                           local_irq_disable();
                           lock(&(&queue->tx_lock)->rlock);
                           lock(&stat->syncp.seq#2);
  <Interrupt>
   lock(&(&queue->tx_lock)->rlock);

 *** DEADLOCK ***

no locks held by swapper/0/0.

the shortest dependencies between 2nd lock and 1st lock:
 -> (&stat->syncp.seq#2){+.-...} ops: 24 {
    HARDIRQ-ON-W at:
                     [<c03b084c>] xennet_poll+0x908/0xb1c
                     [<c047db30>] net_rx_action+0xbc/0x18c
                     [<c003b638>] __do_softirq+0x154/0x2dc
                     [<c003babc>] irq_exit+0xc0/0x104
                     [<c000f240>] handle_IRQ+0x44/0x90
                     [<c000858c>] gic_handle_irq+0x2c/0x5c
                     [<c0012944>] __irq_svc+0x44/0x58
                     [<c000f3a0>] arch_cpu_idle+0x24/0x30
                     [<c000f3a0>] arch_cpu_idle+0x24/0x30
                     [<c006c3f4>] cpu_startup_entry+0x104/0x15c
                     [<c072abac>] start_kernel+0x390/0x39c
    IN-SOFTIRQ-W at:
                     [<c03b084c>] xennet_poll+0x908/0xb1c
                     [<c047db30>] net_rx_action+0xbc/0x18c
                     [<c003b638>] __do_softirq+0x154/0x2dc
                     [<c003babc>] irq_exit+0xc0/0x104
                     [<c000f240>] handle_IRQ+0x44/0x90
                     [<c000858c>] gic_handle_irq+0x2c/0x5c
                     [<c0012944>] __irq_svc+0x44/0x58
                     [<c000f3a0>] arch_cpu_idle+0x24/0x30
                     [<c000f3a0>] arch_cpu_idle+0x24/0x30
                     [<c006c3f4>] cpu_startup_entry+0x104/0x15c
                     [<c072abac>] start_kernel+0x390/0x39c
    INITIAL USE at:
                     [<c03aeabc>] xennet_get_stats64+0x13c/0x190
                     [<c047835c>] dev_get_stats+0x38/0xc8
                     [<c048c95c>] rtnl_fill_ifinfo+0x3b8/0x930
                     [<c048cf3c>] rtmsg_ifinfo+0x68/0xec
                     [<c048088c>] register_netdevice+0x484/0x4f4
                     [<c0480910>] register_netdev+0x14/0x24
                     [<c03aec4c>] netfront_probe+0x13c/0x244
                     [<c02cf944>] xenbus_dev_probe+0x80/0x134
                     [<c0316220>] driver_probe_device+0x118/0x250
                     [<c0316430>] __driver_attach+0x94/0x98
                     [<c0314858>] bus_for_each_dev+0x60/0x94
                     [<c0315a2c>] bus_add_driver+0x148/0x1f0
                     [<c03167f4>] driver_register+0x78/0xf8
                     [<c02d1154>] xenbus_register_frontend+0x20/0x38
                     [<c0008850>] do_one_initcall+0x8c/0x1c8
                     [<c072acf0>] kernel_init_freeable+0x138/0x1d8
                     [<c0539254>] kernel_init+0x8/0xf0
                     [<c000e9e8>] ret_from_fork+0x14/0x2c
  }
  ... key      at: [<c0fb00a4>] __key.45274+0x0/0x8
  ... acquired at:
    [<c047ef3c>] dev_hard_start_xmit+0x2c8/0x44c
    [<c04986dc>] sch_direct_xmit+0xc4/0x2e8
    [<c047f2cc>] __dev_queue_xmit+0x20c/0x5f4
    [<c04d60f8>] arp_solicit+0xf8/0x278
    [<c0486634>] neigh_probe+0x50/0x78
    [<c0487e04>] __neigh_event_send+0x98/0x328
    [<c0488240>] neigh_resolve_output+0x1ac/0x268
    [<c04aa464>] ip_finish_output+0x330/0xf74
    [<c04ad4c8>] ip_send_skb+0x18/0xd4
    [<c04cef20>] raw_sendmsg+0x504/0x93c
    [<c0466858>] sock_sendmsg+0x88/0xac
    [<c046898c>] SyS_sendto+0xb8/0xdc
    [<c000e920>] ret_fast_syscall+0x0/0x48

-> (&(&queue->tx_lock)->rlock){-.....} ops: 6 {
   IN-HARDIRQ-W at:
                    [<c0543294>] _raw_spin_lock_irqsave+0x40/0x54
                    [<c03adec8>] xennet_tx_interrupt+0x14/0x34
                    [<c007e960>] handle_irq_event_percpu+0x38/0x134
                    [<c007ea98>] handle_irq_event+0x3c/0x5c
                    [<c0081580>] handle_edge_irq+0xa4/0x184
                    [<c007e178>] generic_handle_irq+0x2c/0x3c
                    [<c02cc738>] evtchn_fifo_handle_events+0x18c/0x198
                    [<c02c9dd4>] __xen_evtchn_do_upcall+0x70/0xa4
                    [<c000e100>] xen_arm_callback+0x8/0x10
                    [<c0081cd4>] handle_percpu_devid_irq+0x6c/0x84
                    [<c007e178>] generic_handle_irq+0x2c/0x3c
                    [<c000f23c>] handle_IRQ+0x40/0x90
                    [<c000858c>] gic_handle_irq+0x2c/0x5c
                    [<c0012944>] __irq_svc+0x44/0x58
                    [<c000f3a0>] arch_cpu_idle+0x24/0x30
                    [<c000f3a0>] arch_cpu_idle+0x24/0x30
                    [<c006c3f4>] cpu_startup_entry+0x104/0x15c
                    [<c072abac>] start_kernel+0x390/0x39c
   INITIAL USE at:
                    [<c05432e4>] _raw_spin_lock_irq+0x3c/0x4c
                    [<c03afcfc>] netback_changed+0xc3c/0xe84
                    [<c02ce824>] xenwatch_thread+0x80/0x120
                    [<c0052bec>] kthread+0xe0/0xf4
                    [<c000e9e8>] ret_from_fork+0x14/0x2c
 }
 ... key      at: [<c0fb00ac>] __key.45357+0x0/0x8
 ... acquired at:
   [<c0075a68>] __lock_acquire+0xae0/0x1d7c
   [<c00774e0>] lock_acquire+0x68/0x7c
   [<c0543294>] _raw_spin_lock_irqsave+0x40/0x54
   [<c03adec8>] xennet_tx_interrupt+0x14/0x34
   [<c007e960>] handle_irq_event_percpu+0x38/0x134
   [<c007ea98>] handle_irq_event+0x3c/0x5c
   [<c0081580>] handle_edge_irq+0xa4/0x184
   [<c007e178>] generic_handle_irq+0x2c/0x3c
   [<c02cc738>] evtchn_fifo_handle_events+0x18c/0x198
   [<c02c9dd4>] __xen_evtchn_do_upcall+0x70/0xa4
   [<c000e100>] xen_arm_callback+0x8/0x10
   [<c0081cd4>] handle_percpu_devid_irq+0x6c/0x84
   [<c007e178>] generic_handle_irq+0x2c/0x3c
   [<c000f23c>] handle_IRQ+0x40/0x90
   [<c000858c>] gic_handle_irq+0x2c/0x5c
   [<c0012944>] __irq_svc+0x44/0x58
   [<c000f3a0>] arch_cpu_idle+0x24/0x30
   [<c000f3a0>] arch_cpu_idle+0x24/0x30
   [<c006c3f4>] cpu_startup_entry+0x104/0x15c
   [<c072abac>] start_kernel+0x390/0x39c

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.2 #16
[<c0015aa8>] (unwind_backtrace) from [<c0011e4c>] (show_stack+0x10/0x14)
[<c0011e4c>] (show_stack) from [<c053d524>] (dump_stack+0x84/0x94)
[<c053d524>] (dump_stack) from [<c00737d4>] (print_irq_inversion_bug+0x1c4/0x20c)
[<c00737d4>] (print_irq_inversion_bug) from [<c00738d4>] (check_usage_forwards+0xb8/0x120)
[<c00738d4>] (check_usage_forwards) from [<c0074598>] (mark_lock+0x370/0x7a0)
[<c0074598>] (mark_lock) from [<c0075a68>] (__lock_acquire+0xae0/0x1d7c)
[<c0075a68>] (__lock_acquire) from [<c00774e0>] (lock_acquire+0x68/0x7c)
[<c00774e0>] (lock_acquire) from [<c0543294>] (_raw_spin_lock_irqsave+0x40/0x54)
[<c0543294>] (_raw_spin_lock_irqsave) from [<c03adec8>] (xennet_tx_interrupt+0x14/0x34)
[<c03adec8>] (xennet_tx_interrupt) from [<c007e960>] (handle_irq_event_percpu+0x38/0x134)
[<c007e960>] (handle_irq_event_percpu) from [<c007ea98>] (handle_irq_event+0x3c/0x5c)
[<c007ea98>] (handle_irq_event) from [<c0081580>] (handle_edge_irq+0xa4/0x184)
[<c0081580>] (handle_edge_irq) from [<c007e178>] (generic_handle_irq+0x2c/0x3c)
[<c007e178>] (generic_handle_irq) from [<c02cc738>] (evtchn_fifo_handle_events+0x18c/0x198)
[<c02cc738>] (evtchn_fifo_handle_events) from [<c02c9dd4>] (__xen_evtchn_do_upcall+0x70/0xa4)
[<c02c9dd4>] (__xen_evtchn_do_upcall) from [<c000e100>] (xen_arm_callback+0x8/0x10)
[<c000e100>] (xen_arm_callback) from [<c0081cd4>] (handle_percpu_devid_irq+0x6c/0x84)
[<c0081cd4>] (handle_percpu_devid_irq) from [<c007e178>] (generic_handle_irq+0x2c/0x3c)
[<c007e178>] (generic_handle_irq) from [<c000f23c>] (handle_IRQ+0x40/0x90)
[<c000f23c>] (handle_IRQ) from [<c000858c>] (gic_handle_irq+0x2c/0x5c)
[<c000858c>] (gic_handle_irq) from [<c0012944>] (__irq_svc+0x44/0x58)
Exception stack(0xc076df68 to 0xc076dfb0)
df60:                   00000001 00000001 00000000 c077b3b0 c076c000 c077854c
df80: c07784e4 c07c061f c05472c0 00000001 c07c061f 00000000 00000000 c076dfb0
dfa0: c0074b5c c000f3a0 20070013 ffffffff
[<c0012944>] (__irq_svc) from [<c000f3a0>] (arch_cpu_idle+0x24/0x30)
[<c000f3a0>] (arch_cpu_idle) from [<c006c3f4>] (cpu_startup_entry+0x104/0x15c)
[<c006c3f4>] (cpu_startup_entry) from [<c072abac>] (start_kernel+0x390/0x39c)

64 bytes from 192.168.0.3: seq=0 ttl=64 time=328.585 ms
64 bytes from 192.168.0.3: seq=1 ttl=64 time=2.135 ms
64 bytes from 192.168.0.3: seq=2 ttl=64 time=2.328 ms


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

