
Re: [Xen-devel] domU crash with kernel BUG at drivers/net/xen-netfront.c:305




On 2013/12/27 19:09, Vasily Evseenko wrote:
Hi,

My domU crashes roughly every 1-2 days under high network (TCP) load
with this message:

-----
[2013-12-26 03:53:18] kernel BUG at drivers/net/xen-netfront.c:305!
[2013-12-26 03:53:18] invalid opcode: 0000 [#1] SMP
[2013-12-26 03:53:18] Modules linked in: ipt_REJECT iptable_filter
xt_set xt_REDIRECT iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
nf_nat_ipv4 nf_nat
ip_tables ip_set_hash_net ip_set_hash_ip ip_set nfnetlink ip6t_REJECT
nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter
ip6_tables ipv6 ext3 jbd xen_netfront coretemp hwmon crc32_pclmul
crc32c_intel ghash_clmulni_intel microcode pcspkr ext4 jbd2 mbcache
aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64
xen_blkfront dm_mirror dm_region_hash dm_log dm_mod
[2013-12-26 03:53:18] CPU: 0 PID: 15126 Comm: python Not tainted
3.10.25-11.x86_64 #1
[2013-12-26 03:53:18] task: ffff8801e5d68ac0 ti: ffff8801e7392000
task.ti: ffff8801e7392000
[2013-12-26 03:53:18] RIP: e030:[<ffffffffa015d637>]
[<ffffffffa015d637>] xennet_alloc_rx_buffers+0x347/0x360 [xen_netfront]
[2013-12-26 03:53:18] RSP: e02b:ffff8801f2e03ce0  EFLAGS: 00010282
[2013-12-26 03:53:18] RAX: 00000000000001d4 RBX: ffff8801e5438800 RCX:
0000000000000001
[2013-12-26 03:53:18] RDX: 000000000000002a RSI: 0000000000000000 RDI:
0000000000002200
[2013-12-26 03:53:18] RBP: ffff8801f2e03d40 R08: 0000000000000000 R09:
0000000000001000
[2013-12-26 03:53:18] R10: ffff8801000083c0 R11: dead000000200200 R12:
0000000000000220
[2013-12-26 03:53:18] R13: ffff8801e6eec0c0 R14: 000000000000002a R15:
000000000239642a
[2013-12-26 03:53:18] FS:  00007f4cf48d57e0(0000)
GS:ffff8801f2e00000(0000) knlGS:0000000000000000
[2013-12-26 03:53:18] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[2013-12-26 03:53:18] CR2: ffffffffff600400 CR3: 00000001e0db3000 CR4:
0000000000042660
[2013-12-26 03:53:18] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[2013-12-26 03:53:18] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[2013-12-26 03:53:18] Stack:
[2013-12-26 03:53:18]  ffff8801f2e03df0 02396417e5438000
ffff8801e5439d58 ffff8801e54394f0
[2013-12-26 03:53:18]  ffff8801e5438000 002affff00000013
ffff8801f2e03d40 ffff8801f2e03db0
[2013-12-26 03:53:18]  0000000000000010 ffff8800655e6ac0
ffff8801e5438800 ffff8801e511a000
[2013-12-26 03:53:18] Call Trace:
[2013-12-26 03:53:18]  <IRQ>
[2013-12-26 03:53:18]  [<ffffffffa015dc44>] xennet_poll+0x2f4/0x630
[xen_netfront]
[2013-12-26 03:53:18]  [<ffffffff810640a9>] ? raise_softirq_irqoff+0x9/0x50
[2013-12-26 03:53:18]  [<ffffffff8152050c>] ? dev_kfree_skb_irq+0x5c/0x70
[2013-12-26 03:53:18]  [<ffffffff810e4fb9>] ?
handle_irq_event_percpu+0xc9/0x210
[2013-12-26 03:53:18]  [<ffffffff81528022>] net_rx_action+0x112/0x290
[2013-12-26 03:53:18]  [<ffffffff810e514d>] ? handle_irq_event+0x4d/0x70
[2013-12-26 03:53:18]  [<ffffffff81063c97>] __do_softirq+0xf7/0x270
[2013-12-26 03:53:18]  [<ffffffff81600edc>] call_softirq+0x1c/0x30
[2013-12-26 03:53:18]  [<ffffffff81014505>] do_softirq+0x65/0xa0
[2013-12-26 03:53:18]  [<ffffffff810639c5>] irq_exit+0xc5/0xd0
[2013-12-26 03:53:18]  [<ffffffff81351e45>] xen_evtchn_do_upcall+0x35/0x50
[2013-12-26 03:53:18]  [<ffffffff81600f3e>]
xen_do_hypervisor_callback+0x1e/0x30
[2013-12-26 03:53:18]  <EOI>
[2013-12-26 03:53:18] Code: 8b 35 ee f9 bb e1 48 8d bb 08 0d 00 00 48 83
c6 64 e8 2e f2 f0 e0 8b 83 ec 0c 00 00 31 d2 89 c1 d1 e9 39 d1 76 9e e9
5a ff ff ff <0f> 0b eb fe 0f 0b 0f 1f 00 eb fb 66 66 66 66 66 2e 0f 1f
84 00
[2013-12-26 03:53:18] RIP  [<ffffffffa015d637>]
xennet_alloc_rx_buffers+0x347/0x360 [xen_netfront]
[2013-12-26 03:53:18]  RSP <ffff8801f2e03ce0>
------------

The dom0 and domU kernels are vanilla 3.10.25.
The host server has 4 cores x 2 threads, mapped as: 4 - dom0, 2 - domU,
2 - domU.
I've tried Xen versions 4.2.3 and 4.3.1.
I've also tried disabling offloading on the domU (ethtool -K eth0 tx off
tso off gso off), with no effect.

The domUs are under high TCP load (a lot of small TCP connections (web server)).
Sometimes I get this on dom0:
---
[2013-12-26 00:16:30] (XEN) grant_table.c:289:d0 Increased maptrack size
to 2 frames
[2013-12-26 03:53:18] (XEN) grant_table.c:1858:d0 Bad grant reference
99221507
[2013-12-26 03:53:18] (XEN) grant_table.c:1858:d0 Bad grant reference
43646979
[2013-12-26 03:53:18] (XEN) grant_table.c:1858:d0 Bad grant reference
43646979
[2013-12-26 03:53:18] (XEN) grant_table.c:1858:d0 Bad grant reference
99221507
[2013-12-26 06:15:14] (XEN) grant_table.c:1858:d0 Bad grant reference
43646979
[2013-12-26 06:15:14] (XEN) grant_table.c:1858:d0 Bad grant reference
99221507
[2013-12-26 06:15:14] (XEN) grant_table.c:1858:d0 Bad grant reference
99221507
[2013-12-26 06:15:14] (XEN) grant_table.c:1858:d0 Bad grant reference
99221507

---

It seems the root of the problem is in the dom0 messages above. Is it a
HW failure or an overflow of some internal kernel structures?

From the stack trace, this looks very likely to be the same issue as one that has already been fixed. Something is wrong with how netback counts slots, so responses overlap requests in the ring; the grant copy then gets a wrong grant reference and the error is reported from grant_table.c. See http://lists.xen.org/archives/html/xen-devel/2013-09/msg01143.html

There was some back and forth on this issue, but the fix seems to have been in place since v3.12-rc4. Could you try a newer kernel version?
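
As a rough cross-check of the "responses overlap requests" theory, the bad grant references from the dom0 log can be reinterpreted as the second half of an RX response. This is only an illustrative sketch under stated assumptions: it assumes the RX ring layouts from Xen's public io/netif.h (a request carries the grant reference in bytes 4-7, and a response carries a 16-bit flags field followed by a signed 16-bit status in those same bytes) and a little-endian x86 guest. The helper name decode_as_rx_response is hypothetical and not part of any driver.

```python
import struct

# Flag bits as defined in Xen's public io/netif.h (assumed here).
NETRXF_data_validated = 1 << 0
NETRXF_csum_blank = 1 << 1


def decode_as_rx_response(gref):
    """Reinterpret a 32-bit 'grant reference' as (flags, status).

    Assumption: the value was read from bytes 4-7 of a ring slot that
    actually held an RX response rather than an RX request.
    """
    raw = struct.pack("<I", gref)              # bytes as laid out in the ring
    flags, status = struct.unpack("<Hh", raw)  # u16 flags, s16 status
    return flags, status


# The two values reported as "Bad grant reference" in the dom0 log.
for gref in (99221507, 43646979):
    flags, status = decode_as_rx_response(gref)
    print(f"{gref:#010x}: flags={flags:#x} status={status}")
```

Under these assumptions both values decode to flags 0x3 (data_validated | csum_blank) with a status of 1514 and 666 respectively. A positive status is the received packet size, and 1514 is a full-size Ethernet frame, so the values look more like response data than grant references. That would be consistent with the explanation above, but it is not proof.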

Thanks
Annie

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel