Re: [Xen-devel] domU crash with kernel BUG at drivers/net/xen-netfront.c:305
On 2013/12/27 19:09, Vasily Evseenko wrote:

> Hi,
>
> I've got a domU crash (roughly every 1-2 days under high network (TCP) load) with this message:
>
> -----
> [2013-12-26 03:53:18] kernel BUG at drivers/net/xen-netfront.c:305!
> [2013-12-26 03:53:18] invalid opcode: 0000 [#1] SMP
> [2013-12-26 03:53:18] Modules linked in: ipt_REJECT iptable_filter xt_set xt_REDIRECT iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat ip_tables ip_set_hash_net ip_set_hash_ip ip_set nfnetlink ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ext3 jbd xen_netfront coretemp hwmon crc32_pclmul crc32c_intel ghash_clmulni_intel microcode pcspkr ext4 jbd2 mbcache aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 xen_blkfront dm_mirror dm_region_hash dm_log dm_mod
> [2013-12-26 03:53:18] CPU: 0 PID: 15126 Comm: python Not tainted 3.10.25-11.x86_64 #1
> [2013-12-26 03:53:18] task: ffff8801e5d68ac0 ti: ffff8801e7392000 task.ti: ffff8801e7392000
> [2013-12-26 03:53:18] RIP: e030:[<ffffffffa015d637>] [<ffffffffa015d637>] xennet_alloc_rx_buffers+0x347/0x360 [xen_netfront]
> [2013-12-26 03:53:18] RSP: e02b:ffff8801f2e03ce0 EFLAGS: 00010282
> [2013-12-26 03:53:18] RAX: 00000000000001d4 RBX: ffff8801e5438800 RCX: 0000000000000001
> [2013-12-26 03:53:18] RDX: 000000000000002a RSI: 0000000000000000 RDI: 0000000000002200
> [2013-12-26 03:53:18] RBP: ffff8801f2e03d40 R08: 0000000000000000 R09: 0000000000001000
> [2013-12-26 03:53:18] R10: ffff8801000083c0 R11: dead000000200200 R12: 0000000000000220
> [2013-12-26 03:53:18] R13: ffff8801e6eec0c0 R14: 000000000000002a R15: 000000000239642a
> [2013-12-26 03:53:18] FS: 00007f4cf48d57e0(0000) GS:ffff8801f2e00000(0000) knlGS:0000000000000000
> [2013-12-26 03:53:18] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> [2013-12-26 03:53:18] CR2: ffffffffff600400 CR3: 00000001e0db3000 CR4: 0000000000042660
> [2013-12-26 03:53:18] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [2013-12-26 03:53:18] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [2013-12-26 03:53:18] Stack:
> [2013-12-26 03:53:18]  ffff8801f2e03df0 02396417e5438000 ffff8801e5439d58 ffff8801e54394f0
> [2013-12-26 03:53:18]  ffff8801e5438000 002affff00000013 ffff8801f2e03d40 ffff8801f2e03db0
> [2013-12-26 03:53:18]  0000000000000010 ffff8800655e6ac0 ffff8801e5438800 ffff8801e511a000
> [2013-12-26 03:53:18] Call Trace:
> [2013-12-26 03:53:18]  <IRQ>
> [2013-12-26 03:53:18]  [<ffffffffa015dc44>] xennet_poll+0x2f4/0x630 [xen_netfront]
> [2013-12-26 03:53:18]  [<ffffffff810640a9>] ? raise_softirq_irqoff+0x9/0x50
> [2013-12-26 03:53:18]  [<ffffffff8152050c>] ? dev_kfree_skb_irq+0x5c/0x70
> [2013-12-26 03:53:18]  [<ffffffff810e4fb9>] ? handle_irq_event_percpu+0xc9/0x210
> [2013-12-26 03:53:18]  [<ffffffff81528022>] net_rx_action+0x112/0x290
> [2013-12-26 03:53:18]  [<ffffffff810e514d>] ? handle_irq_event+0x4d/0x70
> [2013-12-26 03:53:18]  [<ffffffff81063c97>] __do_softirq+0xf7/0x270
> [2013-12-26 03:53:18]  [<ffffffff81600edc>] call_softirq+0x1c/0x30
> [2013-12-26 03:53:18]  [<ffffffff81014505>] do_softirq+0x65/0xa0
> [2013-12-26 03:53:18]  [<ffffffff810639c5>] irq_exit+0xc5/0xd0
> [2013-12-26 03:53:18]  [<ffffffff81351e45>] xen_evtchn_do_upcall+0x35/0x50
> [2013-12-26 03:53:18]  [<ffffffff81600f3e>] xen_do_hypervisor_callback+0x1e/0x30
> [2013-12-26 03:53:18]  <EOI>
> [2013-12-26 03:53:18] Code: 8b 35 ee f9 bb e1 48 8d bb 08 0d 00 00 48 83 c6 64 e8 2e f2 f0 e0 8b 83 ec 0c 00 00 31 d2 89 c1 d1 e9 39 d1 76 9e e9 5a ff ff ff <0f> 0b eb fe 0f 0b 0f 1f 00 eb fb 66 66 66 66 66 2e 0f 1f 84 00
> [2013-12-26 03:53:18] RIP [<ffffffffa015d637>] xennet_alloc_rx_buffers+0x347/0x360 [xen_netfront]
> [2013-12-26 03:53:18] RSP <ffff8801f2e03ce0>
> ------------
>
> The dom0 and domU kernels are vanilla 3.10.25.
> The host server has 4 cores x 2 threads, mapped as: 4 - dom0, 2 - domU, 2 - domU.
> I've tried Xen versions 4.2.3 and 4.3.1.
> I've also tried disabling offloading on the domU:
>   ethtool -K eth0 tx off tso off gso off
> ---- no effect.
>
> The domUs are under high TCP load (a lot of small TCP connections, web server).
> Sometimes I get this on dom0:
> ---
> [2013-12-26 00:16:30] (XEN) grant_table.c:289:d0 Increased maptrack size to 2 frames
> [2013-12-26 03:53:18] (XEN) grant_table.c:1858:d0 Bad grant reference 99221507
> [2013-12-26 03:53:18] (XEN) grant_table.c:1858:d0 Bad grant reference 43646979
> [2013-12-26 03:53:18] (XEN) grant_table.c:1858:d0 Bad grant reference 43646979
> [2013-12-26 03:53:18] (XEN) grant_table.c:1858:d0 Bad grant reference 99221507
> [2013-12-26 06:15:14] (XEN) grant_table.c:1858:d0 Bad grant reference 43646979
> [2013-12-26 06:15:14] (XEN) grant_table.c:1858:d0 Bad grant reference 99221507
> [2013-12-26 06:15:14] (XEN) grant_table.c:1858:d0 Bad grant reference 99221507
> [2013-12-26 06:15:14] (XEN) grant_table.c:1858:d0 Bad grant reference 99221507
> ---
> It seems the root of the problem is in the dom0 messages above.
> Is it a hardware failure, or an overflow of some internal kernel structures?
>
> Thanks

From the stack, this issue looks very likely to be the same as one that has already been fixed. There is something wrong with the counting of slots in netback: responses then overlap requests in the ring, the grant copy picks up a bogus grant reference, and grant_table.c reports an error (the "Bad grant reference" messages you see in dom0).
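To make that failure mode concrete, here is a minimal, self-contained sketch of a shared request/response ring in which the backend miscounts slots. It only illustrates the general idea and is not the actual netback/netfront code; the ring layout, sizes, and names (post_request, respond, struct entry) are invented for the example.

-----
/*
 * Simplified sketch (NOT the real netback/netfront code) of how miscounted
 * slots in a shared request/response ring let a response overwrite a request
 * that has not been consumed yet.  Entry layout, names and sizes are made up.
 */
#include <stdio.h>

#define RING_SIZE 8                        /* entries; power of two */
#define RING_IDX(i) ((i) & (RING_SIZE - 1))

struct entry {
    int id;              /* carries the grant reference in this toy model */
    int is_response;     /* 0 = request from frontend, 1 = response from backend */
};

static struct entry ring[RING_SIZE];
static unsigned int req_prod, req_cons;    /* frontend produces, backend consumes */
static unsigned int rsp_prod;              /* backend produces responses */

/* Frontend: post a request that grants the backend access via 'gref'. */
static void post_request(int gref)
{
    ring[RING_IDX(req_prod)] = (struct entry){ .id = gref, .is_response = 0 };
    req_prod++;
}

/*
 * Backend: consume one request and write a response back into the ring.
 * 'slots_counted' is how many slots the backend *thinks* the request used.
 * If it over-counts, rsp_prod runs ahead of req_cons and the response is
 * written over a request that has not been processed yet.
 */
static void respond(int slots_counted)
{
    int gref = ring[RING_IDX(req_cons)].id;

    req_cons += 1;                  /* actually consumed exactly one slot   */
    rsp_prod += slots_counted;      /* ...but accounted for 'slots_counted' */
    ring[RING_IDX(rsp_prod - 1)] = (struct entry){ .id = gref, .is_response = 1 };
}

int main(void)
{
    for (int gref = 100; gref < 104; gref++)
        post_request(gref);         /* four pending requests, grefs 100..103 */

    respond(1);  /* correct: response fills the slot of the consumed request   */
    respond(2);  /* over-counted: response overwrites the next pending request */

    /* Walk the requests the backend has not consumed yet. */
    for (unsigned int i = req_cons; i != req_prod; i++) {
        const struct entry *e = &ring[RING_IDX(i)];
        printf("slot %u: id=%d%s\n", RING_IDX(i), e->id,
               e->is_response ? "  <-- response clobbered a pending request" : "");
    }
    return 0;
}
-----

With correct accounting the response lands in the slot of the request it answers; with the over-count, slot 2 is still treated as a pending request by the bookkeeping but now holds a response, so whatever is later interpreted as a grant reference in that slot is garbage.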
See http://lists.xen.org/archives/html/xen-devel/2013-09/msg01143.html

There was some back-and-forth work on this issue, but the fix patch appears to have been in mainline since v3.12-rc4. Would you like to try a newer kernel version?

Thanks
Annie

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel