Re: [Xen-devel] Fatal crash on xen4.2 HVM + qemu-xen dm + NFS



On Wed, 16 Jan 2013, Alex Bligh wrote:
> Kernel 3.2.0-32-generic on an x86_64
> 
> [ 1416.992402] BUG: unable to handle kernel paging request at
> ffff88073fee6e00
> [ 1416.992902] IP: [<ffffffff81318e2b>] memcpy+0xb/0x120
> [ 1416.993244] PGD 1c06067 PUD 7ec73067 PMD 7ee73067 PTE 0
> [ 1416.993985] Oops: 0000 [#1] SMP
> [ 1416.994433] CPU 4
> [ 1416.994587] Modules linked in: xt_physdev xen_pciback xen_netback
> xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs veth ip6t_LOG
> nf_conntrack_ipv6 nf_
> defrag_ipv6 ip6table_filter ip6_tables ipt_LOG xt_limit xt_state
> xt_tcpudp nf_conntrack_netlink nfnetlink ebt_ip ebtable_filter
> iptable_mangle ipt_MASQUERADE
> iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4
> iptable_filter ip_tables ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad
> ib_core ib_addr iscsi_tcp
> libiscsi_tcp libiscsi scsi_transport_iscsi ebtable_broute ebtables
> x_tables dcdbas psmouse serio_raw amd64_edac_mod usbhid hid edac_core
> sp5100_tco i2c_piix
> 4 edac_mce_amd fam15h_power k10temp igb bnx2 acpi_power_meter mac_hid
> dm_multipath bridge 8021q garp stp ixgbe dca mdio nfsd nfs lockd fscache
> auth_rpcgss nf
> s_acl sunrpc [last unloaded: scsi_transport_iscsi]
> [ 1417.005011]
> [ 1417.005011] Pid: 0, comm: swapper/4 Tainted: G        W
> 3.2.0-32-generic #51-Ubuntu Dell Inc. PowerEdge R715/0C5MMK
> [ 1417.005011] RIP: e030:[<ffffffff81318e2b>]  [<ffffffff81318e2b>]
> memcpy+0xb/0x120
> [ 1417.005011] RSP: e02b:ffff880060083b08  EFLAGS: 00010246
> [ 1417.005011] RAX: ffff88001e12c9e4 RBX: 0000000000000210 RCX:
> 0000000000000040
> [ 1417.005011] RDX: 0000000000000000 RSI: ffff88073fee6e00 RDI:
> ffff88001e12c9e4
> [ 1417.005011] RBP: ffff880060083b70 R08: 00000000000002e8 R09:
> 0000000000000200
> [ 1417.005011] R10: ffff88001e12c9e4 R11: 0000000000000280 R12:
> 00000000000000e8
> [ 1417.005011] R13: ffff88004b014c00 R14: ffff88004b532000 R15:
> 0000000000000001
> [ 1417.005011] FS:  00007f1a99089700(0000) GS:ffff880060080000(0000)
> knlGS:0000000000000000
> [ 1417.005011] CS:  e033 DS: 002b ES: 002b CR0: 000000008005003b
> [ 1417.005011] CR2: ffff88073fee6e00 CR3: 0000000015d22000 CR4:
> 0000000000040660
> [ 1417.005011] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 1417.005011] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [ 1417.005011] Process swapper/4 (pid: 0, threadinfo ffff88004b532000,
> task ffff88004b538000)
> [ 1417.005011] Stack:
> [ 1417.005011]  ffffffff81532c0e 0000000000000000 ffff8800000002e8
> ffff880000000200
> [ 1417.005011]  ffff88001e12c9e4 0000000000000200 ffff88004b533fd8
> ffff880060083ba0
> [ 1417.005011]  ffff88004b015800 ffff88004b014c00 ffff88001b142000
> 00000000000000fc
> [ 1417.005011] Call Trace:
> [ 1417.005011]  <IRQ>
> [ 1417.005011]  [<ffffffff81532c0e>] ? skb_copy_bits+0x16e/0x2c0
> [ 1417.005011]  [<ffffffff8153463a>] skb_copy+0x8a/0xb0
> [ 1417.005011]  [<ffffffff8154b517>] neigh_probe+0x37/0x80
> [ 1417.005011]  [<ffffffff8154b9db>] __neigh_event_send+0xbb/0x210
> [ 1417.005011]  [<ffffffff8154bc73>] neigh_resolve_output+0x143/0x1f0
> [ 1417.005011]  [<ffffffff8156dde5>] ? nf_hook_slow+0x75/0x150
> [ 1417.005011]  [<ffffffff8157a510>] ? ip_fragment+0x810/0x810
> [ 1417.005011]  [<ffffffff8157a68e>] ip_finish_output+0x17e/0x2f0
> [ 1417.005011]  [<ffffffff81533ddb>] ? __alloc_skb+0x4b/0x240
> [ 1417.005011]  [<ffffffff8157b1e8>] ip_output+0x98/0xa0
> [ 1417.005011]  [<ffffffff8157a8a4>] ? __ip_local_out+0xa4/0xb0
> [ 1417.005011]  [<ffffffff8157a8d9>] ip_local_out+0x29/0x30
> [ 1417.005011]  [<ffffffff8157aa3c>] ip_queue_xmit+0x15c/0x410
> [ 1417.005011]  [<ffffffff81595840>] ? tcp_retransmit_timer+0x440/0x440
> [ 1417.005011]  [<ffffffff81592c69>] tcp_transmit_skb+0x359/0x580
> [ 1417.005011]  [<ffffffff81593be1>] tcp_retransmit_skb+0x171/0x310
> [ 1417.005011]  [<ffffffff8159561b>] tcp_retransmit_timer+0x21b/0x440
> [ 1417.005011]  [<ffffffff81595928>] tcp_write_timer+0xe8/0x110
> [ 1417.005011]  [<ffffffff81595840>] ? tcp_retransmit_timer+0x440/0x440
> [ 1417.005011]  [<ffffffff81075d36>] call_timer_fn+0x46/0x160
> [ 1417.005011]  [<ffffffff81595840>] ? tcp_retransmit_timer+0x440/0x440
> [ 1417.005011]  [<ffffffff81077682>] run_timer_softirq+0x132/0x2a0
> [ 1417.005011]  [<ffffffff8106e5d8>] __do_softirq+0xa8/0x210
> [ 1417.005011]  [<ffffffff813a94b7>] ? __xen_evtchn_do_upcall+0x207/0x250
> [ 1417.005011]  [<ffffffff816656ac>] call_softirq+0x1c/0x30
> [ 1417.005011]  [<ffffffff81015305>] do_softirq+0x65/0xa0
> [ 1417.005011]  [<ffffffff8106e9be>] irq_exit+0x8e/0xb0
> [ 1417.005011]  [<ffffffff813ab595>] xen_evtchn_do_upcall+0x35/0x50
> [ 1417.005011]  [<ffffffff816656fe>] xen_do_hypervisor_callback+0x1e/0x30
> [ 1417.005011]  <EOI>
> [ 1417.005011]  [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
> [ 1417.005011]  [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
> [ 1417.005011]  [<ffffffff8100a2d0>] ? xen_safe_halt+0x10/0x20
> [ 1417.005011]  [<ffffffff8101b983>] ? default_idle+0x53/0x1d0
> [ 1417.005011]  [<ffffffff81012236>] ? cpu_idle+0xd6/0x120
> [ 1417.005011]  [<ffffffff8100ab29>] ? xen_irq_enable_direct_reloc+0x4/0x4
> [ 1417.005011]  [<ffffffff8163369c>] ? cpu_bringup_and_idle+0xe/0x10
> [ 1417.005011] Code: 58 48 2b 43 50 88 43 4e 48 83 c4 08 5b 5d c3 90 e8
> 1b fe ff ff eb e6 90 90 90 90 90 90 90 90 90 48 89 f8 89 d1 c1 e9 03 83
> e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c

It seems that the grant mapping is already gone by the time the TCP
retransmit path (tcp_retransmit_skb in the trace above) runs.
That might happen because QEMU has already completed the read/write
operation and called xc_gnttab_munmap, which causes the grant table and
the m2p_override code to remove the p2m and m2p mappings of the foreign
pages.
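
To make the suspected ordering concrete, here is a toy model of it in
Python. It is only a sketch, not kernel or QEMU code: GrantMapping, Skb
and simulate are hypothetical names invented for illustration, and it
assumes the I/O is reported complete while a packet that references the
foreign page is still queued for retransmission.

```python
class GrantMapping:
    """Toy stand-in for a foreign page mapped through the grant table."""

    def __init__(self, data: bytes):
        self._data = data
        self.mapped = True

    def unmap(self) -> None:
        # Models the effect of xc_gnttab_munmap: the mapping disappears.
        self.mapped = False
        self._data = None

    def read(self) -> bytes:
        if not self.mapped:
            # Models the paging fault hit inside memcpy in the oops.
            raise RuntimeError("page fault: grant mapping is gone")
        return self._data


class Skb:
    """Toy packet that references the page instead of copying it."""

    def __init__(self, page: GrantMapping):
        self.page = page

    def copy(self) -> bytes:
        # Models skb_copy reading the referenced page.
        return bytes(self.page.read())


def simulate(acked_before_unmap: bool) -> str:
    page = GrantMapping(b"block data")
    retransmit_queue = [Skb(page)]  # kept until the peer acknowledges it

    if acked_before_unmap:
        retransmit_queue.clear()    # nothing left that touches the page

    page.unmap()                    # request completed, mapping removed

    try:
        for skb in retransmit_queue:  # retransmit timer fires later
            skb.copy()
    except RuntimeError:
        return "oops"
    return "ok"
```

In this model simulate(True) returns "ok" and simulate(False) returns
"oops": the fault only appears when the unmap happens while something in
the retransmit queue still points at the page.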

Isn't there a way to prevent tcp_retransmit from running once the
request has already completed? Or to stop it if it finds that the pages
are already gone?

You could try persistent grants: they wouldn't fix the bug, but they
should be able to "hide" it pretty well. Not ideal, I know.
The QEMU side commit is 9e496d7458bb01b717afe22db10a724db57d53fd.
Konrad issued a pull request recently with the corresponding Linux
blkfront changes:

git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/for-jens-3.8



 

