Re: [Xen-devel] RFH: Kernel OOPS in xen_netbk_rx_action / xenvif_gop_skb
Hello,

We are now more or less able to reproduce the OOPS within one hour by constantly shutting down the VM and rebooting it:

> [32918.795695] XXXlan0: port 3(vif18.0) entered disabled state
> [32918.798732] BUG: unable to handle kernel paging request at ffffc90010da2188
> [32918.798823] IP: [<ffffffffa04287dc>] xen_netbk_rx_action+0x18b/0x6f0
> [xen_netback]
> [32918.798911] PGD 95822067 PUD 95823067 PMD 94f47067 PTE 0
> [32918.798974] Oops: 0000 [#1] SMP
> [32918.799023] Modules linked in: xt_physdev xen_blkback xen_netback
> ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables xen_gntdev
> nfsv3 nfsv4 rpcsec_gss_krb5 nfsd nfs_acl auth_rpcgss oid_registry nfs fscache
> dns_resolver lockd sunrpc fuse loop xen_blkfront xen_evtchn blktap quota_v2
> quota_tree xenfs xen_privcmd coretemp crc32c_intel ghash_clmulni_intel
> aesni_intel ablk_helper cryptd lrw snd_pcm gf128mul snd_timer glue_helper snd
> aes_x86_64 soundcore snd_page_alloc microcode tpm_tis tpm tpm_bios pcspkr
> lpc_ich mfd_core acpi_power_meter i7core_edac mperf serio_raw i2c_i801 evdev
> edac_core processor ioatdma thermal_sys ext4 jbd2 crc16 bonding bridge stp
> llc dm_snapshot dm_mirror dm_region_hash dm_log dm_mod sd_mod crc_t10dif
> hid_generic usbhid hid mptsas mptscsih mptbase scsi_transport_sas ehci_pci
> button uhci_hcd ehci_hcd usbcore usb_common igb dca i2c_algo_bit i2c_core ptp
> pps_core
> [32918.799958] CPU: 0 PID: 6450 Comm: netback/0 Not tainted
> 3.10.0-ucs58-amd64 #1 Debian 3.10.11-1.58.201405060908
> [32918.800050] Hardware name: FUJITSU PRIMERGY BX920 S2/D3030, BIOS 080015
> Rev.3D94.3030 10/09/2012
> [32918.800137] task: ffff880093864880 ti: ffff88009266c000 task.ti:
> ffff88009266c000
> [32918.800220] RIP: e030:[<ffffffffa04287dc>] [<ffffffffa04287dc>]
> xen_netbk_rx_action+0x18b/0x6f0 [xen_netback]
> [32918.800314] RSP: e02b:ffff88009266dce8 EFLAGS: 00010212
> [32918.800364] RAX: ffffc9001082dac0 RBX: ffff880004d86ac0 RCX:
> ffffc90010da2000
> [32918.800419] RDX: 0000000000000031 RSI: 0000000000000000 RDI:
> ffff880004bdd280
> [32918.800474] RBP: ffff8800932db800 R08: 0000000000000000 R09:
> ffff8800952f3800
> [32918.800529] R10: 0000000000007ff0 R11: ffff88009c611380 R12:
> ffff8800932db800
> [32918.800584] R13: ffff88009266dd58 R14: ffffc90010821000 R15:
> 0000000000000000
> [32918.800642] FS: 00007f2f8fdcd700(0000) GS:ffff88009c600000(0000)
> knlGS:0000000000000000
> [32918.800728] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> [32918.800778] CR2: ffffc90010da2188 CR3: 0000000093eb0000 CR4:
> 0000000000002660
> [32918.800834] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [32918.800889] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [32918.800943] Stack:
> [32918.800981] ffff880093864c60 000000008106d2af ffff88009c613ec0
> ffff88009c613ec0
> [32918.801077] 0000000093864880 ffffc90010828ac0 ffffc90010821020
> 000000009c613ec0
> [32918.801173] 0000000000000000 0000000000000001 ffffc90010828ac0
> ffffc9001082dac0
> [32918.801269] Call Trace:
> [32918.801314] [<ffffffff813ca32d>] ? _raw_spin_lock_irqsave+0x11/0x2f
> [32918.801368] [<ffffffffa042a033>] ? xen_netbk_kthread+0x174/0x841
> [xen_netback]
> [32918.801454] [<ffffffff8105d373>] ? wake_up_bit+0x20/0x20
> [32918.801504] [<ffffffffa0429ebf>] ? xen_netbk_tx_build_gops+0xce8/0xce8
> [xen_netback]
> [32918.801590] [<ffffffff8105cd73>] ? kthread_freezable_should_stop+0x56/0x56
> [32918.801645] [<ffffffffa0429ebf>] ? xen_netbk_tx_build_gops+0xce8/0xce8
> [xen_netback]
> [32918.801730] [<ffffffff8105ce1e>] ? kthread+0xab/0xb3
> [32918.801781] [<ffffffff81003638>] ? xen_end_context_switch+0xe/0x1c
> [32918.801834] [<ffffffff8105cd73>] ? kthread_freezable_should_stop+0x56/0x56
> [32918.801890] [<ffffffff813cfbfc>] ? ret_from_fork+0x7c/0xb0
> [32918.801941] [<ffffffff8105cd73>] ? kthread_freezable_should_stop+0x56/0x56
> [32918.801995] Code: 8b b3 d0 00 00 00 48 8b bb d8 00 00 00 0f b7 74 37 02 89
> 70 08 eb 07 c7 40 08 00 00 00 00 89 d2 c7 40 04 00 00 00 00 48 83 c2 08 <0f>
> b7 34 d1 89 30 c7 44 24 60 00 00 00 00 8b 44 d1 04 89 44 24
> [32918.802400] RIP [<ffffffffa04287dc>] xen_netbk_rx_action+0x18b/0x6f0
> [xen_netback]
> [32918.802486] RSP <ffff88009266dce8>
> [32918.802529] CR2: ffffc90010da2188
> [32918.802859] ---[ end trace baf81e34c52eb41c ]---

(gdb) list *(xen_netbk_rx_action+0x18b)
0xffffffffa04287dc is in xen_netbk_rx_action
(/var/build/temp/tmp.hW3dNilayw/pbuilder/linux-3.10.11/drivers/net/xen-netback/netback.c:611).
606                     meta->gso_size = skb_shinfo(skb)->gso_size;
607             else
608                     meta->gso_size = 0;
609
610             meta->size = 0;
611             meta->id = req->id;
612             npo->copy_off = 0;
613             npo->copy_gref = req->gref;
614
615             data = skb->data;

After more debugging today I think something like this happens:

1. The VM is receiving packets through bonding + bridge + netback + netfront.
2. For some unknown reason at least one packet remains in the rx queue and is
   not delivered to the domU immediately by netback.
3. The VM finishes shutting down.
4. The shared ring between dom0 and domU is freed.
5. xen-netback then continues processing the pending requests and tries to
   put the packet into the shared ring, which has already been released.

From reading the attached disassembly I guess that

  AX  = &meta
  CX  = &rx->sring
  DX  =~ rx.req_cons
  CR2 = &req->id

where
  CX + DX * sizeof(union xen_netif_rx_{request,response}) = CR2
and the sizeof is 8.

Any additional ideas or insight are appreciated.

FYI: The host has only a single CPU and is running >=2 VMs so far.

>> There's one more patch that you can pick up from 3.10.y tree. I doubt it
>> will make much difference though.

Which patch are you referring to?

Sincerely
Philipp

Attachment: xen-netback.s