Hi Xen User List,
I have a recurring bug, shown below, which is causing me serious problems. It’s serious enough that my company is considering abandoning Xen for KVM, but we have many Xen servers and VMs, so this is not an easy decision. My CentOS 6 Xen servers are built using the SolusVM install script and run Windows 2008 R2 / 2012 R1 guests. They run great except for this recurring bug.
The servers run Xen 4.4.2 (some run slightly earlier versions) with the xl toolstack, and we have been hitting this showstopper bug at random on different servers for the last 6 months or so. Once the bug has been hit, any VM guest that is restarted can no longer boot, and I am forced to reboot the entire server, interrupting all VM guests. I’ve hit this bug about 6-7 times. One time it hard-crashed the server while I was migrating a VM to another server. The other times I made it to the weekend so I could do a graceful reboot.
Does anyone know of this issue?
Is there any paid Xen support that can debug and solve this? I’m using CentOS 6, so this is not a RHEL box under a support contract, but we’d probably be willing to pay on a per-incident basis if the cost were reasonable.
Any advice is much appreciated.
Jul 1 12:45:23 london-host15 kernel: xen-blkback:backend/vbd/86/768: prepare for reconnect
Jul 1 12:45:23 london-host15 kernel: br0: port 8(vifvm2696.0) entered disabled state
Jul 1 12:45:23 london-host15 kernel: BUG: unable to handle kernel paging request at ffffc90011a041e8
Jul 1 12:45:23 london-host15 kernel: IP: [<ffffffffa026e899>] netbk_gop_skb+0xb9/0x290 [xen_netback]
Jul 1 12:45:23 london-host15 kernel: PGD 38cee067 PUD 38cef067 PMD 25e89067 PTE 0
Jul 1 12:45:23 london-host15 kernel: Oops: 0000 [#1] SMP
Jul 1 12:45:23 london-host15 kernel: Modules linked in: dm_snapshot ebt_arp ebt_ip ebtable_filter ebtables tun xen_pciback xen_gntalloc bridge stp llc xt_REDIRECT xt_owner nf_nat_ftp nf_conntrack_ftp xt_length xt_hl xt_tcpmss xt_TCPMSS xt_multiport xt_limit xt_LOG xt_DSCP xt_dscp ipt_REJECT iptable_filter iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xen_acpi_processor blktap xen_netback xen_blkback xen_gntdev xen_evtchn xenfs xen_privcmd gpio_ich iTCO_wdt iTCO_vendor_support joydev coretemp freq_table mperf crc32_pclmul crc32c_intel ghash_clmulni_intel cryptd microcode pcspkr sg i2c_i801 lpc_ich igb ptp pps_core shpchp ipmi_devintf ipmi_si ipmi_msghandler ioatdma dca acpi_power_meter hwmon ext4 jbd2 mbcache sd_mod crc_t10dif xhci_hcd ahci libahci wmi ttm drm_kms_helper dm_mirror dm_region_hash dm_log dm_mod
Jul 1 12:45:23 london-host15 kernel: CPU: 2 PID: 940 Comm: netback/2 Not tainted 3.10.68-11.el6.centos.alt.x86_64 #1
Jul 1 12:45:23 london-host15 kernel: Hardware name: Supermicro X10DRL-i/X10DRL-i, BIOS 1.0b 08/28/2014
Jul 1 12:45:23 london-host15 kernel: task: ffff88003535ecb0 ti: ffff88003526a000 task.ti: ffff88003526a000
Jul 1 12:45:23 london-host15 kernel: RIP: e030:[<ffffffffa026e899>] [<ffffffffa026e899>] netbk_gop_skb+0xb9/0x290 [xen_netback]
Jul 1 12:45:23 london-host15 kernel: RSP: e02b:ffff88003526bcd8 EFLAGS: 00010202
Jul 1 12:45:23 london-host15 kernel: RAX: ffffc900105aa040 RBX: ffff880034ce9280 RCX: ffffc90011a04000
Jul 1 12:45:23 london-host15 kernel: RDX: 000000000000003d RSI: 0000000000000000 RDI: ffff8800059e5000
Jul 1 12:45:23 london-host15 kernel: RBP: ffff88003526bd48 R08: 0000000000000000 R09: 0000000000000000
Jul 1 12:45:23 london-host15 kernel: R10: 0000000000007ff0 R11: 0000000000000002 R12: 0000000000000000
Jul 1 12:45:23 london-host15 kernel: R13: 0000000000000000 R14: ffff88003526bd98 R15: ffff880009e1f800
Jul 1 12:45:23 london-host15 kernel: FS: 0000000000000000(0000) GS:ffff88003f880000(0000) knlGS:0000000000000000
Jul 1 12:45:23 london-host15 kernel: CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 1 12:45:23 london-host15 kernel: CR2: ffffc90011a041e8 CR3: 000000000486f000 CR4: 0000000000042660
Jul 1 12:45:23 london-host15 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 1 12:45:23 london-host15 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 1 12:45:23 london-host15 kernel: Stack:
Jul 1 12:45:23 london-host15 kernel: 0000000000000000 0000000000000002 000000003526bd08 ffffffff8100392e
Jul 1 12:45:23 london-host15 kernel: 0000000000000000 000000003535ecb0 ffff88003526bd78 00000001815f9bd7
Jul 1 12:45:23 london-host15 kernel: ffff88003526bd48 ffff880034ce9280 0000000000000000 0000000000000000
Jul 1 12:45:23 london-host15 kernel: Call Trace:
Jul 1 12:45:23 london-host15 kernel: [<ffffffff8100392e>] ? xen_end_context_switch+0x1e/0x30
Jul 1 12:45:23 london-host15 kernel: [<ffffffffa026eb55>] xen_netbk_rx_action+0xe5/0x600 [xen_netback]
Jul 1 12:45:23 london-host15 kernel: [<ffffffff815f9bd7>] ? _raw_spin_unlock_irqrestore+0x17/0x20
Jul 1 12:45:23 london-host15 kernel: [<ffffffffa0270df0>] xen_netbk_kthread+0x80/0x1b0 [xen_netback]
Jul 1 12:45:23 london-host15 kernel: [<ffffffff810828d0>] ? wake_up_bit+0x40/0x40
Jul 1 12:45:23 london-host15 kernel: [<ffffffffa0270d70>] ? xen_netbk_tx_build_gops+0x8b0/0x8b0 [xen_netback]
Jul 1 12:45:23 london-host15 kernel: [<ffffffff810820be>] kthread+0xce/0xe0
Jul 1 12:45:23 london-host15 kernel: [<ffffffff81081ff0>] ? kthread_freezable_should_stop+0x70/0x70
Jul 1 12:45:23 london-host15 kernel: [<ffffffff81602cec>] ret_from_fork+0x7c/0xb0
Jul 1 12:45:23 london-host15 kernel: [<ffffffff81081ff0>] ? kthread_freezable_should_stop+0x70/0x70
Jul 1 12:45:23 london-host15 kernel: Code: 47 60 04 0f 85 89 01 00 00 8b b3 d0 00 00 00 48 8b bb d8 00 00 00 0f b7 74 37 02 89 70 08 89 d2 c7 40 04 00 00 00 00 48 83 c2 08 <0f> b7 34 d1 89 30 41 c7 46 20 00 00 00 00 8b 44 d1 04 41 89 46
Jul 1 12:45:23 london-host15 kernel: RIP [<ffffffffa026e899>] netbk_gop_skb+0xb9/0x290 [xen_netback]
Jul 1 12:45:23 london-host15 kernel: RSP <ffff88003526bcd8>
Jul 1 12:45:23 london-host15 kernel: CR2: ffffc90011a041e8
Jul 1 12:45:23 london-host15 kernel: ---[ end trace 0926a1200e28f127 ]---
Jul 1 12:45:23 london-host15 kernel: device vifvm2696.0 left promiscuous mode
Jul 1 12:45:23 london-host15 kernel: br0: port 8(vifvm2696.0) entered disabled state
--------------
[root@london-host15 ~]# xl info
host : london-host15.domain.net
release : 3.10.68-11.el6.centos.alt.x86_64
version : #1 SMP Fri Feb 6 10:40:16 CST 2015
machine : x86_64
nr_cpus : 6
max_cpu_id : 5
nr_nodes : 1
cores_per_socket : 6
threads_per_core : 1
cpu_mhz : 2400
hw_caps : bfebfbff:2c100800:00000000:00007f00:75fefbff:00000000:00000021:000037ab
virt_caps : hvm hvm_directio
total_memory : 32661
free_memory : 15374
sharing_freed_memory : 0
sharing_used_memory : 0
outstanding_claims : 0
free_cpus : 0
xen_major : 4
xen_minor : 4
xen_extra : .2-2.el6
xen_version : 4.4.2-2.el6
xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler : credit
xen_pagesize : 4096
platform_params : virt_start=0xffff800000000000
xen_changeset : Thu Apr 23 15:06:13 2015 +0100 git:82363f6-dirty
xen_commandline : dom0_mem=1024M,max:1536M loglvl=all guest_loglvl=all
cc_compiler : gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11)
cc_compile_by : mockbuild
cc_compile_domain : centos.org
cc_compile_date : Wed May 13 12:28:01 UTC 2015
xend_config_format : 4
[root@london-host15 ~]#
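For anyone who wants to check other hosts for the same signature, here is a minimal Python sketch that pulls the faulting address, function and module out of syslog lines like the ones in the kernel log above. The names `find_oopses`, `BUG_RE`, `IP_RE` and `SAMPLE` are made up for illustration, and the patterns are only matched against the two header lines from my own log, so other kernels may format these lines differently.

```python
import re

# Patterns for the two oops header lines as they appear in the log above.
# Assumption: other kernel versions may word these lines differently.
BUG_RE = re.compile(r"BUG: unable to handle kernel paging request at ([0-9a-f]+)")
IP_RE = re.compile(
    r"IP: \[<([0-9a-f]+)>\] (\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)(?: \[(\w+)\])?"
)


def find_oopses(lines):
    """Return one dict per 'BUG:' line, filled in from the first 'IP:' line after it."""
    found = []
    current = None
    for line in lines:
        match = BUG_RE.search(line)
        if match:
            current = {"fault_addr": match.group(1), "function": None, "module": None}
            found.append(current)
            continue
        match = IP_RE.search(line)
        if match and current is not None and current["function"] is None:
            current["function"] = match.group(2)
            current["module"] = match.group(5)  # None if no [module] suffix
    return found


# Two lines copied from the log above, used as sample input.
SAMPLE = [
    "Jul 1 12:45:23 london-host15 kernel: BUG: unable to handle kernel paging request at ffffc90011a041e8",
    "Jul 1 12:45:23 london-host15 kernel: IP: [<ffffffffa026e899>] netbk_gop_skb+0xb9/0x290 [xen_netback]",
]

for oops in find_oopses(SAMPLE):
    print(oops["fault_addr"], oops["function"], oops["module"])
```

To scan a real host, the same function could be fed the lines of the syslog file (on my CentOS 6 boxes the kernel messages land in /var/log/messages, which may differ elsewhere), for example `find_oopses(open("/var/log/messages"))`.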