Xen project Mailing List

[Xen-devel] [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!

From: "Alex Braunegg" <alex.braunegg@xxxxxxxxx>

Date: Thu, 21 Dec 2017 08:03:35 +1100

Delivery-date: Wed, 20 Dec 2017 21:04:04 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Thread-index: AdN4TWjsFFzt17FkRxCyARlShM/+zQ==

Hi all,

I experienced the following bug whilst using a Xen VM. This morning a single Xen VM suddenly terminated without apparent cause, with the following logged in dmesg.

Only 1 of the 2 running VMs experienced an issue. The other remained up and fully functional until I attempted to restart the crashed VM, which triggered the kernel bug.

Kernel: 4.14.6
Xen: 4.8.2

=====================================================================================
vif vif-2-0 vif2.0: Trying to unmap invalid handle! pending_idx: 0x3f
------------[ cut here ]------------
kernel BUG at drivers/net/xen-netback/netback.c:430!
invalid opcode: 0000 [#1] SMP
Modules linked in: xt_physdev(E) iptable_filter(E) ip_tables(E) xen_netback(E) nfsd(E) lockd(E) grace(E) nfs_acl(E) auth_rpcgss(E) sunrpc(E) ipmi_si(E) ipmi_msghandler(E) zfs(POE) zcommon(POE) znvpair(POE) icp(POE) spl(OE) zavl(POE) zunicode(POE) k10temp(E) tpm_infineon(E) sp5100_tco(E) i2c_piix4(E) i2c_core(E) ohci_pci(E) ohci_hcd(E) tg3(E) ptp(E) pps_core(E) sg(E) raid1(E) sd_mod(E) ata_generic(E) pata_acpi(E) pata_atiixp(E) ahci(E) libahci(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) dax(E)
CPU: 0 PID: 13163 Comm: vif2.0-q0-deall Tainted: P OE 4.14.6-1.el6.x86_64 #1
Hardware name: HP ProLiant MicroServer, BIOS O41 10/01/2013
task: ffff8800595cc980 task.stack: ffffc900028e0000
RIP: e030:xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback]
RSP: e02b:ffffc900028e3c68 EFLAGS: 00010292
RAX: 0000000000000045 RBX: ffffc90002969000 RCX: 0000000000000000
RDX: ffff88007f4146e8 RSI: ffff88007f40db38 RDI: ffff88007f40db38
RBP: ffffc900028e3e98 R08: 000000000000037b R09: 000000000000037c
R10: 0000000000000001 R11: 0000000000000000 R12: ffffc90002972730
R13: 0000160000000000 R14: aaaaaaaaaaaaaaab R15: ffffc9000099bbe8
FS: 00007fee260ff9a0(0000) GS:ffff88007f400000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffff600400 CR3: 0000000062815000 CR4: 0000000000000660
Call Trace:
 ? error_exit+0x5/0x20
 ? __update_load_avg_cfs_rq+0x176/0x180
 ? xen_mc_flush+0x87/0x120
 ? xen_load_sp0+0x84/0xa0
 ? __switch_to+0x1c1/0x360
 ? finish_task_switch+0x78/0x240
 ? __schedule+0x192/0x496
 ? _raw_spin_lock_irqsave+0x1a/0x3c
 ? _raw_spin_lock_irqsave+0x1a/0x3c
 ? _raw_spin_unlock_irqrestore+0x11/0x20
 xenvif_dealloc_kthread+0x68/0xf0 [xen_netback]
 ? do_wait_intr+0x80/0x80
 ? xenvif_map_frontend_data_rings+0xe0/0xe0 [xen_netback]
 kthread+0x106/0x140
 ? kthread_destroy_worker+0x60/0x60
 ? kthread_destroy_worker+0x60/0x60
 ret_from_fork+0x25/0x30
Code: 89 df 49 83 c4 02 e8 e5 f5 ff ff 4d 39 ec 75 e8 eb a2 48 8b 43 20 48 c7 c6 10 3b 55 a0 48 8b b8 20 03 00 00 31 c0 e8 85 b9 06 e1 <0f> 0b 0f 0b 48 8b 53 20 89 c1 48 c7 c6 48 3b 55 a0 31 c0 45 31
RIP: xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback] RSP: ffffc900028e3c68
---[ end trace 7d827dae67002ffc ]---
=====================================================================================

The relevant section of kernel code is:

=====================================================================================
static inline void xenvif_grant_handle_reset(struct xenvif_queue *queue,
                                             u16 pending_idx)
{
        if (unlikely(queue->grant_tx_handle[pending_idx] ==
                     NETBACK_INVALID_HANDLE)) {
                netdev_err(queue->vif->dev,
                           "Trying to unmap invalid handle! pending_idx: 0x%x\n",
                           pending_idx);
                BUG();
        }
        queue->grant_tx_handle[pending_idx] = NETBACK_INVALID_HANDLE;
}
=====================================================================================

In an attempt to recover from this situation I restarted / destroyed (xl restart <vmname> / xl destroy <vmname>) the VM to recover its state, and the following error messages were logged at the console:

=====================================================================================
libxl: error: libxl_exec.c:129:libxl_report_child_exitstatus: /etc/xen/scripts/block remove [25271] died due to fatal signal Segmentation fault
libxl: error: libxl_device.c:1080:device_backend_callback: unable to remove device with path /local/domain/0/backend/vif/2/0
libxl: error: libxl.c:1647:devices_destroy_cb: libxl__devices_destroy failed for 2
=====================================================================================

After this the physical system hung and then restarted with nothing else logged. Everything came back OK and operational, including the VM that crashed.

Further details (xl dmesg, xl info) attached.

Best regards,

Alex Braunegg

Attachment: xl-dmesg.txt
Description: Text document

Attachment: xl-info.txt
Description: Text document

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
