
Re: [Xen-devel] [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!



Hi all,

Another crash this morning:

vif vif-2-0 vif2.0: Trying to unmap invalid handle! pending_idx: 0x3a
------------[ cut here ]------------
kernel BUG at drivers/net/xen-netback/netback.c:430!
invalid opcode: 0000 [#1] SMP
Modules linked in: xt_physdev(E) iptable_filter(E) ip_tables(E) xen_netback(E) 
nfsd(E) lockd(E) grace(E) nfs_acl(E) auth_rpcgss(E) sunrpc(E) ipmi_si(E) 
ipmi_msghandler(E) k10temp(E) zfs(POE) zcommon(POE) znvpair(POE) icp(POE) 
spl(OE) zavl(POE) zunicode(POE) tpm_infineon(E) sp5100_tco(E) i2c_piix4(E) 
i2c_core(E) ohci_pci(E) ohci_hcd(E) tg3(E) ptp(E) pps_core(E) sg(E) raid1(E) 
sd_mod(E) ata_generic(E) pata_acpi(E) pata_atiixp(E) ahci(E) libahci(E) 
dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) dax(E)
CPU: 0 PID: 14238 Comm: vif2.0-q0-deall Tainted: P           OE   4.14.6-1.el6.x86_64 #1
Hardware name: HP ProLiant MicroServer, BIOS O41     10/01/2013
task: ffff880059e255c0 task.stack: ffffc90001f64000
RIP: e030:xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback]
RSP: e02b:ffffc90001f67c68 EFLAGS: 00010292
RAX: 0000000000000045 RBX: ffffc90001f55000 RCX: 0000000000000000
RDX: ffff88007f4146e8 RSI: ffff88007f40db38 RDI: ffff88007f40db38
RBP: ffffc90001f67e98 R08: 0000000000000372 R09: 0000000000000373
R10: 0000000000000001 R11: 0000000000000000 R12: ffffc90001f5e730
R13: 0000160000000000 R14: aaaaaaaaaaaaaaab R15: ffffc9000099bbe8
FS:  00007f92865d29a0(0000) GS:ffff88007f400000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffff600400 CR3: 000000006209c000 CR4: 0000000000000660
Call Trace:
 ? _raw_spin_unlock_irqrestore+0x11/0x20
 ? error_exit+0x5/0x20
 ? __update_load_avg_cfs_rq+0x176/0x180
 ? xen_mc_flush+0x87/0x120
 ? xen_load_sp0+0x84/0xa0
 ? __switch_to+0x1c1/0x360
 ? finish_task_switch+0x78/0x240
 ? __schedule+0x192/0x496
 ? _raw_spin_lock_irqsave+0x1a/0x3c
 ? _raw_spin_lock_irqsave+0x1a/0x3c
 ? _raw_spin_unlock_irqrestore+0x11/0x20
 xenvif_dealloc_kthread+0x68/0xf0 [xen_netback]
 ? do_wait_intr+0x80/0x80
 ? xenvif_map_frontend_data_rings+0xe0/0xe0 [xen_netback]
 kthread+0x106/0x140
 ? kthread_destroy_worker+0x60/0x60
 ret_from_fork+0x25/0x30
Code: 89 df 49 83 c4 02 e8 e5 f5 ff ff 4d 39 ec 75 e8 eb a2 48 8b 43 20 48 c7 
c6 10 2b 55 a0 48 8b b8 20 03 00 00 31 c0 e8 85 c9 06 e1 <0f> 0b 0f 0b 48 8b 53 
20 89 c1 48 c7 c6 48 2b 55 a0 31 c0 45 31 
RIP: xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback] RSP: ffffc90001f67c68
---[ end trace 130de0b7e39d0eea ]---
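
For anyone reading along, the condition behind the BUG() can be modelled in user space. This is only a sketch under stated assumptions: the sketch_* names, the 256-slot table and the 0xFFFF sentinel are made up for illustration and are not the kernel's definitions. It mirrors the xenvif_grant_handle_reset() check quoted further down, and shows that the message is printed when a slot is reset while it already holds the invalid marker, for example when the same pending_idx is released twice or was never mapped. It says nothing about why netback ends up in that state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for this sketch; not the kernel's definitions. */
#define SKETCH_MAX_PENDING    256
#define SKETCH_INVALID_HANDLE 0xFFFFu

struct sketch_queue {
    uint16_t grant_tx_handle[SKETCH_MAX_PENDING];
};

/* Mark every slot as unmapped. */
static void sketch_queue_init(struct sketch_queue *q)
{
    assert(q != NULL);
    for (size_t i = 0; i < SKETCH_MAX_PENDING; i++)
        q->grant_tx_handle[i] = SKETCH_INVALID_HANDLE;
}

/* Record a mapping for pending_idx; fails if the slot is already mapped. */
static bool sketch_handle_set(struct sketch_queue *q, uint16_t pending_idx,
                              uint16_t handle)
{
    if (pending_idx >= SKETCH_MAX_PENDING || handle == SKETCH_INVALID_HANDLE)
        return false;
    if (q->grant_tx_handle[pending_idx] != SKETCH_INVALID_HANDLE)
        return false;
    q->grant_tx_handle[pending_idx] = handle;
    return true;
}

/*
 * Same test as the quoted xenvif_grant_handle_reset(), except that it
 * returns false at the point where the kernel code calls BUG().
 */
static bool sketch_handle_reset(struct sketch_queue *q, uint16_t pending_idx)
{
    if (pending_idx >= SKETCH_MAX_PENDING)
        return false;
    if (q->grant_tx_handle[pending_idx] == SKETCH_INVALID_HANDLE) {
        fprintf(stderr,
                "Trying to unmap invalid handle! pending_idx: 0x%x\n",
                (unsigned int)pending_idx);
        return false;
    }
    q->grant_tx_handle[pending_idx] = SKETCH_INVALID_HANDLE;
    return true;
}
```

With this model, calling sketch_handle_set() and then sketch_handle_reset() on index 0x3a succeeds once; a second sketch_handle_reset() on the same index prints the message and returns false, which is the point where the real code would hit BUG().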

Best regards,

Alex



-----Original Message-----
From: Juergen Gross [mailto:jgross@xxxxxxxx] 
Sent: Friday, 22 December 2017 5:47 PM
To: Alex Braunegg; xen-devel@xxxxxxxxxxxxxxxxxxxx
Cc: Wei Liu; Paul Durrant
Subject: Re: [Xen-devel] [BUG] kernel bug encountered at 
drivers/net/xen-netback/netback.c:430!

On 22/12/17 07:40, Alex Braunegg wrote:
> Hi all,
> 
> Experienced the same issue again today:

Ccing the maintainers.


Juergen

> 
> =====================================================================================
> 
> vif vif-2-0 vif2.0: Trying to unmap invalid handle! pending_idx: 0x2f
> ------------[ cut here ]------------
> kernel BUG at drivers/net/xen-netback/netback.c:430!
> invalid opcode: 0000 [#1] SMP
> Modules linked in: xt_physdev(E) iptable_filter(E) ip_tables(E)
> xen_netback(E) nfsd(E) lockd(E) grace(E) nfs_acl(E) auth_rpcgss(E) sunrpc(E)
> ipmi_si(E) ipmi_msghandler(E) k10temp(E) zfs(POE) zcommon(POE) znvpair(POE)
> icp(POE) spl(OE) zavl(POE) zunicode(POE) tpm_infineon(E) sp5100_tco(E)
> i2c_piix4(E) i2c_core(E) ohci_pci(E) ohci_hcd(E) tg3(E) ptp(E) pps_core(E)
> sg(E) raid1(E) sd_mod(E) ata_generic(E) pata_acpi(E) pata_atiixp(E) ahci(E)
> libahci(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) dax(E)
> CPU: 0 PID: 12636 Comm: vif2.0-q0-deall Tainted: P           OE   4.14.6-1.el6.x86_64 #1
> Hardware name: HP ProLiant MicroServer, BIOS O41     10/01/2013
> task: ffff880062518000 task.stack: ffffc90004f88000
> RIP: e030:xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback]
> RSP: e02b:ffffc90004f8bc68 EFLAGS: 00010292
> RAX: 0000000000000045 RBX: ffffc90000fcd000 RCX: 0000000000000000
> RDX: ffff88007f4146e8 RSI: ffff88007f40db38 RDI: ffff88007f40db38
> RBP: ffffc90004f8be98 R08: 000000000000037d R09: 000000000000037e
> R10: 0000000000000001 R11: 0000000000000000 R12: ffffc90000fd6730
> R13: 0000160000000000 R14: aaaaaaaaaaaaaaab R15: ffffc9000099bbe8
> FS:  00007f40c63639a0(0000) GS:ffff88007f400000(0000) knlGS:0000000000000000
> CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffffffff600400 CR3: 000000006375f000 CR4: 0000000000000660
> Call Trace:
>  ? error_exit+0x5/0x20
>  ? __update_load_avg_cfs_rq+0x176/0x180
>  ? xen_mc_flush+0x87/0x120
>  ? xen_load_sp0+0x84/0xa0
>  ? __switch_to+0x1c1/0x360
>  ? finish_task_switch+0x78/0x240
>  ? __schedule+0x192/0x496
>  ? _raw_spin_lock_irqsave+0x1a/0x3c
>  ? _raw_spin_lock_irqsave+0x1a/0x3c
>  ? _raw_spin_unlock_irqrestore+0x11/0x20
>  xenvif_dealloc_kthread+0x68/0xf0 [xen_netback]
>  ? do_wait_intr+0x80/0x80
>  ? xenvif_map_frontend_data_rings+0xe0/0xe0 [xen_netback]
>  kthread+0x106/0x140
>  ? kthread_destroy_worker+0x60/0x60
>  ? kthread_destroy_worker+0x60/0x60
>  ret_from_fork+0x25/0x30
> Code: 89 df 49 83 c4 02 e8 e5 f5 ff ff 4d 39 ec 75 e8 eb a2 48 8b 43 20 48
> c7 c6 10 5b 55 a0 48 8b b8 20 03 00 00 31 c0 e8 85 99 06 e1 <0f> 0b 0f 0b 48
> 8b 53 20 89 c1 48 c7 c6 48 5b 55 a0 31 c0 45 31 
> RIP: xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback] RSP: ffffc90004f8bc68
> ---[ end trace 010682c76619a1bd ]---
> 
> =====================================================================================
> 
> Best regards,
> 
> Alex
> 
> -----Original Message-----
> From: Alex Braunegg [mailto:alex.braunegg@xxxxxxxxx] 
> Sent: Thursday, 21 December 2017 8:04 AM
> To: 'xen-devel@xxxxxxxxxxxxxxxxxxxx'
> Subject: [BUG] kernel bug encountered at
> drivers/net/xen-netback/netback.c:430!
> 
> Hi all,
> 
> I experienced the following bug whilst using a Xen VM. This morning a
> single Xen VM terminated suddenly, with no apparent cause, and the
> following was logged in dmesg.
> 
> Only 1 VM (out of 2 which were running) experienced an issue; the other
> remained up and fully functional until I attempted to restart the crashed VM
> which triggered the kernel bug.
> 
> Kernel:       4.14.6
> Xen:          4.8.2
> 
> =====================================================================================
> 
> vif vif-2-0 vif2.0: Trying to unmap invalid handle! pending_idx: 0x3f
> ------------[ cut here ]------------
> kernel BUG at drivers/net/xen-netback/netback.c:430!
> invalid opcode: 0000 [#1] SMP
> Modules linked in: xt_physdev(E) iptable_filter(E) ip_tables(E)
> xen_netback(E) nfsd(E) lockd(E) grace(E) nfs_acl(E) auth_rpcgss(E) sunrpc(E)
> ipmi_si(E) ipmi_msghandler(E) zfs(POE) zcommon(POE) znvpair(POE) icp(POE)
> spl(OE) zavl(POE) zunicode(POE) k10temp(E) tpm_infineon(E) sp5100_tco(E)
> i2c_piix4(E) i2c_core(E) ohci_pci(E) ohci_hcd(E) tg3(E) ptp(E) pps_core(E)
> sg(E) raid1(E) sd_mod(E) ata_generic(E) pata_acpi(E) pata_atiixp(E) ahci(E)
> libahci(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) dax(E)
> CPU: 0 PID: 13163 Comm: vif2.0-q0-deall Tainted: P           OE   4.14.6-1.el6.x86_64 #1
> Hardware name: HP ProLiant MicroServer, BIOS O41     10/01/2013
> task: ffff8800595cc980 task.stack: ffffc900028e0000
> RIP: e030:xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback]
> RSP: e02b:ffffc900028e3c68 EFLAGS: 00010292
> RAX: 0000000000000045 RBX: ffffc90002969000 RCX: 0000000000000000
> RDX: ffff88007f4146e8 RSI: ffff88007f40db38 RDI: ffff88007f40db38
> RBP: ffffc900028e3e98 R08: 000000000000037b R09: 000000000000037c
> R10: 0000000000000001 R11: 0000000000000000 R12: ffffc90002972730
> R13: 0000160000000000 R14: aaaaaaaaaaaaaaab R15: ffffc9000099bbe8
> FS:  00007fee260ff9a0(0000) GS:ffff88007f400000(0000) knlGS:0000000000000000
> CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffffffff600400 CR3: 0000000062815000 CR4: 0000000000000660
> Call Trace:
>  ? error_exit+0x5/0x20
>  ? __update_load_avg_cfs_rq+0x176/0x180
>  ? xen_mc_flush+0x87/0x120
>  ? xen_load_sp0+0x84/0xa0
>  ? __switch_to+0x1c1/0x360
>  ? finish_task_switch+0x78/0x240
>  ? __schedule+0x192/0x496
>  ? _raw_spin_lock_irqsave+0x1a/0x3c
>  ? _raw_spin_lock_irqsave+0x1a/0x3c
>  ? _raw_spin_unlock_irqrestore+0x11/0x20
>  xenvif_dealloc_kthread+0x68/0xf0 [xen_netback]
>  ? do_wait_intr+0x80/0x80
>  ? xenvif_map_frontend_data_rings+0xe0/0xe0 [xen_netback]
>  kthread+0x106/0x140
>  ? kthread_destroy_worker+0x60/0x60
>  ? kthread_destroy_worker+0x60/0x60
>  ret_from_fork+0x25/0x30
> Code: 89 df 49 83 c4 02 e8 e5 f5 ff ff 4d 39 ec 75 e8 eb a2 48 8b 43 20 48
> c7 c6 10 3b 55 a0 48 8b b8 20 03 00 00 31 c0 e8 85 b9 06 e1 <0f> 0b 0f 0b 48
> 8b 53 20 89 c1 48 c7 c6 48 3b 55 a0 31 c0 45 31 
> RIP: xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback] RSP: ffffc900028e3c68
> ---[ end trace 7d827dae67002ffc ]---
> 
> =====================================================================================
> 
> The section of relevant kernel code is:
> 
> =====================================================================================
> 
> static inline void xenvif_grant_handle_reset(struct xenvif_queue *queue,
>                                              u16 pending_idx)
> {
>         if (unlikely(queue->grant_tx_handle[pending_idx] ==
>                      NETBACK_INVALID_HANDLE)) {
>                 netdev_err(queue->vif->dev,
>                            "Trying to unmap invalid handle! pending_idx: 0x%x\n",
>                            pending_idx);
>                 BUG();
>         }
>         queue->grant_tx_handle[pending_idx] = NETBACK_INVALID_HANDLE;
> }
> 
> =====================================================================================
> 
> In an attempt to recover from this situation I restarted / destroyed (xl
> restart <vmname> / xl destroy <vmname>) the VM to recover its state, and the
> following error messages were logged at the console:
> 
> =====================================================================================
> 
> libxl: error: libxl_exec.c:129:libxl_report_child_exitstatus: /etc/xen/scripts/block remove [25271] died due to fatal signal Segmentation fault
> libxl: error: libxl_device.c:1080:device_backend_callback: unable to remove device with path /local/domain/0/backend/vif/2/0
> libxl: error: libxl.c:1647:devices_destroy_cb: libxl__devices_destroy failed for 2
> 
> =====================================================================================
> 
> After this the physical system hung and then restarted with nothing else
> logged. Everything came back up and operational, including the VM that
> crashed.
> 
> Further details (xl dmesg, xl info) attached.
> 
> Best regards,
> 
> Alex Braunegg
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxxx
> https://lists.xenproject.org/mailman/listinfo/xen-devel
> 

