
Re: [Xen-devel] kernel BUG at block/bio.c:1786 -- (xen_blkif_schedule on the stack)



On Mon, Feb 06, 2017 at 12:31:20AM +0100, Håkon Alstadheim wrote:
> I get the BUG below in dom0 when trying to start a Windows 10 domU (HVM,
> with some PV drivers installed). Below is "xl info", then comes dmesg
> output, and finally the domU config, attached at the end.
> 
> This domain is started very rarely, so it may have been broken for some
> time. All my other domains are Linux. This message is just a data point
> for whoever is interested, with possibly more data available if anybody
> wants to ask me anything. NOT expecting a quick resolution of this :-/ .
> 
> The domain boots part of the way; the screen resolution gets changed and
> then it keeps spinning for ~5 seconds before stopping.
[...]
> [339809.663061] br0: port 12(vif7.0) entered blocking state
> [339809.663063] br0: port 12(vif7.0) entered disabled state
> [339809.663123] device vif7.0 entered promiscuous mode
> [339809.664885] IPv6: ADDRCONF(NETDEV_UP): vif7.0: link is not ready
> [339809.742522] br0: port 13(vif7.0-emu) entered blocking state
> [339809.742523] br0: port 13(vif7.0-emu) entered disabled state
> [339809.742573] device vif7.0-emu entered promiscuous mode
> [339809.744386] br0: port 13(vif7.0-emu) entered blocking state
> [339809.744388] br0: port 13(vif7.0-emu) entered forwarding state
> [339864.059095] xen-blkback: backend/vbd/7/768: prepare for reconnect
> [339864.138002] xen-blkback: backend/vbd/7/768: using 1 queues, protocol
> 1 (x86_64-abi)
> [339864.241039] xen-blkback: backend/vbd/7/832: prepare for reconnect
> [339864.337997] xen-blkback: backend/vbd/7/832: using 1 queues, protocol
> 1 (x86_64-abi)
> [339875.245306] vif vif-7-0 vif7.0: Guest Rx ready
> [339875.245345] IPv6: ADDRCONF(NETDEV_CHANGE): vif7.0: link becomes ready
> [339875.245391] br0: port 12(vif7.0) entered blocking state
> [339875.245395] br0: port 12(vif7.0) entered forwarding state
> [339894.122151] ------------[ cut here ]------------
> [339894.122169] kernel BUG at block/bio.c:1786!
> [339894.122173] invalid opcode: 0000 [#1] SMP
> [339894.122176] Modules linked in: xt_physdev iptable_filter ip_tables
> x_tables nfsd auth_rpcgss oid_registry nfsv4 dns_resolver nfsv3 nfs_acl
> binfmt_misc intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp
> crc32c_intel pcspkr serio_raw i2c_i801 i2c_smbus iTCO_wdt
> iTCO_vendor_support amdgpu drm_kms_helper syscopyarea bcache input_leds
> sysfillrect sysimgblt fb_sys_fops ttm drm uas shpchp ipmi_ssif rtc_cmos
> acpi_power_meter wmi tun snd_hda_codec_realtek snd_hda_codec_generic
> snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd
> usbip_host usbip_core pktcdvd tmem lpc_ich xen_wdt nct6775 hwmon_vid
> dm_zero dm_thin_pool dm_persistent_data dm_bio_prison dm_service_time
> dm_round_robin dm_queue_length dm_multipath dm_log_userspace cn
> virtio_pci virtio_scsi virtio_blk virtio_console virtio_balloon
> [339894.122233]  xts gf128mul aes_x86_64 cbc sha512_generic
> sha256_generic sha1_generic libiscsi scsi_transport_iscsi virtio_net
> virtio_ring virtio tg3 libphy e1000 fuse overlay nfs lockd grace sunrpc
> jfs multipath linear raid10 raid1 raid0 dm_raid raid456
> async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq
> dm_snapshot dm_bufio dm_crypt dm_mirror dm_region_hash dm_log dm_mod
> hid_sunplus hid_sony hid_samsung hid_pl hid_petalynx hid_monterey
> hid_microsoft hid_logitech ff_memless hid_gyration hid_ezkey hid_cypress
> hid_chicony hid_cherry hid_a4tech sl811_hcd xhci_plat_hcd ohci_pci
> ohci_hcd uhci_hcd aic94xx lpfc qla2xxx aacraid sx8 DAC960 hpsa cciss
> 3w_9xxx 3w_xxxx mptsas mptfc scsi_transport_fc mptspi mptscsih mptbase
> atp870u dc395x qla1280 imm parport dmx3191d sym53c8xx gdth initio BusLogic
> [339894.122325]  arcmsr aic7xxx aic79xx sg pdc_adma sata_inic162x
> sata_mv sata_qstor sata_vsc sata_uli sata_sis sata_sx4 sata_nv sata_via
> sata_svw sata_sil24 sata_sil sata_promise pata_sis usbhid led_class igb
> ptp dca i2c_algo_bit ehci_pci ehci_hcd xhci_pci megaraid_sas xhci_hcd
> [339894.122350] CPU: 3 PID: 23514 Comm: 7.hda-0 Tainted: G        W
>  4.9.8-gentoo #1
> [339894.122353] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D8
> WS/Z10PE-D8 WS, BIOS 3304 06/22/2016
> [339894.122358] task: ffff880244b55b00 task.stack: ffffc90042fcc000
> [339894.122361] RIP: e030:[<ffffffff813c6af7>]  [<ffffffff813c6af7>]
> bio_split+0x9/0x89
> [339894.122370] RSP: e02b:ffffc90042fcfb18  EFLAGS: 00010246
> [339894.122373] RAX: 00000000000000a8 RBX: ffff8802433ee900 RCX:
> ffff88023f537080
> [339894.122377] RDX: 0000000002400000 RSI: 0000000000000000 RDI:
> ffff8801fc8b7890
> [339894.122380] RBP: ffffc90042fcfba8 R08: 0000000000000000 R09:
> 00000000000052da
> [339894.122383] R10: 0000000000000002 R11: 0005803fffffffff R12:
> ffff8801fc8b7890
> [339894.122387] R13: 00000000000000a8 R14: ffffc90042fcfbb8 R15:
> 0000000000000000
> [339894.122394] FS:  0000000000000000(0000) GS:ffff8802498c0000(0000)
> knlGS:ffff8802498c0000
> [339894.122398] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> [339894.122401] CR2: 00007f99b78e3349 CR3: 0000000216d43000 CR4:
> 0000000000042660
> [339894.122405] Stack:
> [339894.122407]  ffffffff813d1bce 0000000000000002 ffffc90042fcfb50
> ffff88023f537080
> [339894.122413]  0000000000000002 0000000100000000 0000000000000000
> 0000000100000000
> [339894.122419]  0000000000000000 000000000d2ee022 0000000200006fec
> 0000000000000000
> [339894.122424] Call Trace:
> [339894.122429]  [<ffffffff813d1bce>] ? blk_queue_split+0x448/0x48b
> [339894.122435]  [<ffffffff813cd7f3>] blk_queue_bio+0x44/0x289
> [339894.122439]  [<ffffffff813cc226>] generic_make_request+0xbd/0x160
> [339894.122443]  [<ffffffff813cc3c9>] submit_bio+0x100/0x11d
> [339894.122446]  [<ffffffff813d2b8a>] ? next_bio+0x1d/0x40
> [339894.122450]  [<ffffffff813c4d10>] submit_bio_wait+0x4e/0x62
> [339894.122454]  [<ffffffff813d2df3>] blkdev_issue_discard+0x71/0xa9
> [339894.122459]  [<ffffffff81534fd4>] __do_block_io_op+0x4f0/0x579
> [339894.122463]  [<ffffffff81534fd4>] ? __do_block_io_op+0x4f0/0x579
> [339894.122469]  [<ffffffff81770005>] ? sha_transform+0xf47/0x1069
> [339894.122474]  [<ffffffff81535544>] xen_blkif_schedule+0x318/0x63c
> [339894.122478]  [<ffffffff81777498>] ? __schedule+0x32e/0x4e8
> [339894.122484]  [<ffffffff81088f9b>] ? wake_up_atomic_t+0x2c/0x2c
> [339894.122488]  [<ffffffff8153522c>] ? xen_blkif_be_int+0x2c/0x2c
> [339894.122492]  [<ffffffff810742aa>] kthread+0xa6/0xae
> [339894.122496]  [<ffffffff81074204>] ? init_completion+0x24/0x24
> [339894.122501]  [<ffffffff8177a335>] ret_from_fork+0x25/0x30

Are you using some kind of software RAID or a similar backend for the disk
images? It looks like someone (not blkback) is trying to split a discard bio
(maybe even a discard bio with 0 sectors), and that's tripping one of the
BUG_ON() sanity checks in bio_split(). TBH, I would expect
blkdev_issue_discard to either ignore or reject such requests, but it doesn't
seem to do so (or at least I cannot find where it does).
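
FWIW, bio_split() in 4.9 starts with BUG_ON(sectors <= 0) and
BUG_ON(sectors >= bio_sectors(bio)), so a zero-length or wrapping discard
range coming from the frontend would fit the trace above. If that turns out
to be the case, something along the lines of the (untested) sketch below
could be used to fail such requests in blkback itself before they ever reach
the bio layer:

/*
 * Sketch only, not the debug patch below. The helper name is made up;
 * the request fields are the ones dispatch_discard_io() already uses.
 */
static bool discard_range_is_bogus(const struct blkif_request *req)
{
        u64 start = req->u.discard.sector_number;
        u64 nr = req->u.discard.nr_sectors;

        /* Reject empty ranges and ranges that wrap past the end of a u64. */
        return nr == 0 || start + nr < start;
}

/*
 * ... and near the top of dispatch_discard_io(), assuming its existing
 * fail_response error path:
 *
 *      if (discard_range_is_bogus(req)) {
 *              err = -EINVAL;
 *              goto fail_response;
 *      }
 */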

Could you try the patch below and report back what output you get?

Thanks, Roger.

---8<---
diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 726c32e..1964e9c 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -1027,6 +1027,8 @@ static int dispatch_discard_io(struct xen_blkif_ring *ring,
                 (req->u.discard.flag & BLKIF_DISCARD_SECURE)) ?
                 BLKDEV_DISCARD_SECURE : 0;
 
+       pr_info("Sending discard, sector %llu nr %llu\n",
+               req->u.discard.sector_number, req->u.discard.nr_sectors);
        err = blkdev_issue_discard(bdev, req->u.discard.sector_number,
                                   req->u.discard.nr_sectors,
                                   GFP_KERNEL, secure);


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

