
Re: [Xen-devel] kernel BUG at block/bio.c:1786 -- (xen_blkif_schedule on the stack)

On 2017-02-09 18:30, Roger Pau Monné wrote:
On Mon, Feb 06, 2017 at 12:31:20AM +0100, Håkon Alstadheim wrote:
I get the BUG below in dom0 when trying to start a Windows 10 domU (HVM, with some PV drivers installed). Below is "xl info", then comes dmesg
output, and finally the domU config attached at the end.

This domain is started very rarely, so it may have been broken for some
time. All my other domains are Linux. This message is just a data point
for whoever is interested, with possibly more data if anybody wants to
ask me anything. NOT expecting a quick resolution of this :-/ .

The domain boots part of the way, screen resolution gets changed and
then it keeps spinning for ~ 5 seconds before stopping.
[339809.663061] br0: port 12(vif7.0) entered blocking state
[339809.663063] br0: port 12(vif7.0) entered disabled state
[339809.663123] device vif7.0 entered promiscuous mode
[339809.664885] IPv6: ADDRCONF(NETDEV_UP): vif7.0: link is not ready
[339809.742522] br0: port 13(vif7.0-emu) entered blocking state
[339809.742523] br0: port 13(vif7.0-emu) entered disabled state
[339809.742573] device vif7.0-emu entered promiscuous mode
[339809.744386] br0: port 13(vif7.0-emu) entered blocking state
[339809.744388] br0: port 13(vif7.0-emu) entered forwarding state
[339864.059095] xen-blkback: backend/vbd/7/768: prepare for reconnect
[339864.138002] xen-blkback: backend/vbd/7/768: using 1 queues, protocol 1 (x86_64-abi)
[339864.241039] xen-blkback: backend/vbd/7/832: prepare for reconnect
[339864.337997] xen-blkback: backend/vbd/7/832: using 1 queues, protocol 1 (x86_64-abi)
[339875.245306] vif vif-7-0 vif7.0: Guest Rx ready
[339875.245345] IPv6: ADDRCONF(NETDEV_CHANGE): vif7.0: link becomes ready
[339875.245391] br0: port 12(vif7.0) entered blocking state
[339875.245395] br0: port 12(vif7.0) entered forwarding state
[339894.122151] ------------[ cut here ]------------
[339894.122169] kernel BUG at block/bio.c:1786!
[339894.122173] invalid opcode: 0000 [#1] SMP
[339894.122176] Modules linked in: xt_physdev iptable_filter ip_tables
x_tables nfsd auth_rpcgss oid_registry nfsv4 dns_resolver nfsv3 nfs_acl
binfmt_misc intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp
crc32c_intel pcspkr serio_raw i2c_i801 i2c_smbus iTCO_wdt
iTCO_vendor_support amdgpu drm_kms_helper syscopyarea bcache input_leds sysfillrect sysimgblt fb_sys_fops ttm drm uas shpchp ipmi_ssif rtc_cmos
acpi_power_meter wmi tun snd_hda_codec_realtek snd_hda_codec_generic
snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd
usbip_host usbip_core pktcdvd tmem lpc_ich xen_wdt nct6775 hwmon_vid
dm_zero dm_thin_pool dm_persistent_data dm_bio_prison dm_service_time
dm_round_robin dm_queue_length dm_multipath dm_log_userspace cn
virtio_pci virtio_scsi virtio_blk virtio_console virtio_balloon
[339894.122233]  xts gf128mul aes_x86_64 cbc sha512_generic
sha256_generic sha1_generic libiscsi scsi_transport_iscsi virtio_net
virtio_ring virtio tg3 libphy e1000 fuse overlay nfs lockd grace sunrpc
jfs multipath linear raid10 raid1 raid0 dm_raid raid456
async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq
dm_snapshot dm_bufio dm_crypt dm_mirror dm_region_hash dm_log dm_mod
hid_sunplus hid_sony hid_samsung hid_pl hid_petalynx hid_monterey
hid_microsoft hid_logitech ff_memless hid_gyration hid_ezkey hid_cypress
hid_chicony hid_cherry hid_a4tech sl811_hcd xhci_plat_hcd ohci_pci
ohci_hcd uhci_hcd aic94xx lpfc qla2xxx aacraid sx8 DAC960 hpsa cciss
3w_9xxx 3w_xxxx mptsas mptfc scsi_transport_fc mptspi mptscsih mptbase
atp870u dc395x qla1280 imm parport dmx3191d sym53c8xx gdth initio BusLogic
[339894.122325]  arcmsr aic7xxx aic79xx sg pdc_adma sata_inic162x
sata_mv sata_qstor sata_vsc sata_uli sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil24 sata_sil sata_promise pata_sis usbhid led_class igb
ptp dca i2c_algo_bit ehci_pci ehci_hcd xhci_pci megaraid_sas xhci_hcd
[339894.122350] CPU: 3 PID: 23514 Comm: 7.hda-0 Tainted: G        W 4.9.8-gentoo #1
[339894.122353] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D8 WS/Z10PE-D8 WS, BIOS 3304 06/22/2016
[339894.122358] task: ffff880244b55b00 task.stack: ffffc90042fcc000
[339894.122361] RIP: e030:[<ffffffff813c6af7>]  [<ffffffff813c6af7>]
[339894.122370] RSP: e02b:ffffc90042fcfb18  EFLAGS: 00010246
[339894.122373] RAX: 00000000000000a8 RBX: ffff8802433ee900 RCX:
[339894.122377] RDX: 0000000002400000 RSI: 0000000000000000 RDI:
[339894.122380] RBP: ffffc90042fcfba8 R08: 0000000000000000 R09:
[339894.122383] R10: 0000000000000002 R11: 0005803fffffffff R12:
[339894.122387] R13: 00000000000000a8 R14: ffffc90042fcfbb8 R15:
[339894.122394] FS:  0000000000000000(0000) GS:ffff8802498c0000(0000)
[339894.122398] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[339894.122401] CR2: 00007f99b78e3349 CR3: 0000000216d43000 CR4:
[339894.122405] Stack:
[339894.122407]  ffffffff813d1bce 0000000000000002 ffffc90042fcfb50
[339894.122413]  0000000000000002 0000000100000000 0000000000000000
[339894.122419]  0000000000000000 000000000d2ee022 0000000200006fec
[339894.122424] Call Trace:
[339894.122429]  [<ffffffff813d1bce>] ? blk_queue_split+0x448/0x48b
[339894.122435]  [<ffffffff813cd7f3>] blk_queue_bio+0x44/0x289
[339894.122439]  [<ffffffff813cc226>] generic_make_request+0xbd/0x160
[339894.122443]  [<ffffffff813cc3c9>] submit_bio+0x100/0x11d
[339894.122446]  [<ffffffff813d2b8a>] ? next_bio+0x1d/0x40
[339894.122450]  [<ffffffff813c4d10>] submit_bio_wait+0x4e/0x62
[339894.122454]  [<ffffffff813d2df3>] blkdev_issue_discard+0x71/0xa9
[339894.122459]  [<ffffffff81534fd4>] __do_block_io_op+0x4f0/0x579
[339894.122463]  [<ffffffff81534fd4>] ? __do_block_io_op+0x4f0/0x579
[339894.122469]  [<ffffffff81770005>] ? sha_transform+0xf47/0x1069
[339894.122474]  [<ffffffff81535544>] xen_blkif_schedule+0x318/0x63c
[339894.122478]  [<ffffffff81777498>] ? __schedule+0x32e/0x4e8
[339894.122484]  [<ffffffff81088f9b>] ? wake_up_atomic_t+0x2c/0x2c
[339894.122488]  [<ffffffff8153522c>] ? xen_blkif_be_int+0x2c/0x2c
[339894.122492]  [<ffffffff810742aa>] kthread+0xa6/0xae
[339894.122496]  [<ffffffff81074204>] ? init_completion+0x24/0x24
[339894.122501]  [<ffffffff8177a335>] ret_from_fork+0x25/0x30

Are you using some kind of software RAID or similar backend for the disk?

Yes, I'm using bcache over an LVM volume over md-raid (RAID-6) on some SAS drives attached to an LSI MegaRAID 2008 as JBOD, all in dom0. The bcache volume gets passed to the VM.

It looks like someone (not blkback) is trying to split a discard bio
(or maybe even a discard bio with 0 sectors), and that's causing a BUG to trigger. TBH, I would expect blkdev_issue_discard to either ignore or reject such requests, but it doesn't seem to do so (or at least I cannot find it).
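To make the expectation above concrete, here is a minimal user-space sketch of the kind of sanity check one might want in front of a discard call: reject a zero-length request and any range that runs past the end of the device. The `struct discard_req` type, the `discard_req_is_sane` helper and the `capacity` parameter are all hypothetical names made up for illustration. This is not kernel code and not what blkback or blkdev_issue_discard actually do.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of a discard request (not a kernel struct). */
struct discard_req {
    uint64_t sector_number; /* first sector to discard */
    uint64_t nr_sectors;    /* number of sectors to discard */
};

/*
 * Return true only if the request is non-empty and lies entirely within a
 * device of `capacity` sectors. The subtraction form avoids overflowing
 * sector_number + nr_sectors.
 */
static bool discard_req_is_sane(const struct discard_req *req,
                                uint64_t capacity)
{
    if (req->nr_sectors == 0)
        return false;               /* zero-length discard */
    if (req->sector_number >= capacity)
        return false;               /* starts past the end of the device */
    if (req->nr_sectors > capacity - req->sector_number)
        return false;               /* runs past the end of the device */
    return true;
}
```

With a check like this, a request with nr_sectors == 0 would be refused up front instead of reaching the bio splitting code.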

Could you try the patch below and report back what output you get?

I have since found that disabling tmem seems to work around this issue, as well as another issue, viz. no YouTube on my Linux desktop VM, whose display is on a passed-through AMD graphics card.

I'll apply the patch and try to get it tested in the next couple of days.

Thanks, Roger.

Thank you for taking an interest :-) .

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 726c32e..1964e9c 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -1027,6 +1027,8 @@ static int dispatch_discard_io(struct xen_blkif_ring *ring,
                 (req->u.discard.flag & BLKIF_DISCARD_SECURE)) ?
                 BLKDEV_DISCARD_SECURE : 0;

+       pr_info("Sending discard, sector %llu nr %llu\n",
+               req->u.discard.sector_number, req->u.discard.nr_sectors);
        err = blkdev_issue_discard(bdev, req->u.discard.sector_number,
                                   req->u.discard.nr_sectors,
                                   GFP_KERNEL, secure);
