
Re: [Xen-devel] kernel BUG at block/bio.c:1786 -- (xen_blkif_schedule on the stack)



I just tried to provoke the bug after applying your patch and
re-enabling tmem, but it seems there are more variables in the equation
that have to line up for a crash to happen. Up until this week the VM in
question would reliably crash/hang on boot, and had done so for the past
month and through several reboots of the dom0.

I have slightly reduced the memory allotment of several VMs, which might
be keeping the bug from triggering. I will not be actively trying to
provoke this any more, but I'll keep you posted if it resurfaces.

In the meantime I'll try to learn more about how my system uses memory
(looking into "grants").

On 09 Feb 2017 18:30, Roger Pau Monné wrote:
> On Mon, Feb 06, 2017 at 12:31:20AM +0100, Håkon Alstadheim wrote:
>> I get the BUG below in dom0 when trying to start a Windows 10 domU (HVM,
>> with some PV drivers installed). Below is "xl info", then comes dmesg
>> output, and finally the domU config attached at the end.
>>
>> This domain is started very rarely, so it may have been broken for some
>> time. All my other domains are Linux. This message is just a data point
>> for whoever is interested, with possibly more data available if anybody
>> wants to ask me anything. NOT expecting a quick resolution of this :-/ .
>>
>> The domain boots part of the way, the screen resolution gets changed, and
>> then it keeps spinning for ~5 seconds before stopping.
> [...]
>> [339809.663061] br0: port 12(vif7.0) entered blocking state
>> [339809.663063] br0: port 12(vif7.0) entered disabled state
>> [339809.663123] device vif7.0 entered promiscuous mode
>> [339809.664885] IPv6: ADDRCONF(NETDEV_UP): vif7.0: link is not ready
>> [339809.742522] br0: port 13(vif7.0-emu) entered blocking state
>> [339809.742523] br0: port 13(vif7.0-emu) entered disabled state
>> [339809.742573] device vif7.0-emu entered promiscuous mode
>> [339809.744386] br0: port 13(vif7.0-emu) entered blocking state
>> [339809.744388] br0: port 13(vif7.0-emu) entered forwarding state
>> [339864.059095] xen-blkback: backend/vbd/7/768: prepare for reconnect
>> [339864.138002] xen-blkback: backend/vbd/7/768: using 1 queues, protocol
>> 1 (x86_64-abi)
>> [339864.241039] xen-blkback: backend/vbd/7/832: prepare for reconnect
>> [339864.337997] xen-blkback: backend/vbd/7/832: using 1 queues, protocol
>> 1 (x86_64-abi)
>> [339875.245306] vif vif-7-0 vif7.0: Guest Rx ready
>> [339875.245345] IPv6: ADDRCONF(NETDEV_CHANGE): vif7.0: link becomes ready
>> [339875.245391] br0: port 12(vif7.0) entered blocking state
>> [339875.245395] br0: port 12(vif7.0) entered forwarding state
>> [339894.122151] ------------[ cut here ]------------
>> [339894.122169] kernel BUG at block/bio.c:1786!
>> [339894.122173] invalid opcode: 0000 [#1] SMP
>> [339894.122176] Modules linked in: xt_physdev iptable_filter ip_tables
>> x_tables nfsd auth_rpcgss oid_registry nfsv4 dns_resolver nfsv3 nfs_acl
>> binfmt_misc intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp
>> crc32c_intel pcspkr serio_raw i2c_i801 i2c_smbus iTCO_wdt
>> iTCO_vendor_support amdgpu drm_kms_helper syscopyarea bcache input_leds
>> sysfillrect sysimgblt fb_sys_fops ttm drm uas shpchp ipmi_ssif rtc_cmos
>> acpi_power_meter wmi tun snd_hda_codec_realtek snd_hda_codec_generic
>> snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd
>> usbip_host usbip_core pktcdvd tmem lpc_ich xen_wdt nct6775 hwmon_vid
>> dm_zero dm_thin_pool dm_persistent_data dm_bio_prison dm_service_time
>> dm_round_robin dm_queue_length dm_multipath dm_log_userspace cn
>> virtio_pci virtio_scsi virtio_blk virtio_console virtio_balloon
>> [339894.122233]  xts gf128mul aes_x86_64 cbc sha512_generic
>> sha256_generic sha1_generic libiscsi scsi_transport_iscsi virtio_net
>> virtio_ring virtio tg3 libphy e1000 fuse overlay nfs lockd grace sunrpc
>> jfs multipath linear raid10 raid1 raid0 dm_raid raid456
>> async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq
>> dm_snapshot dm_bufio dm_crypt dm_mirror dm_region_hash dm_log dm_mod
>> hid_sunplus hid_sony hid_samsung hid_pl hid_petalynx hid_monterey
>> hid_microsoft hid_logitech ff_memless hid_gyration hid_ezkey hid_cypress
>> hid_chicony hid_cherry hid_a4tech sl811_hcd xhci_plat_hcd ohci_pci
>> ohci_hcd uhci_hcd aic94xx lpfc qla2xxx aacraid sx8 DAC960 hpsa cciss
>> 3w_9xxx 3w_xxxx mptsas mptfc scsi_transport_fc mptspi mptscsih mptbase
>> atp870u dc395x qla1280 imm parport dmx3191d sym53c8xx gdth initio BusLogic
>> [339894.122325]  arcmsr aic7xxx aic79xx sg pdc_adma sata_inic162x
>> sata_mv sata_qstor sata_vsc sata_uli sata_sis sata_sx4 sata_nv sata_via
>> sata_svw sata_sil24 sata_sil sata_promise pata_sis usbhid led_class igb
>> ptp dca i2c_algo_bit ehci_pci ehci_hcd xhci_pci megaraid_sas xhci_hcd
>> [339894.122350] CPU: 3 PID: 23514 Comm: 7.hda-0 Tainted: G        W
>>  4.9.8-gentoo #1
>> [339894.122353] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D8
>> WS/Z10PE-D8 WS, BIOS 3304 06/22/2016
>> [339894.122358] task: ffff880244b55b00 task.stack: ffffc90042fcc000
>> [339894.122361] RIP: e030:[<ffffffff813c6af7>]  [<ffffffff813c6af7>]
>> bio_split+0x9/0x89
>> [339894.122370] RSP: e02b:ffffc90042fcfb18  EFLAGS: 00010246
>> [339894.122373] RAX: 00000000000000a8 RBX: ffff8802433ee900 RCX:
>> ffff88023f537080
>> [339894.122377] RDX: 0000000002400000 RSI: 0000000000000000 RDI:
>> ffff8801fc8b7890
>> [339894.122380] RBP: ffffc90042fcfba8 R08: 0000000000000000 R09:
>> 00000000000052da
>> [339894.122383] R10: 0000000000000002 R11: 0005803fffffffff R12:
>> ffff8801fc8b7890
>> [339894.122387] R13: 00000000000000a8 R14: ffffc90042fcfbb8 R15:
>> 0000000000000000
>> [339894.122394] FS:  0000000000000000(0000) GS:ffff8802498c0000(0000)
>> knlGS:ffff8802498c0000
>> [339894.122398] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [339894.122401] CR2: 00007f99b78e3349 CR3: 0000000216d43000 CR4:
>> 0000000000042660
>> [339894.122405] Stack:
>> [339894.122407]  ffffffff813d1bce 0000000000000002 ffffc90042fcfb50
>> ffff88023f537080
>> [339894.122413]  0000000000000002 0000000100000000 0000000000000000
>> 0000000100000000
>> [339894.122419]  0000000000000000 000000000d2ee022 0000000200006fec
>> 0000000000000000
>> [339894.122424] Call Trace:
>> [339894.122429]  [<ffffffff813d1bce>] ? blk_queue_split+0x448/0x48b
>> [339894.122435]  [<ffffffff813cd7f3>] blk_queue_bio+0x44/0x289
>> [339894.122439]  [<ffffffff813cc226>] generic_make_request+0xbd/0x160
>> [339894.122443]  [<ffffffff813cc3c9>] submit_bio+0x100/0x11d
>> [339894.122446]  [<ffffffff813d2b8a>] ? next_bio+0x1d/0x40
>> [339894.122450]  [<ffffffff813c4d10>] submit_bio_wait+0x4e/0x62
>> [339894.122454]  [<ffffffff813d2df3>] blkdev_issue_discard+0x71/0xa9
>> [339894.122459]  [<ffffffff81534fd4>] __do_block_io_op+0x4f0/0x579
>> [339894.122463]  [<ffffffff81534fd4>] ? __do_block_io_op+0x4f0/0x579
>> [339894.122469]  [<ffffffff81770005>] ? sha_transform+0xf47/0x1069
>> [339894.122474]  [<ffffffff81535544>] xen_blkif_schedule+0x318/0x63c
>> [339894.122478]  [<ffffffff81777498>] ? __schedule+0x32e/0x4e8
>> [339894.122484]  [<ffffffff81088f9b>] ? wake_up_atomic_t+0x2c/0x2c
>> [339894.122488]  [<ffffffff8153522c>] ? xen_blkif_be_int+0x2c/0x2c
>> [339894.122492]  [<ffffffff810742aa>] kthread+0xa6/0xae
>> [339894.122496]  [<ffffffff81074204>] ? init_completion+0x24/0x24
>> [339894.122501]  [<ffffffff8177a335>] ret_from_fork+0x25/0x30
> 
> Are you using some kind of software RAID or similar backend for the disk
> images? It looks like someone (not blkback) is trying to split a discard bio
> (or maybe even a discard bio with 0 sectors), and that's causing a BUG to
> trigger. TBH, I would expect blkdev_issue_discard to either ignore or reject
> such requests, but it doesn't seem to do so (or at least I cannot find it).
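For context (an editorial aside, not from Roger's mail): in a v4.9 tree,
block/bio.c:1786 sits at the top of bio_split(), which opens with two sanity
checks; the trace above shows RIP at bio_split+0x9, i.e. right at those checks.
A minimal sketch reproduced from memory, so the exact line may differ slightly:

/* block/bio.c (Linux v4.9, abridged): bio_split() refuses to split off
 * zero sectors or the whole bio, so a discard bio that the splitting
 * code decides to cut at 0 sectors trips the first BUG_ON. */
struct bio *bio_split(struct bio *bio, int sectors,
		      gfp_t gfp, struct bio_set *bs)
{
	BUG_ON(sectors <= 0);
	BUG_ON(sectors >= bio_sectors(bio));

	/* ... clone the bio and trim it down to 'sectors' ... */
}

In the register dump above, RSI (the second argument, i.e. 'sectors') is 0,
which would be consistent with the first BUG_ON firing on a zero-sector split.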
> 
> Could you try the below patch and report back what output you get?
> 
> Thanks, Roger.
> 
> ---8<---
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index 726c32e..1964e9c 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -1027,6 +1027,8 @@ static int dispatch_discard_io(struct xen_blkif_ring *ring,
>                (req->u.discard.flag & BLKIF_DISCARD_SECURE)) ?
>                BLKDEV_DISCARD_SECURE : 0;
>  
> +     pr_info("Sending discard, sector %llu nr %llu\n",
> +             req->u.discard.sector_number, req->u.discard.nr_sectors);
>       err = blkdev_issue_discard(bdev, req->u.discard.sector_number,
>                                  req->u.discard.nr_sectors,
>                                  GFP_KERNEL, secure);
> 
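Not part of Roger's patch, just to illustrate the kind of check that could
catch such a request on the blkback side before it ever reaches
blkdev_issue_discard(). The 'err' variable and the fail_response label are
assumed from the surrounding dispatch_discard_io() error path and may not
match the actual code exactly:

	/* Hypothetical guard: fail zero-length discard requests from the
	 * frontend up front, so a 0-sector bio can never reach bio_split()
	 * and its BUG_ON further down the stack. */
	if (unlikely(req->u.discard.nr_sectors == 0)) {
		err = -EINVAL;		/* treat an empty discard as a bad request */
		goto fail_response;	/* label assumed from context */
	}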
> 
