Re: [Xen-devel] [BUG] kernel BUG at drivers/block/xen-blkfront.c:1711
On 07/14/2016 07:49 PM, Evgenii Shatokhin wrote:
> On 11.07.2016 15:04, Bob Liu wrote:
>>
>> On 07/11/2016 04:50 PM, Evgenii Shatokhin wrote:
>>> On 06.06.2016 11:42, Dario Faggioli wrote:
>>>> Just Cc-ing some Linux, block, and Xen on CentOS people...
>>>>
>>>
>>> Ping.
>>>
>>> Any suggestions on how to debug this or what might cause the problem?
>>>
>>> Obviously, we cannot control Xen on Amazon's servers. But perhaps
>>> there is something we can do on the kernel side?
>>>
>>>> On Mon, 2016-06-06 at 11:24 +0300, Evgenii Shatokhin wrote:
>>>>> (Resending this bug report because the message I sent last week
>>>>> did not make it to the mailing list somehow.)
>>>>>
>>>>> Hi,
>>>>>
>>>>> One of our users gets kernel panics from time to time when he
>>>>> tries to use his Amazon EC2 instance with CentOS7 x64 in it [1].
>>>>> The kernel panic happens within minutes of the instance starting.
>>>>> The problem does not show up every time, however.
>>>>>
>>>>> The user first observed the problem with a custom kernel, but it
>>>>> was found later that the stock kernel 3.10.0-327.18.2.el7.x86_64
>>>>> from CentOS7 was affected as well.
>>
>> Please try this patch:
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7b0767502b5db11cb1f0daef2d01f6d71b1192dc
>>
>> Regards,
>> Bob
>>
>
> Unfortunately, it did not help. The same BUG_ON() in
> blkfront_setup_indirect() still triggers in our kernel based on RHEL's
> 3.10.0-327.18.2, where I added the patch.
>
> As far as I can see, the patch makes sure the indirect pages are added
> to the list only if (!info->feature_persistent) holds. I suppose that
> holds in our case and the pages are indeed added to the list, because
> the triggered BUG_ON() is here:
>
> if (!info->feature_persistent && info->max_indirect_segments) {
>     <...>
>     BUG_ON(!list_empty(&info->indirect_pages));
>     <...>
> }
>

That's odd. Could you please try to reproduce this issue with a recent
upstream kernel?

Thanks,
Bob

> So the problem is still out there somewhere, it seems.
>
> Regards,
> Evgenii
>
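One scenario that would explain a non-empty list, offered as an
assumption rather than a confirmed diagnosis: the call trace below shows
blkfront_setup_indirect() being reached from blkback_changed() on the
xenwatch thread. If a second backend state change drives that path again
for the same device, without blkif_free() tearing the device down in
between, the pages allocated on the first pass are still sitting on
info->indirect_pages when the BUG_ON() is evaluated. A minimal hardening
sketch along those lines follows; purge_indirect_pages() is a
hypothetical helper, not code from the driver, though its loop mirrors
the cleanup that blkif_free() already performs for this list:

------------------------------------
/*
 * Hypothetical helper (sketch only): drop any pages left on
 * info->indirect_pages by an earlier call. This is the same loop
 * blkif_free() uses when it tears the device down.
 */
static void purge_indirect_pages(struct blkfront_info *info)
{
	struct page *indirect_page, *n;

	list_for_each_entry_safe(indirect_page, n,
				 &info->indirect_pages, lru) {
		list_del(&indirect_page->lru);
		__free_page(indirect_page);
	}
}

/* In blkfront_setup_indirect(), instead of crashing on the stale list: */
if (!info->feature_persistent && info->max_indirect_segments) {
	int num = INDIRECT_GREFS(segs) * BLK_RING_SIZE;

	/* Warn and recover rather than BUG_ON(!list_empty(...)). */
	if (WARN_ON(!list_empty(&info->indirect_pages)))
		purge_indirect_pages(info);
	/* ... then allocate the num fresh pages as before ... */
}
------------------------------------

Of course, this would only mask a double-connect; the interesting
question would remain why the teardown path did not run in between.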
>>>>> The part of the system log he was able to retrieve is attached.
>>>>> Here is the bug info, for convenience:
>>>>>
>>>>> ------------------------------------
>>>>> [ 2.246912] kernel BUG at drivers/block/xen-blkfront.c:1711!
>>>>> [ 2.246912] invalid opcode: 0000 [#1] SMP
>>>>> [ 2.246912] Modules linked in: ata_generic pata_acpi crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel xen_netfront xen_blkfront(+) aesni_intel lrw ata_piix gf128mul glue_helper ablk_helper cryptd libata serio_raw floppy sunrpc dm_mirror dm_region_hash dm_log dm_mod scsi_transport_iscsi
>>>>> [ 2.246912] CPU: 1 PID: 50 Comm: xenwatch Not tainted 3.10.0-327.18.2.el7.x86_64 #1
>>>>> [ 2.246912] Hardware name: Xen HVM domU, BIOS 4.2.amazon 12/07/2015
>>>>> [ 2.246912] task: ffff8800e9fcb980 ti: ffff8800e98bc000 task.ti: ffff8800e98bc000
>>>>> [ 2.246912] RIP: 0010:[<ffffffffa015584f>]  [<ffffffffa015584f>] blkfront_setup_indirect+0x41f/0x430 [xen_blkfront]
>>>>> [ 2.246912] RSP: 0018:ffff8800e98bfcd0  EFLAGS: 00010283
>>>>> [ 2.246912] RAX: ffff8800353e15c0 RBX: ffff8800e98c52c8 RCX: 0000000000000020
>>>>> [ 2.246912] RDX: ffff8800353e15b0 RSI: ffff8800e98c52b8 RDI: ffff8800353e15d0
>>>>> [ 2.246912] RBP: ffff8800e98bfd20 R08: ffff8800353e15b0 R09: ffff8800eb403c00
>>>>> [ 2.246912] R10: ffffffffa0155532 R11: ffffffffffffffe8 R12: ffff8800e98c4000
>>>>> [ 2.246912] R13: ffff8800e98c52b8 R14: 0000000000000020 R15: ffff8800353e15c0
>>>>> [ 2.246912] FS:  0000000000000000(0000) GS:ffff8800efc20000(0000) knlGS:0000000000000000
>>>>> [ 2.246912] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [ 2.246912] CR2: 00007f1b615ef000 CR3: 00000000e2b44000 CR4: 00000000001406e0
>>>>> [ 2.246912] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>> [ 2.246912] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>>>> [ 2.246912] Stack:
>>>>> [ 2.246912]  0000000000000020 0000000000000001 00000020a0157217 00000100e98bfdbc
>>>>> [ 2.246912]  0000000027efa3ef ffff8800e98bfdbc ffff8800e98ce000 ffff8800e98c4000
>>>>> [ 2.246912]  ffff8800e98ce040 0000000000000001 ffff8800e98bfe08 ffffffffa0155d4c
>>>>> [ 2.246912] Call Trace:
>>>>> [ 2.246912]  [<ffffffffa0155d4c>] blkback_changed+0x4ec/0xfc8 [xen_blkfront]
>>>>> [ 2.246912]  [<ffffffff813a6fd0>] ? xenbus_gather+0x170/0x190
>>>>> [ 2.246912]  [<ffffffff816322f5>] ? __slab_free+0x10e/0x277
>>>>> [ 2.246912]  [<ffffffff813a805d>] xenbus_otherend_changed+0xad/0x110
>>>>> [ 2.246912]  [<ffffffff813a7257>] ? xenwatch_thread+0x77/0x180
>>>>> [ 2.246912]  [<ffffffff813a9ba3>] backend_changed+0x13/0x20
>>>>> [ 2.246912]  [<ffffffff813a7246>] xenwatch_thread+0x66/0x180
>>>>> [ 2.246912]  [<ffffffff810a6ae0>] ? wake_up_atomic_t+0x30/0x30
>>>>> [ 2.246912]  [<ffffffff813a71e0>] ? unregister_xenbus_watch+0x1f0/0x1f0
>>>>> [ 2.246912]  [<ffffffff810a5aef>] kthread+0xcf/0xe0
>>>>> [ 2.246912]  [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
>>>>> [ 2.246912]  [<ffffffff81646118>] ret_from_fork+0x58/0x90
>>>>> [ 2.246912]  [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
>>>>> [ 2.246912] Code: e1 48 85 c0 75 ce 49 8d 84 24 40 01 00 00 48 89 45 b8 e9 91 fd ff ff 4c 89 ff e8 8d ae 06 e1 e9 f2 fc ff ff 31 c0 e9 2e fe ff ff <0f> 0b e8 9a 57 f2 e0 0f 0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00
>>>>> [ 2.246912] RIP  [<ffffffffa015584f>] blkfront_setup_indirect+0x41f/0x430 [xen_blkfront]
>>>>> [ 2.246912] RSP <ffff8800e98bfcd0>
>>>>> [ 2.491574] ---[ end trace 8a9b992812627c71 ]---
>>>>> [ 2.495618] Kernel panic - not syncing: Fatal exception
>>>>> ------------------------------------
>>>>>
>>>>> Xen version 4.2.
>>>>>
>>>>> EC2 instance type: c3.large with EBS magnetic storage, if that
>>>>> matters.
>>>>>
>>>>> Here is the code where the BUG_ON() triggers
>>>>> (drivers/block/xen-blkfront.c):
>>>>> ------------------------------------
>>>>> if (!info->feature_persistent && info->max_indirect_segments) {
>>>>>         /*
>>>>>          * We are using indirect descriptors but not persistent
>>>>>          * grants, we need to allocate a set of pages that can be
>>>>>          * used for mapping indirect grefs
>>>>>          */
>>>>>         int num = INDIRECT_GREFS(segs) * BLK_RING_SIZE;
>>>>>
>>>>>         BUG_ON(!list_empty(&info->indirect_pages)); // << This one hits.
>>>>>         for (i = 0; i < num; i++) {
>>>>>                 struct page *indirect_page = alloc_page(GFP_NOIO);
>>>>>                 if (!indirect_page)
>>>>>                         goto out_of_memory;
>>>>>                 list_add(&indirect_page->lru, &info->indirect_pages);
>>>>>         }
>>>>> }
>>>>> ------------------------------------
>>>>>
>>>>> As we checked, the 'info->indirect_pages' list indeed contained
>>>>> around 30 elements at that point.
>>>>>
>>>>> Any ideas what may cause this and how to fix it?
>>>>>
>>>>> If any other data are needed, please let me know.
>>>>>
>>>>> References:
>>>>> [1] https://bugs.openvz.org/browse/OVZ-6718
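A quick sanity check on that reported count, under the assumption that
this kernel uses a single-page ring (BLK_RING_SIZE of 32) and 4 KB pages
(so INDIRECT_GREFS(segs) is 1 for segs up to 512): num works out to 32,
and one complete allocation left over from an earlier call would leave
32 pages on the list, consistent with the roughly 30 elements observed.
If it helps, a hypothetical debug patch (instrumentation only, not a
fix) could count the stale entries just before the BUG_ON() to confirm:

------------------------------------
/*
 * Hypothetical debug instrumentation for blkfront_setup_indirect():
 * report how many pages are already on the list before the BUG_ON()
 * fires, to verify the leftover-allocation theory.
 */
if (!list_empty(&info->indirect_pages)) {
	struct page *page;
	unsigned int count = 0;

	list_for_each_entry(page, &info->indirect_pages, lru)
		count++;
	pr_warn("xen-blkfront: %u stale indirect page(s) on entry to %s\n",
		count, __func__);
}
------------------------------------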