Xen project Mailing List

Re: [Xen-devel] null domains after xl destroy

To: Juergen Gross <jgross@xxxxxxxx>, glenn@xxxxxxxxxxxxxxx

Date: Thu, 20 Apr 2017 02:22:41 +1000

Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxx

Delivery-date: Wed, 19 Apr 2017 16:23:16 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 19/04/17 20:09, Juergen Gross wrote: > On 19/04/17 09:16, Roger Pau Monné wrote: >> On Wed, Apr 19, 2017 at 06:39:41AM +0200, Juergen Gross wrote: >>> On 19/04/17 03:02, Glenn Enright wrote: >>>> Thanks Juergen. I applied that, to our 4.9.23 dom0 kernel, which still >>>> shows the issue. When replicating the leak I now see this trace (via >>>> dmesg). Hopefully that is useful. >>>> >>>> Please note, I'm going to be offline next week, but am keen to keep on >>>> with this, it may just be a while before I followup is all. >>>> >>>> Regards, Glenn >>>> http://rimuhosting.com >>>> >>>> >>>> ------------[ cut here ]------------ >>>> WARNING: CPU: 0 PID: 19 at drivers/block/xen-blkback/xenbus.c:508 >>>> xen_blkbk_remove+0x138/0x140 >>>> Modules linked in: xen_pciback xen_netback xen_gntalloc xen_gntdev >>>> xen_evtchn xenfs xen_privcmd xt_CT ipt_REJECT nf_reject_ipv4 >>>> ebtable_filter ebtables xt_hashlimit xt_recent xt_state iptable_security >>>> iptable_raw igle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 >>>> nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables bridge stp llc >>>> ipv6 crc_ccitt ppdev parport_pc parport serio_raw sg i2c_i801 i2c_smbus >>>> i2c_core e1000e ptp p000_edac edac_core raid1 sd_mod ahci libahci floppy >>>> dm_mirror dm_region_hash dm_log dm_mod >>>> CPU: 0 PID: 19 Comm: xenwatch Not tainted 4.9.23-1.el6xen.x86_64 #1 >>>> Hardware name: Supermicro PDSML/PDSML+, BIOS 6.00 08/27/2007 >>>> ffffc90040cfbba8 ffffffff8136b61f 0000000000000013 0000000000000000 >>>> 0000000000000000 0000000000000000 ffffc90040cfbbf8 ffffffff8108007d >>>> ffffea0001373fe0 000001fc33394434 ffff880000000001 ffff88004d93fac0 >>>> Call Trace: >>>> [<ffffffff8136b61f>] dump_stack+0x67/0x98 >>>> [<ffffffff8108007d>] __warn+0xfd/0x120 >>>> [<ffffffff810800bd>] warn_slowpath_null+0x1d/0x20 >>>> [<ffffffff814ebde8>] xen_blkbk_remove+0x138/0x140 >>>> [<ffffffff814497f7>] xenbus_dev_remove+0x47/0xa0 >>>> [<ffffffff814bcfd4>] __device_release_driver+0xb4/0x160 >>>> [<ffffffff814bd0ad>] device_release_driver+0x2d/0x40 >>>> [<ffffffff814bbfd4>] bus_remove_device+0x124/0x190 >>>> [<ffffffff814b93a2>] device_del+0x112/0x210 >>>> [<ffffffff81448113>] ? xenbus_read+0x53/0x70 >>>> [<ffffffff814b94c2>] device_unregister+0x22/0x60 >>>> [<ffffffff814ed7cd>] frontend_changed+0xad/0x4c0 >>>> [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0 >>>> [<ffffffff81449b57>] xenbus_otherend_changed+0xc7/0x140 >>>> [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20 >>>> [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0 >>>> [<ffffffff81449fe0>] frontend_changed+0x10/0x20 >>>> [<ffffffff814477fc>] xenwatch_thread+0x9c/0x140 >>>> [<ffffffff810bffa0>] ? woken_wake_function+0x20/0x20 >>>> [<ffffffff816ed93a>] ? schedule+0x3a/0xa0 >>>> [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20 >>>> [<ffffffff810c0c5d>] ? complete+0x4d/0x60 >>>> [<ffffffff81447760>] ? split+0xf0/0xf0 >>>> [<ffffffff810a051d>] kthread+0xcd/0xf0 >>>> [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0 >>>> [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40 >>>> [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40 >>>> [<ffffffff816f1b45>] ret_from_fork+0x25/0x30 >>>> ---[ end trace ee097287c9865a62 ]--- >>> >>> Konrad, Roger, >>> >>> this was triggered by a debug patch in xen_blkbk_remove(): >>> >>> if (be->blkif) >>> - xen_blkif_disconnect(be->blkif); >>> + WARN_ON(xen_blkif_disconnect(be->blkif)); >>> >>> So I guess we need something like xen_blk_drain_io() in case of calls to >>> xen_blkif_disconnect() which are not allowed to fail (either at the call >>> sites of xen_blkif_disconnect() or in this function depending on a new >>> boolean parameter indicating it should wait for outstanding I/Os). >>> >>> I can try a patch, but I'd appreciate if you could confirm this wouldn't >>> add further problems... >> >> Hello, >> >> Thanks for debugging this, the easiest solution seems to be to replace the >> ring->inflight atomic_read check in xen_blkif_disconnect with a call to >> xen_blk_drain_io instead, and making xen_blkif_disconnect return void (to >> prevent further issues like this one). > > Glenn, > > can you please try the attached patch (in dom0)? For what its worth, I have applied this in kernel package 4.9.23-2 as follows: * Wed Apr 19 2017 Steven Haigh <netwiz@xxxxxxxxx> - 4.9.23-2 - xen/blkback: fix disconnect while I/Os in flight Its available from any 'in sync' mirror: https://xen.crc.id.au/downloads/ Feedback welcome for both mine and Juergen's sake. -- Steven Haigh Email: netwiz@xxxxxxxxx Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897

Attachment: signature.asc
Description: OpenPGP digital signature

_______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.