
Re: [Xen-devel] null domains after xl destroy



On 18/04/17 20:36, Juergen Gross wrote:
On 12/04/17 00:45, Glenn Enright wrote:
On 12/04/17 10:23, Andrew Cooper wrote:
On 11/04/2017 23:13, Glenn Enright wrote:
On 11/04/17 21:49, Dietmar Hahn wrote:
On Tuesday, 11 April 2017, 20:03:14, Glenn Enright wrote:
On 11/04/17 17:59, Juergen Gross wrote:
On 11/04/17 07:25, Glenn Enright wrote:
Hi all

We are seeing an odd issue with domU domains after xl destroy: under
recent 4.9 kernels a (null) domain is left behind.

I guess this is the dom0 kernel version?

This has occurred on a variety of hardware, with no obvious
commonality.

4.4.55 does not show this behavior.

On my test machine I have the following packages installed under
centos6, from https://xen.crc.id.au/

~]# rpm -qa | grep xen
xen47-licenses-4.7.2-4.el6.x86_64
xen47-4.7.2-4.el6.x86_64
kernel-xen-4.9.21-1.el6xen.x86_64
xen47-ocaml-4.7.2-4.el6.x86_64
xen47-libs-4.7.2-4.el6.x86_64
xen47-libcacard-4.7.2-4.el6.x86_64
xen47-hypervisor-4.7.2-4.el6.x86_64
xen47-runtime-4.7.2-4.el6.x86_64
kernel-xen-firmware-4.9.21-1.el6xen.x86_64

I've also replicated the issue with 4.9.17 and 4.9.20

To replicate, on a cleanly booted dom0 with one pv VM, I run the
following on the VM

{
while true; do
 dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
done
}

Then on the dom0 I do this sequence to reliably get a null domain. This
occurs with both oxenstored and xenstored.

{
xl sync 1
xl destroy 1
}

xl list then renders something like ...

(null)                                       1     4     4     --p--d       9.8     0
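
To check for this state from a script after the destroy, the listing can be filtered for entries whose name column reads (null). This is only a minimal sketch: null_domains is a hypothetical helper name, not part of xl, and it assumes the name is the first column of xl list and the domain ID the second, as in the line above.

```shell
# Hypothetical helper: print the ID of every domain that is listed
# with the name "(null)". Reads `xl list` output on stdin, so it can
# also be run against saved output.
null_domains() {
    awk '$1 == "(null)" { print $2 }'
}
```

On the dom0 this would be run as `xl list | null_domains`; no output means no leftover domain was listed.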

Something is referencing the domain, e.g. some of its memory pages are
still mapped by dom0.

You can try
# xl debug-keys q
and further
# xl dmesg
to see the output of the previous command. The 'q' dumps domain
(and guest debug) info.
# xl debug-keys h
prints all available keys for more information.

Dietmar.


I've done this as requested, below is the output.

<snip>
(XEN) Memory pages belonging to domain 1:
(XEN)     DomPage 0000000000071c00: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c01: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c02: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c03: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c04: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c05: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c06: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c07: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c08: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c09: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c0a: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c0b: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c0c: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c0d: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c0e: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c0f: caf=00000001, taf=7400000000000001

There are 16 pages still referenced from somewhere.
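
The page count can also be taken from the dump mechanically rather than by eye. The following is a rough sketch that assumes the dump format shown above (a "Memory pages belonging to domain N:" header followed by DomPage lines); count_dompages is a hypothetical helper name.

```shell
# Hypothetical helper: count the DomPage lines listed for one domain
# in a debug-key 'q' dump. Reads `xl dmesg` output on stdin.
# $1: domain ID as shown by xl list
count_dompages() {
    awk -v dom="$1" '
        /Memory pages belonging to domain/ {
            # The header ends in "<id>:"; strip the colon and compare.
            id = $NF
            sub(/:$/, "", id)
            in_dom = (id == dom)
            next
        }
        in_dom && /DomPage/ { n++ }
        END { print n + 0 }
    '
}
```

After `xl debug-keys q`, this would be run as `xl dmesg | count_dompages 1`. If the console ring holds several dumps, the pages from each of them are added together, so it is best run against a single dump.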

Just a wild guess: could you please try the attached kernel patch? This
might give us some more diagnostic data...


Juergen


Thanks Juergen. I applied that to our 4.9.23 dom0 kernel, which still shows the issue. When replicating the leak I now see this trace (via dmesg). Hopefully that is useful.

Please note, I'm going to be offline next week, but I am keen to keep on with this; it may just be a while before I follow up.

Regards, Glenn
http://rimuhosting.com


------------[ cut here ]------------
WARNING: CPU: 0 PID: 19 at drivers/block/xen-blkback/xenbus.c:508 xen_blkbk_remove+0x138/0x140
Modules linked in: xen_pciback xen_netback xen_gntalloc xen_gntdev xen_evtchn xenfs xen_privcmd xt_CT ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables xt_hashlimit xt_recent xt_state iptable_security iptable_raw igle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables bridge stp llc ipv6 crc_ccitt ppdev parport_pc parport serio_raw sg i2c_i801 i2c_smbus i2c_core e1000e ptp p000_edac edac_core raid1 sd_mod ahci libahci floppy dm_mirror dm_region_hash dm_log dm_mod
CPU: 0 PID: 19 Comm: xenwatch Not tainted 4.9.23-1.el6xen.x86_64 #1
Hardware name: Supermicro PDSML/PDSML+, BIOS 6.00 08/27/2007
 ffffc90040cfbba8 ffffffff8136b61f 0000000000000013 0000000000000000
 0000000000000000 0000000000000000 ffffc90040cfbbf8 ffffffff8108007d
 ffffea0001373fe0 000001fc33394434 ffff880000000001 ffff88004d93fac0
Call Trace:
 [<ffffffff8136b61f>] dump_stack+0x67/0x98
 [<ffffffff8108007d>] __warn+0xfd/0x120
 [<ffffffff810800bd>] warn_slowpath_null+0x1d/0x20
 [<ffffffff814ebde8>] xen_blkbk_remove+0x138/0x140
 [<ffffffff814497f7>] xenbus_dev_remove+0x47/0xa0
 [<ffffffff814bcfd4>] __device_release_driver+0xb4/0x160
 [<ffffffff814bd0ad>] device_release_driver+0x2d/0x40
 [<ffffffff814bbfd4>] bus_remove_device+0x124/0x190
 [<ffffffff814b93a2>] device_del+0x112/0x210
 [<ffffffff81448113>] ? xenbus_read+0x53/0x70
 [<ffffffff814b94c2>] device_unregister+0x22/0x60
 [<ffffffff814ed7cd>] frontend_changed+0xad/0x4c0
 [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
 [<ffffffff81449b57>] xenbus_otherend_changed+0xc7/0x140
 [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
 [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
 [<ffffffff81449fe0>] frontend_changed+0x10/0x20
 [<ffffffff814477fc>] xenwatch_thread+0x9c/0x140
 [<ffffffff810bffa0>] ? woken_wake_function+0x20/0x20
 [<ffffffff816ed93a>] ? schedule+0x3a/0xa0
 [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
 [<ffffffff810c0c5d>] ? complete+0x4d/0x60
 [<ffffffff81447760>] ? split+0xf0/0xf0
 [<ffffffff810a051d>] kthread+0xcd/0xf0
 [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
 [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
 [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
 [<ffffffff816f1b45>] ret_from_fork+0x25/0x30
---[ end trace ee097287c9865a62 ]---

