
Re: [Xen-devel] null domains after xl destroy



On 09/05/17 21:24, Roger Pau Monné wrote:
On Mon, May 08, 2017 at 11:10:24AM +0200, Juergen Gross wrote:
On 04/05/17 00:17, Glenn Enright wrote:
On 04/05/17 04:58, Steven Haigh wrote:
On 04/05/17 01:53, Juergen Gross wrote:
On 03/05/17 12:45, Steven Haigh wrote:
Just wanted to give this a little nudge now people seem to be back on
deck...

Glenn, could you please give the attached patch a try?

It should be applied on top of the other correction, the old debug
patch should not be applied.

I have added some debug output to make sure we see what is happening.

This patch is included in kernel-xen-4.9.26-1

It should be in the repos now.


Still seeing the same issue. Without the extra debug patch all I see in
the logs after destroy is this...

xen-blkback: xen_blkif_disconnect: busy
xen-blkback: xen_blkif_free: delayed = 0

Hmm, to me it seems as if some grant isn't being unmapped.

Looking at gnttab_unmap_refs_async() I wonder how this is supposed to
work:

I don't see how a grant would ever be unmapped in case of
page_count(item->pages[pc]) > 1 in __gnttab_unmap_refs_async(). All it
does is defer the call to the unmap operation again and again. Or am I
missing something here?
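
For reference, this is roughly what that path looks like (paraphrased from
memory from drivers/xen/grant-table.c in this kernel line, so treat it as a
sketch rather than a verbatim quote): as long as any page in the batch still
has an extra reference, the whole batch is rescheduled, with no bound on how
often that can happen.

/* Sketch of __gnttab_unmap_refs_async(), paraphrased, not verbatim. */
static void __gnttab_unmap_refs_async(struct gntab_unmap_queue_data *item)
{
    int pc;

    for (pc = 0; pc < item->count; pc++) {
        if (page_count(item->pages[pc]) > 1) {
            /* Someone still holds a reference: defer the whole batch
             * and try again later -- nothing limits how many times
             * this can recur. */
            schedule_delayed_work(&item->gnttab_work,
                                  msecs_to_jiffies(GNTTAB_UNMAP_REFS_DELAY *
                                                   (item->age + 1)));
            return;
        }
    }

    /* Only reached once every page is down to a single reference. */
    item->done(gnttab_unmap_refs(item->unmap_ops, item->kunmap_ops,
                                 item->pages, item->count), item);
}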

No, I don't think you are missing anything, but I cannot see how this could be
solved in a better way: unmapping a page that's still referenced is certainly
not a good option, or else we risk triggering a page fault elsewhere.

IMHO, gnttab_unmap_refs_async should have a timeout, and return an error at
some point. Also, I'm wondering whether there's a way to keep track of who has
references on a specific page, but so far I haven't been able to figure out how
to get this information from Linux.
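
For what it's worth, one possible shape of that (purely hypothetical, not a
real patch: the deferral cap and the -EBUSY completion below are invented for
illustration, and I'm assuming the queue item's existing retry counter can be
reused for this) would be to bound the number of deferrals and complete the
request with an error once the bound is hit:

    /* Hypothetical only: cap the retries instead of deferring forever. */
    if (page_count(item->pages[pc]) > 1) {
        if (item->age > GNTTAB_UNMAP_MAX_DEFERRALS) {  /* invented bound */
            item->done(-EBUSY, item);  /* give up and report an error */
            return;
        }
        schedule_delayed_work(&item->gnttab_work,
                              msecs_to_jiffies(GNTTAB_UNMAP_REFS_DELAY *
                                               (item->age + 1)));
        return;
    }

That would at least turn an unbounded retry into a failure that the caller can
log and handle.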

Also, I've noticed that __gnttab_unmap_refs_async uses page_count, shouldn't it
use page_ref_count instead?
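
For context on that distinction (as I read the 4.9-era include/linux/mm.h and
include/linux/page_ref.h; again a sketch from memory, not a verbatim quote),
the two differ only in whether the compound head is consulted:

/* Sketch of the 4.9-era helpers, not verbatim. */
static inline int page_ref_count(struct page *page)
{
    return atomic_read(&page->_refcount);       /* this page only */
}

static inline int page_count(struct page *page)
{
    /* follows a tail page to its compound head first */
    return atomic_read(&compound_head(page)->_refcount);
}

If the pages being unmapped are ordinary order-0 pages, the two should be
equivalent, since compound_head() is then a no-op.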

Roger.


In case it helps, I have continued to work on this. I noticed processes left behind (under 4.9.27). The same issue is ongoing.

# ps auxf | grep [x]vda
root      2983  0.0  0.0      0     0 ?        S    01:44   0:00  \_ [1.xvda1-1]
root      5457  0.0  0.0      0     0 ?        S    02:06   0:00  \_ [3.xvda1-1]
root      7382  0.0  0.0      0     0 ?        S    02:36   0:00  \_ [4.xvda1-1]
root      9668  0.0  0.0      0     0 ?        S    02:51   0:00  \_ [6.xvda1-1]
root     11080  0.0  0.0      0     0 ?        S    02:57   0:00  \_ [7.xvda1-1]

# xl list
Name                              ID   Mem VCPUs      State   Time(s)
Domain-0                          0  1512     2     r-----     118.5
(null)                            1     8     4     --p--d      43.8
(null)                            3     8     4     --p--d       6.3
(null)                            4     8     4     --p--d      73.4
(null)                            6     8     4     --p--d      14.7
(null)                            7     8     4     --p--d      30

Those all have...

[root 11080]# cat wchan
xen_blkif_schedule

[root 11080]# cat stack
[<ffffffff814eaee8>] xen_blkif_schedule+0x418/0xb40
[<ffffffff810a0555>] kthread+0xe5/0x100
[<ffffffff816f1c45>] ret_from_fork+0x25/0x30
[<ffffffffffffffff>] 0xffffffffffffffff

So I can provide anything that is in the /proc/<pid> space for those processes; 'maps' and 'pagemap' are empty, fwiw.

I also have the perf tools installed on this box, in case anyone has something I can run with them that might help.

As a reminder, I can replicate this by running 'xl destroy $domU' while the following loop is running inside the domU. Has anyone else been able to repeat this?

{
while true; do
 dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
done
}

This does not happen with 4.4 kernels, so there is possibly a regression of some sort?

Regards, Glenn

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

