[Xen-users] hanging domU

To: xen-users@xxxxxxxxxxxxxxxxxxxx
From: Kojedzinszky Richárd <kojedzinszky.richard@xxxxxxxxxxxx>
Date: Wed, 9 Dec 2015 12:16:07 +0100 (CET)
Delivery-date: Wed, 09 Dec 2015 11:17:16 +0000
List-id: Xen user discussion <xen-users.lists.xen.org>

Dear xen-users,

We periodically have hanging domUs, which I would like to debug somehow. The dom0 is:

# xl info
host                   : node-2
release                : 4.2.0-0.bpo.1-amd64
version                : #1 SMP Debian 4.2.6-1~bpo8+1 (2015-11-18)
machine                : x86_64
nr_cpus                : 16
max_cpu_id             : 23
nr_nodes               : 2
cores_per_socket       : 4
threads_per_core       : 2
cpu_mhz                : 2400
hw_caps                : bfebfbff:2c100800:00000000:00003f00:029ee3ff:00000000:00000001:00000000
virt_caps              : hvm hvm_directio
total_memory           : 98295
free_memory            : 25794
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0
free_cpus              : 0
xen_major              : 4
xen_minor              : 4
xen_extra              : .1
xen_version            : 4.4.1
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          :
xen_commandline        : placeholder dom0_mem=1536M
cc_compiler            : gcc (Debian 4.9.2-10) 4.9.2
cc_compile_by          : carnil
cc_compile_domain      : debian.org
cc_compile_date        : Mon Nov  2 16:39:32 UTC 2015
xend_config_format     : 4

The domU is Debian jessie with its 3.16 kernel. The image for the domU is hosted on an NFS server. When the domU hangs, it logs warnings about blocked processes; it looks as if the disk subsystem inside it is hanging:


[99360.112539] INFO: task asterisk:531 blocked for more than 120 seconds.
[99360.112544]       Not tainted 3.16.0-4-amd64 #1
[99360.112549] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[99360.112554] asterisk        D ffff88007b6744e8     0   531      1 0x00000000
[99360.112562]  ffff88007b674090 0000000000000286 0000000000012f00 ffff88007b50bfd8
[99360.112571]  0000000000012f00 ffff88007b674090 ffff88007f3137b0 ffff88007fc09648
[99360.112580]  0000000000000002 ffffffff8113ca70 ffff88007b50bdd0 ffff88007b50be70
[99360.112589] Call Trace:
[99360.112597]  [<ffffffff8113ca70>] ? wait_on_page_read+0x60/0x60
[99360.112604]  [<ffffffff8150e019>] ? io_schedule+0x99/0x120
[99360.112612]  [<ffffffff8113ca7a>] ? sleep_on_page+0xa/0x10
[99360.112618]  [<ffffffff8150e39c>] ? __wait_on_bit+0x5c/0x90
[99360.112624]  [<ffffffff8113c86f>] ? wait_on_page_bit+0x7f/0x90
[99360.112632]  [<ffffffff810a7a70>] ? autoremove_wake_function+0x30/0x30
[99360.112639]  [<ffffffff81149ddd>] ? pagevec_lookup_tag+0x1d/0x30
[99360.112644]  [<ffffffff8113c950>] ? filemap_fdatawait_range+0xd0/0x160
[99360.112649]  [<ffffffff8113e43a>] ? filemap_write_and_wait_range+0x3a/0x60
[99360.112657]  [<ffffffffa00585a1>] ? ext4_sync_file+0xb1/0x310 [ext4]
[99360.112663]  [<ffffffff811d53cb>] ? do_fsync+0x4b/0x70
[99360.112667]  [<ffffffff811d563c>] ? SyS_fsync+0xc/0x10
[99360.112671]  [<ffffffff8151158d>] ? system_call_fast_compare_end+0x10/0x15

and so on. When I restart the domain, a hanging qemu process is always left behind:

# xl list
Name                                        ID   Mem VCPUs      State Time(s)
...
(null)                                      43     1     4     --psrd   14859.1
...

# ps ax|grep 'xen-domid 43'
15572 pts/39   S+     0:00 grep xen-domid 43
23690 ?        Ssl    4:41 /usr/bin/qemu-system-i386 -xen-domid 43 -chardev socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-43,server,nowait -mon chardev=libxl-cmd,mode=control -nodefaults -xen-attach -name domU -vnc none -display none -nographic -machine xenpv -m 2049
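
To pick out such leftovers without reading the table by eye, here is a minimal sketch, assuming a POSIX shell with awk and grep on the dom0. The function names null_domids and find_leftover_qemu are made up for illustration, and the sketch only lists processes; it does not kill anything.

```shell
# Print the domain IDs of "(null)" rows from `xl list` output given on stdin.
# Assumes the ID is the second whitespace-separated column, as in the table above.
null_domids() {
    awk '$1 == "(null)" { print $2 }'
}

# For each leftover domain, show any process started with -xen-domid <id>.
# The [x] bracket keeps grep from matching its own command line, and -w
# prevents ID 43 from also matching 430.
find_leftover_qemu() {
    xl list | null_domids | while read -r domid; do
        ps ax | grep -w "[x]en-domid $domid"
    done
}
```

Running find_leftover_qemu as root on the dom0 should print only the qemu line for each "(null)" domain, without the grep line seen above.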

I have to kill that process for the domain to be removed from xl list. How can I trace what that process is hanging on?

I would not say that the domU is a heavily loaded VM. It handles some SIP clients, but today it hung in the early morning hours, when I expect very low SIP activity.

When the hang occurs, the NFS share is accessible, and other VMs perform read/write operations on it as usual.


Unfortunately I have no idea how to even reproduce the issue.

Any ideas on how to proceed?


Thanks in advance,

Kojedzinszky Richárd
Euronet Magyarorszag Informatika Zrt.
