
[Xen-users] hanging domU



Dear xen-users,

We periodically have hanging domUs, which I would like to debug somehow. The dom0 is:
# xl info
host                   : node-2
release                : 4.2.0-0.bpo.1-amd64
version                : #1 SMP Debian 4.2.6-1~bpo8+1 (2015-11-18)
machine                : x86_64
nr_cpus                : 16
max_cpu_id             : 23
nr_nodes               : 2
cores_per_socket       : 4
threads_per_core       : 2
cpu_mhz                : 2400
hw_caps                : bfebfbff:2c100800:00000000:00003f00:029ee3ff:00000000:00000001:00000000
virt_caps              : hvm hvm_directio
total_memory           : 98295
free_memory            : 25794
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0
free_cpus              : 0
xen_major              : 4
xen_minor              : 4
xen_extra              : .1
xen_version            : 4.4.1
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          :
xen_commandline        : placeholder dom0_mem=1536M
cc_compiler            : gcc (Debian 4.9.2-10) 4.9.2
cc_compile_by          : carnil
cc_compile_domain      : debian.org
cc_compile_date        : Mon Nov  2 16:39:32 UTC 2015
xend_config_format     : 4

The domU runs Debian jessie with its 3.16 kernel. The image for the domU is hosted on an NFS server. When the domU hangs, it logs warnings about blocked processes; it seems that its disk subsystem is hanging:

[99360.112539] INFO: task asterisk:531 blocked for more than 120 seconds.
[99360.112544]       Not tainted 3.16.0-4-amd64 #1
[99360.112549] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[99360.112554] asterisk        D ffff88007b6744e8     0   531      1 0x00000000
[99360.112562] ffff88007b674090 0000000000000286 0000000000012f00 ffff88007b50bfd8
[99360.112571] 0000000000012f00 ffff88007b674090 ffff88007f3137b0 ffff88007fc09648
[99360.112580] 0000000000000002 ffffffff8113ca70 ffff88007b50bdd0 ffff88007b50be70
[99360.112589] Call Trace:
[99360.112597]  [<ffffffff8113ca70>] ? wait_on_page_read+0x60/0x60
[99360.112604]  [<ffffffff8150e019>] ? io_schedule+0x99/0x120
[99360.112612]  [<ffffffff8113ca7a>] ? sleep_on_page+0xa/0x10
[99360.112618]  [<ffffffff8150e39c>] ? __wait_on_bit+0x5c/0x90
[99360.112624]  [<ffffffff8113c86f>] ? wait_on_page_bit+0x7f/0x90
[99360.112632]  [<ffffffff810a7a70>] ? autoremove_wake_function+0x30/0x30
[99360.112639]  [<ffffffff81149ddd>] ? pagevec_lookup_tag+0x1d/0x30
[99360.112644]  [<ffffffff8113c950>] ? filemap_fdatawait_range+0xd0/0x160
[99360.112649]  [<ffffffff8113e43a>] ? filemap_write_and_wait_range+0x3a/0x60
[99360.112657]  [<ffffffffa00585a1>] ? ext4_sync_file+0xb1/0x310 [ext4]
[99360.112663]  [<ffffffff811d53cb>] ? do_fsync+0x4b/0x70
[99360.112667]  [<ffffffff811d563c>] ? SyS_fsync+0xc/0x10
[99360.112671]  [<ffffffff8151158d>] ? system_call_fast_compare_end+0x10/0x15

and so on. When I restart the domain, a hanging qemu process is always left behind:
# xl list
Name                                        ID   Mem VCPUs      State Time(s)
...
(null)                                      43     1     4     --psrd 14859.1
...

# ps ax|grep 'xen-domid 43'
15572 pts/39   S+     0:00 grep xen-domid 43
23690 ? Ssl 4:41 /usr/bin/qemu-system-i386 -xen-domid 43 -chardev socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-43,server,nowait -mon chardev=libxl-cmd,mode=control -nodefaults -xen-attach -name domU -vnc none -display none -nographic -machine xenpv -m 2049

I have to kill that process for the domain to be removed from xl list. How can I trace what that process is hanging on?
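
As a starting point, a minimal sketch of what could be collected from the leftover process before killing it, assuming a Linux dom0 with /proc mounted. The function name inspect_pid is hypothetical, and the kernel stack files are usually readable only by root and only if the kernel exposes them:

```shell
#!/bin/sh
# Sketch: show what a (possibly hung) process and its threads are waiting on.
# inspect_pid is a hypothetical helper name, not part of Xen or qemu.
inspect_pid() {
    pid="$1"
    if [ ! -d "/proc/$pid" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    # Overall process state (D = uninterruptible sleep, typically waiting on I/O)
    grep -E '^(Name|State):' "/proc/$pid/status"
    # Per-thread state and kernel wait channel
    for task in /proc/"$pid"/task/*; do
        tid="${task##*/}"
        state=$(awk '/^State:/ {print $2}' "$task/status" 2>/dev/null)
        wchan=$(cat "$task/wchan" 2>/dev/null)
        echo "tid=$tid state=$state wchan=${wchan:-unknown}"
        # Kernel stack of the thread; silently skipped if not readable
        cat "$task/stack" 2>/dev/null
    done
    return 0
}
```

For the process above this would be called as `inspect_pid 23690`. Attaching with `strace -p 23690` or `gdb -p 23690` might additionally show where it is stuck in user space, if those tools are installed in the dom0.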

I would not say that the domU is a heavily loaded VM. It handles some SIP clients, but today it hung in the early morning hours, when I expect very low SIP activity.

When the hang occurs, the NFS share is still accessible, and other VMs perform read/write operations on it as usual.

Unfortunately I have no idea how to even reproduce the issue.

Any ideas on how to proceed?


Thanks in advance,

Kojedzinszky Richárd
Euronet Magyarorszag Informatika Zrt.
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users

 

