[Xen-users] hanging domU
Dear xen-users,

We periodically have hanging domUs, which I would like to debug somehow. The dom0 is:

# xl info
host                   : node-2
release                : 4.2.0-0.bpo.1-amd64
version                : #1 SMP Debian 4.2.6-1~bpo8+1 (2015-11-18)
machine                : x86_64
nr_cpus                : 16
max_cpu_id             : 23
nr_nodes               : 2
cores_per_socket       : 4
threads_per_core       : 2
cpu_mhz                : 2400
hw_caps                : bfebfbff:2c100800:00000000:00003f00:029ee3ff:00000000:00000001:00000000
virt_caps              : hvm hvm_directio
total_memory           : 98295
free_memory            : 25794
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0
free_cpus              : 0
xen_major              : 4
xen_minor              : 4
xen_extra              : .1
xen_version            : 4.4.1
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          :
xen_commandline        : placeholder dom0_mem=1536M
cc_compiler            : gcc (Debian 4.9.2-10) 4.9.2
cc_compile_by          : carnil
cc_compile_domain      : debian.org
cc_compile_date        : Mon Nov 2 16:39:32 UTC 2015
xend_config_format     : 4

The domU is a Debian jessie with its 3.16 kernel. The image for the domU is hosted on an NFS server. When the domU hangs, it issues warnings for blocked processes; it seems as if the disk system is hanging in it:

[99360.112539] INFO: task asterisk:531 blocked for more than 120 seconds.
[99360.112544] Not tainted 3.16.0-4-amd64 #1
[99360.112549] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[99360.112554] asterisk D ffff88007b6744e8 0 531 1 0x00000000
[99360.112562] ffff88007b674090 0000000000000286 0000000000012f00 ffff88007b50bfd8
[99360.112571] 0000000000012f00 ffff88007b674090 ffff88007f3137b0 ffff88007fc09648
[99360.112580] 0000000000000002 ffffffff8113ca70 ffff88007b50bdd0 ffff88007b50be70
[99360.112589] Call Trace:
[99360.112597] [<ffffffff8113ca70>] ? wait_on_page_read+0x60/0x60
[99360.112604] [<ffffffff8150e019>] ? io_schedule+0x99/0x120
[99360.112612] [<ffffffff8113ca7a>] ? sleep_on_page+0xa/0x10
[99360.112618] [<ffffffff8150e39c>] ? __wait_on_bit+0x5c/0x90
[99360.112624] [<ffffffff8113c86f>] ? wait_on_page_bit+0x7f/0x90
[99360.112632] [<ffffffff810a7a70>] ? autoremove_wake_function+0x30/0x30
[99360.112639] [<ffffffff81149ddd>] ? pagevec_lookup_tag+0x1d/0x30
[99360.112644] [<ffffffff8113c950>] ? filemap_fdatawait_range+0xd0/0x160
[99360.112649] [<ffffffff8113e43a>] ? filemap_write_and_wait_range+0x3a/0x60
[99360.112657] [<ffffffffa00585a1>] ? ext4_sync_file+0xb1/0x310 [ext4]
[99360.112663] [<ffffffff811d53cb>] ? do_fsync+0x4b/0x70
[99360.112667] [<ffffffff811d563c>] ? SyS_fsync+0xc/0x10
[99360.112671] [<ffffffff8151158d>] ? system_call_fast_compare_end+0x10/0x15

and such.

When I restart the domain, a hanging qemu process is always left behind:

# xl list
Name ID Mem VCPUs State Time(s)
...
(null) 43 1 4 --psrd 14859.1
...
# ps ax|grep 'xen-domid 43'
15572 pts/39 S+ 0:00 grep xen-domid 43
23690 ? Ssl 4:41 /usr/bin/qemu-system-i386 -xen-domid 43 -chardev socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-43,server,nowait -mon chardev=libxl-cmd,mode=control -nodefaults -xen-attach -name domU -vnc none -display none -nographic -machine xenpv -m 2049

I have to kill that process for the domain to get removed from xl list. How can I trace what that process is hanging on?

I would not say that the domU is a heavily loaded VM. It handles some SIP clients, but today it hung in the early morning hours, when I expect very low SIP activity. When the hang occurs, the NFS share is accessible, and other VMs perform read/write operations on it as always.

Unfortunately, I have no idea how to even reproduce the issue. Any ideas on how to proceed?

Thanks in advance,
Kojedzinszky Richárd
Euronet Magyarorszag Informatika Zrt.

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users
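
On the question of what the leftover qemu process is waiting on, one possible first step is to look at the scheduler state and kernel wait channel of each of its threads. The following is only a minimal sketch, assuming the dom0 kernel exposes the usual Linux procfs files /proc/<pid>/task/<tid>/status and /proc/<pid>/task/<tid>/wchan. The function name thread_states and the proc_root parameter are hypothetical, introduced here for illustration.

```python
import os


def thread_states(pid, proc_root="/proc"):
    """Return {tid: (state, wchan)} for every thread of the given process.

    state is the text after "State:" in the thread's status file,
    wchan is the kernel symbol the thread is sleeping in (if any).
    proc_root is a hypothetical parameter so the function can be
    pointed at a copy of /proc for testing.
    """
    result = {}
    task_dir = os.path.join(proc_root, str(pid), "task")
    for tid in sorted(os.listdir(task_dir), key=int):
        base = os.path.join(task_dir, tid)

        # Scheduler state, e.g. "S (sleeping)" or "D (disk sleep)".
        state = "?"
        with open(os.path.join(base, "status")) as f:
            for line in f:
                if line.startswith("State:"):
                    state = line.split(":", 1)[1].strip()
                    break

        # Wait channel; may be unreadable without sufficient privileges.
        try:
            with open(os.path.join(base, "wchan")) as f:
                wchan = f.read().strip() or "-"
        except OSError:
            wchan = "?"

        result[int(tid)] = (state, wchan)
    return result
```

Run as root on the dom0, thread_states(23690) would list one entry per thread of the qemu process shown in the ps output above. A thread reported in state D with a wait channel inside the NFS or block layer would point towards storage, but that interpretation is an assumption and would need to be confirmed against the actual output.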