Another sysadmin and I have independently been researching a problem where a DomU randomly locks up (can’t reach it via xl console, no ping / SSH connection, shown as stuck in the running state in xentop) on two of our separate machines (installed completely independently):
Dom0:
Debian 7.0 with Xen version: 4.1.4 and xen-utils 4.1.4-3+deb7u1
Debian 7.1 with Xen version: 4.1
DomU:
Debian 7.0
Debian 7.1(.3)
The common denominator appears to be qemu-dm consuming (leaking?) memory until the Dom0 swaps. When the Dom0 swap is full, the DomU appears to be locked (see above) from the Dom0, at which point a hard reboot, a.k.a. xl destroy + xl create, is the only way to get it back. This *could* be related to "[Xen-devel] qemu-system-i386: memory leak?"
http://xen.markmail.org/message/chqpifrj46lxdxx2
The DomUs themselves don’t use any abnormal amount of memory or swap. All DomUs are image-file based (disk.img, swap.img).
To give an overview, the Dom0 currently uses 26GB of swap with 8 active DomUs. Swap per process:
Pid    Swap           Process                                                    Uptime
3766   98452 kB       qemu-dm -d 29 -domain-name [hostname] -nographic -M xenpv  160 days
6100   276988 kB      qemu-dm -d 42 -domain-name [hostname] -nographic -M xenpv  108 days
6790   121620 kB      qemu-dm -d 46 -domain-name [hostname] -nographic -M xenpv  95 days
10616  791616 kB      qemu-dm -d 51 -domain-name [hostname] -nographic -M xenpv  32 days
11588  3514436 kB     qemu-dm -d 49 -domain-name [hostname] -nographic -M xenpv  73 days
16290  170436 kB      qemu-dm -d 43 -domain-name [hostname] -nographic -M xenpv  107 days
26974  1647248 kB     qemu-dm -d 48 -domain-name [hostname] -nographic -M xenpv  92 days
32403  21147060 kB    qemu-dm -d 52 -domain-name [hostname] -nographic -M xenpv  29 days
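For anyone who wants to collect the same per-process swap figures on their own Dom0, here is a minimal sketch. It assumes a Linux kernel that exposes the VmSwap field in /proc/<pid>/status. The function names (parse_vmswap_kb, swap_by_process) are made up for this example and are not part of any Xen tooling; this is not necessarily how the table above was produced.

```python
import os
import re


def parse_vmswap_kb(status_text):
    """Return the VmSwap value (kB) from the text of /proc/<pid>/status, or None."""
    match = re.search(r"^VmSwap:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def swap_by_process(proc_root="/proc", name_filter="qemu-dm"):
    """List (pid, swap_kb, cmdline) for matching processes, largest swap first."""
    rows = []
    if not os.path.isdir(proc_root):
        return rows
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "status")) as f:
                status = f.read()
            with open(os.path.join(base, "cmdline"), "rb") as f:
                raw = f.read()
        except (IOError, OSError):
            continue  # process exited or is not readable
        # cmdline arguments are NUL-separated
        cmdline = raw.replace(b"\x00", b" ").decode("utf-8", "replace").strip()
        swap_kb = parse_vmswap_kb(status)
        if swap_kb is None or name_filter not in cmdline:
            continue
        rows.append((int(entry), swap_kb, cmdline))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for pid, swap_kb, cmdline in swap_by_process():
        print("{0:>6} {1:>10} kB  {2}".format(pid, swap_kb, cmdline))
```

Run as root on the Dom0 so that all process entries are readable. Logging the output periodically (e.g. from cron) gives a growth curve per qemu-dm process.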
Generally, the higher the usage, the higher the swap.
Possibly, the higher the IO, the higher the swap.
The DomU behind the qemu-dm process with PID 32403 (started with -d 52) is a fairly low-utilized DomU with a 30GB database and log parsing as its primary application. Its qemu-dm process currently grows by roughly 2GB per day in swap. The only difference between it and the others is that it does (probably several times) more IO.
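To compare the processes on equal terms, the swap figures can be divided by uptime. The sketch below does that with the numbers from the table above. It assumes that swapped-out memory is a usable proxy for leaked memory and that growth is linear over the process lifetime, neither of which is verified here; the names SAMPLES, avg_growth_mib_per_day and rank_by_growth are invented for the example.

```python
# (pid, swap_kb, uptime_days), copied from the swap-per-process table.
SAMPLES = [
    (3766, 98452, 160),
    (6100, 276988, 108),
    (6790, 121620, 95),
    (10616, 791616, 32),
    (11588, 3514436, 73),
    (16290, 170436, 107),
    (26974, 1647248, 92),
    (32403, 21147060, 29),
]


def avg_growth_mib_per_day(swap_kb, uptime_days):
    """Average swap growth in MiB/day, assuming linear growth since start."""
    return swap_kb / 1024.0 / uptime_days


def rank_by_growth(samples):
    """Return (pid, MiB/day) pairs, fastest-growing process first."""
    rates = [(pid, avg_growth_mib_per_day(kb, days)) for pid, kb, days in samples]
    return sorted(rates, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    total_kb = sum(kb for _, kb, _ in SAMPLES)
    print("total swap in qemu-dm: {0:.1f} GiB".format(total_kb / 1024.0 / 1024.0))
    for pid, rate in rank_by_growth(SAMPLES):
        print("{0:>6} {1:>8.1f} MiB/day".format(pid, rate))
```

By this measure PID 32403 averages around 0.7 GiB per day over its 29 days, which is below the roughly 2GB per day it shows currently, so the growth may not be constant over time.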
Machine #1 (me):
$ dmesg|grep qe
[7548057.392504] qemu-dm[528]: segfault at ff0 ip 00007f1e39229ca0 sp 00007fffb9e36bb8 error 4 in libc-2.13.so[7f1e3910a000+180000]
[11263387.091221] qemu-dm[7474]: segfault at ff0 ip 00007f695e32dca0 sp 00007fff5a3b27a8 error 4 in libc-2.13.so[7f695e20e000+180000]
Machine #2:
$ dmesg|grep qe
[2593763.122800] Out of memory: Kill process 2778 (qemu-dm) score 892 or sacrifice child
[2593763.122824] Killed process 2778 (qemu-dm) total-vm:3629932kB, anon-rss:1363584kB, file-rss:572kB
[3166462.372758] Out of memory: Kill process 30974 (qemu-dm) score 868 or sacrifice child
[3166462.372782] Killed process 30974 (qemu-dm) total-vm:3545568kB, anon-rss:1282888kB, file-rss:548kB