
High "steal" on new dom0 with no domu's running



A bit of back story. My Xen experience started out about 5 years ago with an Aaeon EMB-CV1 with 4GB of memory, running four VMs. As I outgrew that hardware, I moved to an Aaeon EMB-KB1 with 8GB of memory. I outgrew that hardware too and have now moved to an Asrock J5040-ITX with 16GB of memory. These have all worked beautifully and have been very performant; the capabilities of paravirtualization and the efficiency of the platform have amazed me. All have been running a Debian dom0 and Debian domUs, using the Debian-maintained Xen version. Each dom0 install has been a fresh build, with the VMs migrated over afterwards.

I recently decided to add an additional board into the mix to help with load, so that things requiring more horsepower (like my ELK stack, Minecraft server, Nextcloud instance, etc.) can live on the 5040, while lower-CPU stuff like DNS, VPN server, mail server, etc. can live on a board with a slower CPU. I wanted to stay on the same CPU architecture so that I could live-migrate VMs for maintenance, so I picked up an Asrock J4205 with 8GB of memory. After installing Debian 10 (my standard build currently), the board was snappy and performant. I then installed xen-tools and xen-system-amd64, and after rebooting, the system took significantly longer to boot and was very laggy from the console (and over SSH as well). At this point I wasn't running any VMs, didn't have any custom tweaks, etc. Looking at top, there was a lot of "steal," averaging around 10% across all four cores (all physical cores).
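
For reference, the 4205 install was nothing more exotic than the stock Debian 10 packages, roughly along these lines (mpstat from the sysstat package is just one way to watch per-CPU steal; top's "st" column shows the same thing):

# apt install xen-system-amd64 xen-tools
# reboot
# mpstat -P ALL 5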

I tried tying the dom0 to only one CPU, and at that point the dom0 was consistently performant again. However, any domUs I tried to spin up would be very laggy, with high "steal." Live-migrating them back to the 5040, they'd be fine again. This was also the case if I didn't live-migrate but just started them fresh on the 4205.
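
To be clear about what I mean by tying dom0 to one CPU, it was along these lines via the Xen command line in GRUB (on Debian that normally lives in /etc/default/grub.d/xen.cfg), with xl vcpu-list just to confirm afterwards:

GRUB_CMDLINE_XEN_DEFAULT="dom0_max_vcpus=1 dom0_vcpus_pin"

# update-grub
# reboot
# xl vcpu-list Domain-0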

I thought maybe I'd goofed something up in the build somehow, so I blew away that installation and rebuilt it from scratch, and experienced the same thing.

I started logging performance with sysstat and this is what I see:

On the 5040 with 6 VMs running:

08:25:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
08:35:01 AM     all      0.03      0.00      0.11      0.00      0.05     99.81
08:45:01 AM     all      0.03      0.00      0.11      0.00      0.05     99.81
08:55:01 AM     all      0.03      0.00      0.17      0.00      0.20     99.60
09:05:01 AM     all      0.03      0.00      0.12      0.00      0.05     99.80
09:15:01 AM     all      0.03      0.00      0.13      0.00      0.09     99.75
09:25:01 AM     all      0.03      0.00      0.18      0.00      0.26     99.53
Average:        all      0.03      0.00      0.14      0.00      0.12     99.72

On the 4205 with no VMs running:

08:35:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
08:45:02 AM     all      0.03      0.00      0.07      0.01      7.74     92.15
08:55:02 AM     all      0.03      0.00      0.09      0.00      8.95     90.93
09:05:01 AM     all      0.03      0.00      0.07      0.00      9.19     90.70
09:15:01 AM     all      0.03      0.00      0.08      0.00      7.93     91.96
09:25:01 AM     all      0.03      0.00      0.07      0.00      8.85     91.05
09:35:01 AM     all      0.03      0.00      0.19      0.00      6.73     93.05
Average:        all      0.03      0.00      0.09      0.00      8.24     91.63
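
For what it's worth, those tables are just sysstat's standard 10-minute collection read back with sar, nothing fancier; roughly:

# apt install sysstat
# sar -u

(with ENABLED="true" set in /etc/default/sysstat so the data actually gets collected).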

# top
top - 09:45:55 up  1:29,  1 user,  load average: 0.15, 0.11, 0.08
Tasks: 161 total,   2 running, 159 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.0 us,  1.1 sy,  0.0 ni, 96.6 id,  0.0 wa,  0.0 hi,  0.0 si,  2.3 st
%Cpu1  :  0.0 us,  0.0 sy,  0.0 ni, 78.9 id,  0.0 wa,  0.0 hi,  0.0 si, 21.1 st
%Cpu2  :  0.0 us,  1.1 sy,  0.0 ni, 89.4 id,  0.0 wa,  0.0 hi,  0.0 si,  9.6 st
%Cpu3  :  0.0 us,  0.0 sy,  0.0 ni, 62.8 id,  0.0 wa,  0.0 hi,  0.0 si, 37.2 st

Is there any way to tell what's causing the performance degradation, and what the CPU is actually being used for when dom0 reports it as "steal"? I've been googling the issue a lot over the last few days and haven't found anything useful so far, only threads saying this happens when you oversubscribe your domUs. But since I'm not running any domUs at this point, I don't see how that could be the issue; the box is just sitting there looking cool, not doing any real work.
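
In case it helps, the hypervisor-side views I know to check are along these lines, and I'm happy to post output from any of them (or anything else that would be useful):

# xl top
# xl vcpu-list
# xl dmesg
# xenpm get-cpuidle-states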

Local disk storage on both dom0s is a single 20GB Intel 313 SLC SSD. The VMs' disks live on a Debian NAS box, connected via iSCSI.

# uname -a
Linux vhost2 4.19.0-14-amd64 #1 SMP Debian 4.19.171-2 (2021-01-30) x86_64 GNU/Linux
# cat /etc/debian_version
10.9

# xl info
host                   :
release                : 4.19.0-14-amd64
version                : #1 SMP Debian 4.19.171-2 (2021-01-30)
machine                : x86_64
nr_cpus                : 4
max_cpu_id             : 3
nr_nodes               : 1
cores_per_socket       : 4
threads_per_core       : 1
cpu_mhz                : 1497.612
hw_caps                : bfebfbff:47f8e3bf:2c100800:00000101:0000000f:2094e283:00000000:00000100
virt_caps              : hvm hvm_directio
total_memory           : 8040
free_memory            : 7413
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0
free_cpus              : 0
xen_major              : 4
xen_minor              : 11
xen_extra              : .4
xen_version            : 4.11.4
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          :
xen_commandline        : placeholder dom0_mem=512M,max:512M no-real-mode edd=off
cc_compiler            : gcc (Debian 8.3.0-6) 8.3.0
cc_compile_by          : pkg-xen-devel
cc_compile_domain      : lists.alioth.debian.org
cc_compile_date        : Fri Dec 11 21:33:51 UTC 2020
build_id               : 6d8e0fa3ddb825695eb6c6832631b4fa2331fe41
xend_config_format     : 4


Chris


 

