
Re: [Xen-devel] [Question] PARSEC benchmark has smaller execution time in VM than in native?



On Tue, Mar 01, 2016 at 08:48:30AM -0500, Meng Xu wrote:
> On Mon, Feb 29, 2016 at 12:59 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@xxxxxxxxxx> wrote:
> >> > Hey!
> >> >
> >> > CC-ing Elena.
> >>
> >> I think you forgot to cc her..
> >> Anyway, let's cc her now... :-)
> >>
> >> >
> >> >> We are measuring the execution time between native machine environment
> >> >> and xen virtualization environment using PARSEC Benchmark [1].
> >> >>
> >> >> In the virtualization environment, we run a domU with three VCPUs, each
> >> >> of them pinned to a core; we pin dom0 to another core that is not
> >> >> used by the domU.
> >> >>
> >> >> Inside the Linux in domU in the virtualization environment, and in the
> >> >> native environment, we used cpuset to isolate a core (or VCPU) for the
> >> >> system processes and a separate core for the benchmark processes.
> >> >> We also configured the Linux boot command line with the isolcpus= option
> >> >> to isolate the benchmark core from other unnecessary processes.
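
(Just to double-check I understand the setup: something along the lines of the
sketch below? The domain name, core numbers and cpuset path are my assumptions,
not necessarily what you used.)

    # In dom0, pin the guest VCPUs and the dom0 VCPU to distinct cores:
    xl vcpu-pin domU 0 1          # domU VCPU 0 -> physical core 1
    xl vcpu-pin domU 1 2          # domU VCPU 1 -> physical core 2
    xl vcpu-pin domU 2 3          # domU VCPU 2 -> physical core 3
    xl vcpu-pin Domain-0 0 0      # dom0 VCPU 0 -> physical core 0

    # Inside the guest (and likewise on the native machine): reserve core 2
    # for the benchmark with isolcpus=2 on the kernel command line, and/or
    # with a cpuset (cgroup v1 layout assumed here):
    mkdir /sys/fs/cgroup/cpuset/benchmark
    echo 2  > /sys/fs/cgroup/cpuset/benchmark/cpuset.cpus
    echo 0  > /sys/fs/cgroup/cpuset/benchmark/cpuset.mems
    echo $$ > /sys/fs/cgroup/cpuset/benchmark/tasks    # then exec the benchmark
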
> >> >
> >> > You may want to just offline them and also boot the machine with NUMA
> >> > disabled.
> >>
> >> Right, the machine is booted up with NUMA disabled.
> >> We will offline the unnecessary cores then.
> >>
> >> >
> >> >>
> >> >> We expected the execution time of the benchmarks in the Xen virtualization
> >> >> environment to be larger than the execution time in the native machine
> >> >> environment. However, the evaluation gave us the opposite result.
> >> >>
> >> >> Below is the evaluation data for the canneal and streamcluster 
> >> >> benchmarks:
> >> >>
> >> >> Benchmark: canneal, input=simlarge, conf=gcc-serial
> >> >> Native: 6.387s
> >> >> Virtualization: 5.890s
> >> >>
> >> >> Benchmark: streamcluster, input=simlarge, conf=gcc-serial
> >> >> Native: 5.276s
> >> >> Virtualization: 5.240s
> >> >>
> >> >> Is there anything wrong with our evaluation that led to these abnormal
> >> >> performance results?
> >> >
> >> > Nothing is wrong. Virtualization is naturally faster than baremetal!
> >> >
> >> > :-)
> >> >
> >> > No clue sadly.
> >>
> >> Ah-ha. This is really surprising to me.... Why would adding one more
> >> layer speed up the system? Unless the virtualization layer disables
> >> some services that run natively and interfere with the benchmark.
> >>
> >> If virtualization is naturally faster than baremetal, why do some
> >> experiments show that virtualization introduces overhead?
> >
> > Elena told me that there was a weird regression in Linux 4.1 - where
> > CPU-burning workloads were _slower_ on baremetal than as guests.
> 
> Hi Elena,
> Would you mind sharing with us some of your experience of how you
> found the real reason? Did you use a particular tool or methodology to
> pin down the cause (i.e., why CPU-burning workloads are _slower_ on
> baremetal than as guests)?
>

Hi Meng

Yes, sure!

While working on performance tests for the smt-exposing patches from Joao,
I ran a CPU-bound workload in an HVM guest and, using the same kernel, ran
the same test on baremetal.
While testing the CPU-bound workload on baremetal Linux (4.1.0-rc2),
I found that the time to complete the test is a few times longer than it
takes under an HVM guest.
I tried tests with the kernel threads pinned to cores and without pinning.
The execution times are most often about twice as long, and sometimes 4
times as long, as in the HVM case.

What is interesting is not only that it sometimes takes 3-4 times longer
than in the HVM guest, but also that the test with threads bound to cores
takes almost 3 times longer to execute than the same CPU-bound test under
HVM (in all configurations).
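
(For anyone who wants to try something similar without the kernel-thread
test module, a rough userspace analogue is a set of CPU-burning workers
pinned with taskset; the core numbers and iteration count below are
arbitrary assumptions, not the actual test:)

    #!/bin/bash
    # Spawn 8 CPU-bound workers. In the "bound" variant each worker is
    # pinned to a core via taskset; dropping the taskset prefix gives the
    # "unbound" variant, where the scheduler places them freely.
    burn() { local i=0; while [ $i -lt 20000000 ]; do i=$((i+1)); done; }
    export -f burn
    time (
        for n in $(seq 0 7); do
            taskset -c $((n % 4)) bash -c burn &    # 8 workers over 4 cores
        done
        wait
    )
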

I ran each test 5 times; here are the execution times (seconds):

-------------------------------------------------
        baremetal           |
thread_bind | thread_unbind | HVM pinned to cores
----------- |---------------|---------------------
     74     |     83        |        28
     74     |     88        |        28
     74     |     38        |        28
     74     |     73        |        28
     74     |     87        |        28

Sometimes the unbound tests showed better times, but not often enough
to present here. Some results are much worse and reach up to 120
seconds.

Each test has 8 kernel threads. In the baremetal case I tried the following
(a sketch of the corresponding boot parameters follows the list):
- NUMA off and on;
- all CPUs online;
- isolating the CPUs of the first node;
- setting intel_idle.max_cstate=1;
- disabling intel_pstate;
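
(For reference, the last few items correspond roughly to kernel boot
parameters like the ones below; the exact values, and the CPU list for the
first node, are assumptions:)

    # Appended to the kernel command line (e.g. in GRUB_CMDLINE_LINUX):
    numa=off intel_idle.max_cstate=1 intel_pstate=disable isolcpus=0-7

    # Verify after boot:
    cat /proc/cmdline
    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver   # acpi-cpufreq,
                                                               # not intel_pstate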

I don't think I have exhausted all the options here, but it looked like the
last two changes did improve performance, though still not to a level
comparable to the HVM case.
I am trying to find where the regression happened. Performance on a newer
kernel (I tried 4.5.0-rc4+) was close to or better than HVM.

I am trying to find out whether there were some relevant regressions, to
understand the reason for this.
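
(If it does come down to a single fix somewhere between v4.1 and v4.5, one
way to pin it down is a bisection over the kernel tree; the tags and the
inverted terms below are just a sketch of that approach:)

    cd linux
    # We are hunting for the commit that *fixed* the slowdown, so invert
    # the usual good/bad terms:
    git bisect start --term-old=slow --term-new=fast
    git bisect slow v4.1
    git bisect fast v4.5-rc4
    # At each step: build and boot the kernel, run the CPU-bound test, then:
    git bisect slow    # if this kernel shows the long execution times
    git bisect fast    # if it performs roughly like the HVM case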


What kernel do you guys use?

Elena

See more description of the tests here:
http://lists.xenproject.org/archives/html/xen-devel/2016-01/msg02874.html
Joao's patches are here:
http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg03115.html




> 
> 
> >
> > Updating to a later kernel fixed that - one could see that
> > baremetal was faster than (or on par with) the guest.
> 
> Thank you very much, Konrad! We are giving it a shot. :-D
> 
> Best Regards,
> 
> Meng

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

