Re: [Xen-devel] [Question] PARSEC benchmark has smaller execution time in VM than in native?
Hi Elena,

Thank you very much for sharing this! :-)

On Tue, Mar 1, 2016 at 1:20 PM, Elena Ufimtseva
<elena.ufimtseva@xxxxxxxxxx> wrote:
>
> On Tue, Mar 01, 2016 at 08:48:30AM -0500, Meng Xu wrote:
> > On Mon, Feb 29, 2016 at 12:59 PM, Konrad Rzeszutek Wilk
> > <konrad.wilk@xxxxxxxxxx> wrote:
> > >> > Hey!
> > >> >
> > >> > CC-ing Elena.
> > >>
> > >> I think you forgot to cc her..
> > >> Anyway, let's cc her now... :-)
> > >>
> > >> >
> > >> >> We are measuring the execution time between the native machine
> > >> >> environment and the Xen virtualization environment using the
> > >> >> PARSEC benchmark [1].
> > >> >>
> > >> >> In the virtualization environment, we run a domU with three
> > >> >> VCPUs, each of them pinned to a core; we pin dom0 to another
> > >> >> core that is not used by the domU.
> > >> >>
> > >> >> Inside the Linux in domU in the virtualization environment and
> > >> >> in the native environment, we used cpuset to isolate a core (or
> > >> >> VCPU) for the system processes and to isolate a core for the
> > >> >> benchmark processes. We also configured the Linux boot command
> > >> >> line with the isolcpus= option to isolate the core for the
> > >> >> benchmark from other unnecessary processes.
> > >> >
> > >> > You may want to just offline them and also boot the machine with
> > >> > NUMA disabled.
> > >>
> > >> Right, the machine is booted up with NUMA disabled.
> > >> We will offline the unnecessary cores then.
> > >>
> > >> >
> > >> >>
> > >> >> We expected that the execution time of the benchmarks in the Xen
> > >> >> virtualization environment would be larger than the execution
> > >> >> time in the native machine environment. However, the evaluation
> > >> >> gave us the opposite result.
> > >> >>
> > >> >> Below is the evaluation data for the canneal and streamcluster
> > >> >> benchmarks:
> > >> >>
> > >> >> Benchmark: canneal, input=simlarge, conf=gcc-serial
> > >> >> Native: 6.387s
> > >> >> Virtualization: 5.890s
> > >> >>
> > >> >> Benchmark: streamcluster, input=simlarge, conf=gcc-serial
> > >> >> Native: 5.276s
> > >> >> Virtualization: 5.240s
> > >> >>
> > >> >> Is there anything wrong with our evaluation that leads to these
> > >> >> abnormal performance results?
> > >> >
> > >> > Nothing is wrong. Virtualization is naturally faster than baremetal!
> > >> >
> > >> > :-)
> > >> >
> > >> > No clue, sadly.
> > >>
> > >> Ah-ha. This is really surprising to me.... Why would adding one more
> > >> layer speed the system up? Unless virtualization disables some
> > >> services that run in native and interfere with the benchmark.
> > >>
> > >> If virtualization is faster than baremetal by nature, why do we see
> > >> that some experiments show virtualization introducing overhead?
> > >
> > > Elena told me that there were some weird regressions in Linux 4.1 -
> > > where CPU-burning workloads were _slower_ on baremetal than as guests.
> >
> > Hi Elena,
> > Would you mind sharing with us some of your experience of how you
> > found the real reason? Did you use some tool or some methodology to
> > pin down the reason (i.e., that CPU-burning workloads are _slower_
> > on baremetal than as guests)?
> >
>
> Hi Meng
>
> Yes, sure!
>
> While working on performance tests for the smt-exposing patches from
> Joao, I ran a CPU-bound workload in an HVM guest and ran the same test
> on baremetal using the same kernel.
> While testing the cpu-bound workload on baremetal Linux (4.1.0-rc2),
> I found that the time to complete the same test is a few times longer
> than it takes under an HVM guest.
> I have tried tests with the kernel threads pinned to cores and without
> pinning. The execution times are usually about twice as long, sometimes
> 4 times longer than in the HVM case.
>
> What is interesting is not only that it sometimes takes 3-4 times longer
> than in the HVM guest, but also that the test with threads bound to
> cores takes almost 3 times longer to execute than the same cpu-bound
> test under HVM (in all configurations).

wow~ I didn't expect the native performance could be so "bad".... ;-)

>
> I ran each test 5 times and here are the execution times (seconds):
>
> -----------------------------------------------------
>            baremetal          |
>  thread_bind | thread_unbind  | HVM pinned to cores
> -------------|----------------|----------------------
>      74      |       83       |         28
>      74      |       88       |         28
>      74      |       38       |         28
>      74      |       73       |         28
>      74      |       87       |         28
>
> Sometimes better times showed up in the unbound tests, but not often
> enough to present them here. Some results are much worse and reach up
> to 120 seconds.
>
> Each test has 8 kernel threads. In the baremetal case I tried the
> following:
>  - numa off, on;
>  - all cpus are on;
>  - isolate cpus from the first node;
>  - set intel_idle.max_cstate=1;
>  - disable intel_pstate;
>
> I don't think I have exhausted all the options here, but it looked like
> the last two changes did improve performance, though still not
> comparably to the HVM case.
> I am trying to find where the regression happened. Performance on a
> newer kernel (I tried 4.5.0-rc4+) was close to or better than HVM.
>
> I am trying to find if there were some relevant regressions to
> understand the reason for this.

I see. If this is only happening with SMT, it may be caused by the
SMT-related load balancing in the Linux scheduler. However, I have
disabled HT on my machine. Probably, that's also the reason why I didn't
see so much difference in performance.
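To make sure I measure the same kind of thing when I try to reproduce
this on my machine, below is roughly the cpu-bound test I have in mind.
This is only a minimal userspace sketch, untested: Elena's test uses
kernel threads while this one uses pthreads, and the thread count,
iteration count, and core-selection policy are placeholders I made up.

/* cpuburn.c - build: gcc -O2 -pthread cpuburn.c -o cpuburn
 * (add -lrt for clock_gettime on older glibc)                        */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define NTHREADS 8           /* matches the 8 threads in Elena's test */
#define ITERS (1UL << 30)    /* placeholder amount of work per thread */

static int bind_threads;     /* 1 = pin thread i to core i            */

static void *burn(void *arg)
{
    long id = (long)arg;

    if (bind_threads) {
        cpu_set_t set;
        CPU_ZERO(&set);
        /* naive core choice, just to get a bound/unbound comparison */
        CPU_SET(id % sysconf(_SC_NPROCESSORS_ONLN), &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    /* pure integer spin: cpu-bound, no memory or I/O pressure */
    volatile unsigned long x = 0;
    unsigned long i;
    for (i = 0; i < ITERS; i++)
        x += i;
    (void)x;
    return NULL;
}

int main(int argc, char **argv)
{
    pthread_t th[NTHREADS];
    struct timespec t0, t1;
    long i;

    bind_threads = (argc > 1 && atoi(argv[1]) == 1);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < NTHREADS; i++)
        pthread_create(&th[i], NULL, burn, (void *)i);
    for (i = 0; i < NTHREADS; i++)
        pthread_join(th[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("%s threads: %.2f s\n", bind_threads ? "bound" : "unbound",
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
    return 0;
}

Running "./cpuburn 1" vs "./cpuburn 0" on baremetal and in the guest
should show whether my box reproduces the bound/unbound gap in your
table.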
>
> What kernel do you guys use?

I'm using a quite old kernel, 3.10.31. The reason why I'm using this
kernel is that I want to use LITMUS^RT [1], which is a Linux testbed for
real-time scheduling research. (It has a new version though, and I can
upgrade to the latest version to see if the "problem" still occurs.)
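Also, since you saw intel_idle.max_cstate=1 and disabling intel_pstate
make a difference, I will record which cpuidle/cpufreq drivers each of
my setups actually uses before re-running the numbers. Something like
the small sketch below should do; these are the standard sysfs paths on
recent kernels, but some of the files may simply not exist on my old
3.10.31 box.

/* drvcheck.c - print the active cpuidle/cpufreq driver and governor */
#include <stdio.h>

static void show(const char *label, const char *path)
{
    char buf[128];
    FILE *f = fopen(path, "r");

    /* fall back to a marker if the sysfs file is missing or empty */
    if (!f || !fgets(buf, sizeof(buf), f))
        snprintf(buf, sizeof(buf), "<not available>\n");
    printf("%-16s %s", label, buf);
    if (f)
        fclose(f);
}

int main(void)
{
    show("cpuidle:", "/sys/devices/system/cpu/cpuidle/current_driver");
    show("cpufreq driver:",
         "/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver");
    show("governor:",
         "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor");
    return 0;
}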
Thanks and Best Regards,

Meng

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel