Re: [Xen-devel] [Question] PARSEC benchmark has smaller execution time in VM than in native?
On Tue, Mar 1, 2016 at 4:51 PM, Sander Eikelenboom <linux@xxxxxxxxxxxxxx> wrote:
>
> Tuesday, March 1, 2016, 9:39:25 PM, you wrote:
>
>> On Tue, Mar 01, 2016 at 02:52:14PM -0500, Meng Xu wrote:
>>> Hi Elena,
>>>
>>> Thank you very much for sharing this! :-)
>>>
>>> On Tue, Mar 1, 2016 at 1:20 PM, Elena Ufimtseva
>>> <elena.ufimtseva@xxxxxxxxxx> wrote:
>>> >
>>> > On Tue, Mar 01, 2016 at 08:48:30AM -0500, Meng Xu wrote:
>>> > > On Mon, Feb 29, 2016 at 12:59 PM, Konrad Rzeszutek Wilk
>>> > > <konrad.wilk@xxxxxxxxxx> wrote:
>>> > > >> > Hey!
>>> > > >> >
>>> > > >> > CC-ing Elena.
>>> > > >>
>>> > > >> I think you forgot to cc her..
>>> > > >> Anyway, let's cc her now... :-)
>>> > > >>
>>> > > >> >
>>> > > >> >> We are measuring the execution time of the PARSEC benchmark [1]
>>> > > >> >> in the native machine environment and in the Xen virtualization
>>> > > >> >> environment.
>>> > > >> >>
>>> > > >> >> In the virtualization environment, we run a domU with three
>>> > > >> >> VCPUs, each of them pinned to a core; we pin dom0 to another
>>> > > >> >> core that is not used by the domU.
>>> > > >> >>
>>> > > >> >> Inside the Linux in the domU in the virtualization environment,
>>> > > >> >> and in the native environment, we used cpusets to isolate a core
>>> > > >> >> (or VCPU) for the system processes and to isolate a core for the
>>> > > >> >> benchmark processes. We also configured the Linux boot command
>>> > > >> >> line with the isolcpus= option to isolate the benchmark core
>>> > > >> >> from other unnecessary processes.
>>> > > >> >
>>> > > >> > You may want to just offline them and also boot the machine with
>>> > > >> > NUMA disabled.
>>> > > >>
>>> > > >> Right, the machine is booted up with NUMA disabled.
>>> > > >> We will offline the unnecessary cores then.
>>> > > >>
>>> > > >> >
>>> > > >> >>
>>> > > >> >> We expected the execution time of the benchmarks in the Xen
>>> > > >> >> virtualization environment to be larger than the execution time
>>> > > >> >> in the native machine environment. However, the evaluation gave
>>> > > >> >> us the opposite result.
>>> > > >> >>
>>> > > >> >> Below is the evaluation data for the canneal and streamcluster
>>> > > >> >> benchmarks:
>>> > > >> >>
>>> > > >> >> Benchmark: canneal, input=simlarge, conf=gcc-serial
>>> > > >> >> Native: 6.387s
>>> > > >> >> Virtualization: 5.890s
>>> > > >> >>
>>> > > >> >> Benchmark: streamcluster, input=simlarge, conf=gcc-serial
>>> > > >> >> Native: 5.276s
>>> > > >> >> Virtualization: 5.240s
>>> > > >> >>
>>> > > >> >> Is there anything wrong with our evaluation that led to these
>>> > > >> >> abnormal performance results?
>>> > > >> >
>>> > > >> > Nothing is wrong. Virtualization is naturally faster than
>>> > > >> > baremetal!
>>> > > >> >
>>> > > >> > :-)
>>> > > >> >
>>> > > >> > No clue sadly.
>>> > > >>
>>> > > >> Ah-ha. This is really surprising to me.... Why would adding one
>>> > > >> more layer speed the system up? Unless the virtualization disables
>>> > > >> some services that run natively and interfere with the benchmark.
>>> > > >>
>>> > > >> If virtualization is faster than baremetal by nature, why do some
>>> > > >> experiments show that virtualization introduces overhead?
>>> > > >
>>> > > > Elena told me that there was some weird regression in Linux 4.1 -
>>> > > > where CPU-burning workloads were _slower_ on baremetal than as
>>> > > > guests.
>>> > >
>>> > > Hi Elena,
>>> > > Would you mind sharing with us some of your experience of how you
>>> > > found the real reason? Did you use some tool or methodology to pin
>>> > > down the reason (i.e., that CPU-burning workloads are _slower_ on
>>> > > baremetal than as guests)?
>>> > >
>>> >
>>> > Hi Meng
>>> >
>>> > Yes, sure!
>>> >
>>> > While working on performance tests for the SMT-exposing patches from
>>> > Joao, I ran a CPU-bound workload in an HVM guest and ran the same test
>>> > on baremetal using the same kernel.
>>> > While testing the CPU-bound workload on baremetal Linux (4.1.0-rc2),
>>> > I found that the time to complete the same test is a few times longer
>>> > than it takes under the HVM guest.
>>> > I tried tests with the kernel threads pinned to cores and without
>>> > pinning.
>>> > The execution times are usually about twice as long, and sometimes 4
>>> > times longer than the HVM case.
>>> >
>>> > What is interesting is not only that it sometimes takes 3-4 times
>>> > longer than the HVM guest, but also that the test with threads bound
>>> > to cores takes almost 3 times longer to execute than the same
>>> > CPU-bound test under HVM (in all configurations).
>>>
>>> wow~ I didn't expect the native performance could be so "bad".... ;-)
>
>> Yes, quite a surprise :)
>>>
>>> >
>>> >
>>> > I ran each test 5 times and here are the execution times (seconds):
>>> >
>>> >  ----------------------------------------------------
>>> >           baremetal           |
>>> >  thread_bind | thread_unbind  | HVM pinned to cores
>>> >  ------------|----------------|---------------------
>>> >       74     |       83       |         28
>>> >       74     |       88       |         28
>>> >       74     |       38       |         28
>>> >       74     |       73       |         28
>>> >       74     |       87       |         28
>>> >
>>> > Sometimes the better times came from the unbound tests, but not often
>>> > enough to present here. Some results are much worse and reach up to
>>> > 120 seconds.
>>> >
>>> > Each test has 8 kernel threads. In the baremetal case I tried the
>>> > following:
>>> > - NUMA off, on;
>>> > - all CPUs online;
>>> > - isolating the CPUs of the first node;
>>> > - setting intel_idle.max_cstate=1;
>>> > - disabling intel_pstate;
>>> >
>>> > I don't think I have exhausted all the options here, but it looked
>>> > like the last two changes did improve performance, though it was still
>>> > not comparable to the HVM case.
>>> > I am trying to find where the regression happened. Performance on a
>>> > newer kernel (I tried 4.5.0-rc4+) was close to or better than HVM.
>
> Just a perhaps silly thought .. but could there be something in the
> time-measuring that could differ and explain the slightly surprising
> results ?

Thanks Sander!

Actually, I also thought about this reason, as Elena did. If it were the
time measurement, the difference in execution time should not vary across
different types of workloads/programs. That's why I think time measurement
is not the reason here (at least not the main reason). :-)

Best,

Meng

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
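Following up on the time-measurement question at the end of the thread, one
way to separate clock effects from a genuine slowdown is to time the same
fixed CPU-bound loop with both a wall clock and the process CPU clock, once
on baremetal and once in the HVM guest. The sketch below is a minimal
illustration, not code from the thread; the busy loop and iteration count
are arbitrary placeholders. If wall time exceeds CPU time only on baremetal,
the extra time comes from preemption or interference; if both grow together,
it points at CPU frequency or idle-state behaviour (compare the intel_pstate
and intel_idle.max_cstate=1 experiments above) rather than at how time is
measured.

/* Minimal sketch (not from the thread): compare wall-clock time and
 * process CPU time for the same fixed amount of CPU-bound work. */
#include <stdio.h>
#include <time.h>

static double seconds(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int main(void)
{
    struct timespec w0, w1, c0, c1;
    volatile unsigned long acc = 0;          /* volatile: keep the loop alive */

    clock_gettime(CLOCK_MONOTONIC, &w0);            /* wall clock */
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c0);   /* CPU time of this process */

    for (unsigned long i = 0; i < 2000000000UL; i++)  /* placeholder workload */
        acc += i;

    clock_gettime(CLOCK_MONOTONIC, &w1);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c1);

    printf("wall: %.3f s   cpu: %.3f s   (acc=%lu)\n",
           seconds(w0, w1), seconds(c0, c1), acc);
    return 0;
}

Running the same binary in both environments also removes compiler and
library differences from the comparison.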
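For the core-isolation setup described earlier in the thread (cpusets plus
the isolcpus= boot option), the benchmark process still has to be placed on
the isolated core. A minimal sketch of one way to do that from a small
launcher is below; the core number 3 and the ./run_benchmark path are
placeholder assumptions, not details taken from the thread, and the same
effect can be had with taskset or by writing the PID into the benchmark
cpuset.

/* Minimal sketch (assumptions: core 3 is the core kept free by the cpuset
 * and isolcpus= configuration; ./run_benchmark is a placeholder path). */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(3, &set);                      /* bind this process to core 3 only */

    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return EXIT_FAILURE;
    }

    /* The benchmark inherits the affinity mask across exec. */
    execlp("./run_benchmark", "run_benchmark", (char *)NULL);
    perror("execlp");                      /* only reached if exec fails */
    return EXIT_FAILURE;
}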