
Re: [Xen-devel] [Question] PARSEC benchmark has smaller execution time in VM than in native?



On Tue, Mar 01, 2016 at 08:48:30AM -0500, Meng Xu wrote:
> On Mon, Feb 29, 2016 at 12:59 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@xxxxxxxxxx> wrote:
> >> > Hey!
> >> >
> >> > CC-ing Elena.
> >>
> >> I think you forgot to cc her..
> >> Anyway, let's cc her now... :-)
> >>
> >> >
> >> >> We are measuring the execution time between native machine environment
> >> >> and xen virtualization environment using PARSEC Benchmark [1].
> >> >>
> >> >> In the virtualization environment, we run a domU with three VCPUs, each
> >> >> of them pinned to a core; we pin dom0 to another core that is not
> >> >> used by the domU.
> >> >>
> >> >> Inside the Linux in domU in the virtualization environment, and in the
> >> >> native environment, we used cpuset to isolate a core (or VCPU) for the
> >> >> system processes and a separate core for the benchmark processes.
> >> >> We also configured the Linux boot command line with the isolcpus= option
> >> >> to isolate the benchmark core from other unnecessary processes.
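
(Just to double-check I understand the setup: something along the lines of the
sketch below? The domain name, core numbers and cpuset path are my assumptions,
not necessarily what you used.)

    # In dom0, pin the guest VCPUs and the dom0 VCPU to distinct cores:
    xl vcpu-pin domU 0 1          # domU VCPU 0 -> physical core 1
    xl vcpu-pin domU 1 2          # domU VCPU 1 -> physical core 2
    xl vcpu-pin domU 2 3          # domU VCPU 2 -> physical core 3
    xl vcpu-pin Domain-0 0 0      # dom0 VCPU 0 -> physical core 0

    # Inside the guest (and likewise on the native machine): reserve core 2
    # for the benchmark with isolcpus=2 on the kernel command line, and/or
    # with a cpuset (cgroup v1 layout assumed here):
    mkdir /sys/fs/cgroup/cpuset/benchmark
    echo 2  > /sys/fs/cgroup/cpuset/benchmark/cpuset.cpus
    echo 0  > /sys/fs/cgroup/cpuset/benchmark/cpuset.mems
    echo $$ > /sys/fs/cgroup/cpuset/benchmark/tasks    # then exec the benchmark
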
> >> >
> >> > You may want to just offline them and also boot the machine with NUMA
> >> > disabled.
> >>
> >> Right, the machine is booted up with NUMA disabled.
> >> We will offline the unnecessary cores then.
> >>
> >> >
> >> >>
> >> >> We expected the execution time of the benchmarks in the Xen virtualization
> >> >> environment to be larger than the execution time in the native machine
> >> >> environment. However, the evaluation gave us the opposite result.
> >> >>
> >> >> Below is the evaluation data for the canneal and streamcluster 
> >> >> benchmarks:
> >> >>
> >> >> Benchmark: canneal, input=simlarge, conf=gcc-serial
> >> >> Native: 6.387s
> >> >> Virtualization: 5.890s
> >> >>
> >> >> Benchmark: streamcluster, input=simlarge, conf=gcc-serial
> >> >> Native: 5.276s
> >> >> Virtualization: 5.240s
> >> >>
> >> >> Is there anything wrong with our evaluation that led to these abnormal
> >> >> performance results?
> >> >
> >> > Nothing is wrong. Virtualization is naturally faster than baremetal!
> >> >
> >> > :-)
> >> >
> >> > No clue sadly.
> >>
> >> Ah-ha. This is really surprising to me.... Why would adding one more
> >> layer speed up the system? Unless the virtualization layer disables
> >> some services that run natively and interfere with the benchmark.
> >>
> >> If virtualization is naturally faster than baremetal, why do some
> >> experiments show that virtualization introduces overhead?
> >
> > Elena told me that there was a weird regression in Linux 4.1 - where
> > CPU-burning workloads were _slower_ on baremetal than as guests.
> 
> Hi Elena,
> Would you mind sharing with us some of your experience of how you
> found the real reason? Did you use a particular tool or methodology to
> pin down the cause (i.e., why CPU-burning workloads are _slower_ on
> baremetal than as guests)?
>

Hi Meng

Yes, sure!

While working on performance tests for the smt-exposing patches from Joao,
I ran a CPU-bound workload in an HVM guest and, using the same kernel, ran
the same test on baremetal.
While testing the CPU-bound workload on baremetal Linux (4.1.0-rc2),
I found that the time to complete the test is a few times longer than it
takes under an HVM guest.
I tried tests with the kernel threads pinned to cores and without pinning.
The execution times are most often about twice as long, and sometimes 4
times as long, as in the HVM case.

What is interesting is not only that it sometimes takes 3-4 times longer
than in the HVM guest, but also that the test with threads bound to cores
takes almost 3 times longer to execute than the same CPU-bound test under
HVM (in all configurations).
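
(For anyone who wants to try something similar without the kernel-thread
test module, a rough userspace analogue is a set of CPU-burning workers
pinned with taskset; the core numbers and iteration count below are
arbitrary assumptions, not the actual test:)

    #!/bin/bash
    # Spawn 8 CPU-bound workers. In the "bound" variant each worker is
    # pinned to a core via taskset; dropping the taskset prefix gives the
    # "unbound" variant, where the scheduler places them freely.
    burn() { local i=0; while [ $i -lt 20000000 ]; do i=$((i+1)); done; }
    export -f burn
    time (
        for n in $(seq 0 7); do
            taskset -c $((n % 4)) bash -c burn &    # 8 workers over 4 cores
        done
        wait
    )
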

I ran each test 5 times; here are the execution times (seconds):

-------------------------------------------------
        baremetal           |
thread_bind | thread_unbind | HVM pinned to cores
----------- |---------------|---------------------
     74     |     83        |        28
     74     |     88        |        28
     74     |     38        |        28
     74     |     73        |        28
     74     |     87        |        28

Sometimes the unbound tests showed better times, but not often enough
to present here. Some results are much worse and reach up to 120
seconds.

Each test has 8 kernel threads. In the baremetal case I tried the following
(a sketch of the corresponding boot parameters follows the list):
- NUMA off and on;
- all CPUs online;
- isolating the CPUs of the first node;
- setting intel_idle.max_cstate=1;
- disabling intel_pstate;
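
(For reference, the last few items correspond roughly to kernel boot
parameters like the ones below; the exact values, and the CPU list for the
first node, are assumptions:)

    # Appended to the kernel command line (e.g. in GRUB_CMDLINE_LINUX):
    numa=off intel_idle.max_cstate=1 intel_pstate=disable isolcpus=0-7

    # Verify after boot:
    cat /proc/cmdline
    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver   # acpi-cpufreq,
                                                               # not intel_pstate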

I don't think I have exhausted all the options here, but it looked like the
last two changes did improve performance, though still not to a level
comparable to the HVM case.
I am trying to find where the regression happened. Performance on a newer
kernel (I tried 4.5.0-rc4+) was close to or better than HVM.

I am trying to find out whether there were some relevant regressions, to
understand the reason for this.
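
(If it does come down to a single fix somewhere between v4.1 and v4.5, one
way to pin it down is a bisection over the kernel tree; the tags and the
inverted terms below are just a sketch of that approach:)

    cd linux
    # We are hunting for the commit that *fixed* the slowdown, so invert
    # the usual good/bad terms:
    git bisect start --term-old=slow --term-new=fast
    git bisect slow v4.1
    git bisect fast v4.5-rc4
    # At each step: build and boot the kernel, run the CPU-bound test, then:
    git bisect slow    # if this kernel shows the long execution times
    git bisect fast    # if it performs roughly like the HVM case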


What kernel do you guys use?

Elena

See more description of the tests here:
http://lists.xenproject.org/archives/html/xen-devel/2016-01/msg02874.html
Joao's patches are here:
http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg03115.html




> 
> 
> >
> > Updating to a later kernel fixed that - one could see that
> > baremetal was faster than (or on par with) the guest.
> 
> Thank you very much, Konrad! We are giving it a shot. :-D
> 
> Best Regards,
> 
> Meng

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

