
Re: [Xen-users] Xen VMs and Unixbench: single vs multiple cpu behaviour



On 26 November 2015 at 19:09, Dario Faggioli <dario.faggioli@xxxxxxxxxx> wrote:
> So, quick and dirty: if you change (by `echo >`-ing) the value of
> flags, you'll see performance boost. I'm quite sure that will be the
> case for UnixBench, and I'm trying to verify whether that is consistent
> with other benchmarks too.
>
> For instance, you can try 4131 or 4147. Remember to do that for all the
> vCPUs:
>
> for f in `seq 0 3`; do echo 4131 > /proc/sys/kernel/sched_domain/cpu$f/domain0/flags; done
>
> Basically, what you are doing is altering Linux's load balancing
> behavior, in a way that makes it interact better with Xen's scheduler.
>
> The various flags are defined here:
> http://lxr.free-electrons.com/source/include/linux/sched.h#L978
>
> I'll follow up with a more detailed explanation, and with more numbers,
> as soon as practical. If, in the meantime, you're up for playing with
> this a bit, feel free. :-D
>
> Thanks and Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>
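
For anyone repeating this, the quoted one-liner can be wrapped in a small
function that also prints the previous value, so it is easy to restore. This is
only a sketch: the names set_sched_flags and SCHED_DOMAIN_DIR are made up for
this example, and it assumes the sched_domain path from the quoted message
exists and is writable on your kernel.

```shell
#!/bin/sh
# Hypothetical helper, not part of any tool: writes one flags value to
# domain0 of every CPU found under SCHED_DOMAIN_DIR.
# Assumption: the directory layout is the one quoted above.
SCHED_DOMAIN_DIR="${SCHED_DOMAIN_DIR:-/proc/sys/kernel/sched_domain}"

set_sched_flags() {
    new_flags="$1"
    for f in "$SCHED_DOMAIN_DIR"/cpu*/domain0/flags; do
        # Skip the unexpanded glob when no sched_domain entries exist.
        [ -e "$f" ] || continue
        # Print the old value so the change can be undone later.
        printf '%s: %s -> %s\n' "$f" "$(cat "$f")" "$new_flags"
        echo "$new_flags" > "$f"
    done
}
```

Run as root inside the guest, for example `set_sched_flags 4131`, and
`set_sched_flags 4143` to go back to the value I started from.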

I finally found the time for more tests:

Original (flags 4143) benchmark result:

------------------------------------------------------------------------
4 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables      3355.0
Double-Precision Whetstone                 787.6
Execl Throughput                           298.8
File Copy 1024 bufsize 2000 maxblocks     3292.7
File Copy 256 bufsize 500 maxblocks       2078.2
File Copy 4096 bufsize 8000 maxblocks     5516.9
Pipe Throughput                           1855.9
Pipe-based Context Switching               999.9
Process Creation                           254.4
Shell Scripts (1 concurrent)               818.0
Shell Scripts (8 concurrent)              6493.1
System Call Overhead                      2870.2
                                        ========
System Benchmarks Index Score             1564.2

------------------------------------------------------------------------
4 CPUs in system; running 4 parallel copies of tests

Dhrystone 2 using register variables     12668.5
Double-Precision Whetstone                3418.7
Execl Throughput                          5348.0
File Copy 1024 bufsize 2000 maxblocks     3675.8
File Copy 256 bufsize 500 maxblocks       2328.9
File Copy 4096 bufsize 8000 maxblocks     7945.6
Pipe Throughput                           6977.6
Pipe-based Context Switching              3377.5
Process Creation                          3232.4
Shell Scripts (1 concurrent)              7304.0
Shell Scripts (8 concurrent)              8385.8
System Call Overhead                      7684.0
                                        ========
System Benchmarks Index Score             5362.0

*******************************************
Flags 4147 - improvement in both the 1-copy and the 4-copy runs:

4 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables      3376.4
Double-Precision Whetstone                 783.2
Execl Throughput                          1553.6
File Copy 1024 bufsize 2000 maxblocks     3298.1
File Copy 256 bufsize 500 maxblocks       2090.5
File Copy 4096 bufsize 8000 maxblocks     5568.7
Pipe Throughput                           1838.6
Pipe-based Context Switching              1008.4
Process Creation                          1605.2
Shell Scripts (1 concurrent)              2829.1
Shell Scripts (8 concurrent)              5537.9  (exception - lower score than original)
System Call Overhead                      2861.6
                                        ========
System Benchmarks Index Score             2292.3  (much better, 46% increase)

------------------------------------------------------------------------
4 CPUs in system; running 4 parallel copies of tests

Dhrystone 2 using register variables     12563.8
Double-Precision Whetstone                3409.7
Execl Throughput                          5318.6
File Copy 1024 bufsize 2000 maxblocks     3736.2
File Copy 256 bufsize 500 maxblocks       2264.9
File Copy 4096 bufsize 8000 maxblocks     7823.5
Pipe Throughput                           6926.8
Pipe-based Context Switching              3343.4
Process Creation                          4982.3
Shell Scripts (1 concurrent)              9962.4
Shell Scripts (8 concurrent)              9171.9
System Call Overhead                      7621.9
                                        ========
System Benchmarks Index Score             5714.3  (6% increase)

*******************************************
Flags 4131 - a little better than 4147

4 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables      3367.9
Double-Precision Whetstone                 783.6
Execl Throughput                          1658.5
File Copy 1024 bufsize 2000 maxblocks     3290.9
File Copy 256 bufsize 500 maxblocks       2091.7
File Copy 4096 bufsize 8000 maxblocks     5621.2
Pipe Throughput                           1856.8
Pipe-based Context Switching              1002.3
Process Creation                          1520.3
Shell Scripts (1 concurrent)              2887.5
Shell Scripts (8 concurrent)              5776.9  (again exception, lower than original test)
System Call Overhead                      2869.3
                                        ========
System Benchmarks Index Score             2308.7  (47% increase)

------------------------------------------------------------------------
4 CPUs in system; running 4 parallel copies of tests

Dhrystone 2 using register variables     12598.0
Double-Precision Whetstone                3416.2
Execl Throughput                          5615.1
File Copy 1024 bufsize 2000 maxblocks     3677.8
File Copy 256 bufsize 500 maxblocks       2309.1
File Copy 4096 bufsize 8000 maxblocks     7899.1
Pipe Throughput                           6915.8
Pipe-based Context Switching              3301.2
Process Creation                          4982.6
Shell Scripts (1 concurrent)             10097.7
Shell Scripts (8 concurrent)              9325.7
System Call Overhead                      7664.9
                                        ========
System Benchmarks Index Score             5759.0  (7% increase)
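
To see what the three flag values actually switch on or off, they can be
decoded bit by bit. The sketch below is not authoritative: decode_flags is a
made-up name, and the bit table is an assumption taken from the 4.x-era
include/linux/sched.h that the quoted message links to, listing only the bits
that matter for these three values. Check the header of your own kernel before
relying on it.

```shell
#!/bin/sh
# Hypothetical helper: print the SD_* names of the bits set in a flags value.
# Assumption: bit values as in a 4.x-era include/linux/sched.h (partial list).
decode_flags() {
    value="$1"
    out=""
    for entry in \
        1:SD_LOAD_BALANCE 2:SD_BALANCE_NEWIDLE 4:SD_BALANCE_EXEC \
        8:SD_BALANCE_FORK 16:SD_BALANCE_WAKE 32:SD_WAKE_AFFINE \
        4096:SD_PREFER_SIBLING
    do
        bit="${entry%%:*}"    # number before the colon
        name="${entry#*:}"    # flag name after the colon
        if [ $(( value & bit )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    echo "$out"
}

# Decode the three values used in the tests above.
for v in 4143 4147 4131; do
    printf '%s: %s\n' "$v" "$(decode_flags "$v")"
done
```

If that bit table is right, both 4147 and 4131 clear SD_BALANCE_EXEC and
SD_BALANCE_FORK compared to 4143, and 4147 additionally sets SD_BALANCE_WAKE.
That would at least be consistent with Execl Throughput and Process Creation
being the tests that changed the most, but I have not verified that this is
the cause.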


*******************************************

And now for a surprise - let us look at the test on the physical machine:

------------------------------------------------------------------------

4 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables      3378.3
Double-Precision Whetstone                 779.0
Execl Throughput                          1203.0
File Copy 1024 bufsize 2000 maxblocks     3367.0
File Copy 256 bufsize 500 maxblocks       2124.4
File Copy 4096 bufsize 8000 maxblocks     5670.1
Pipe Throughput                           1938.5
Pipe-based Context Switching               576.7
Process Creation                          1471.6
Shell Scripts (1 concurrent)              3282.3
Shell Scripts (8 concurrent)              3082.4
                                        ========
System Benchmarks Index Score             2213.6

------------------------------------------------------------------------
4 CPUs in system; running 4 parallel copies of tests

Dhrystone 2 using register variables     12702.2
Double-Precision Whetstone                3393.0
Execl Throughput                          5887.6
File Copy 1024 bufsize 2000 maxblocks     3798.9
File Copy 256 bufsize 500 maxblocks       2338.5
File Copy 4096 bufsize 8000 maxblocks     7897.8
Pipe Throughput                           7274.9
Pipe-based Context Switching              4070.8
Process Creation                          3808.9
Shell Scripts (1 concurrent)              7399.3
Shell Scripts (8 concurrent)              8307.8
System Call Overhead                      6889.2
                                        ========
System Benchmarks Index Score             5548.0


With the modified flags, the UnixBench index score in the Xen VM is higher
than on the physical machine.

With flags set to 4147:
-better by 3.5% for 1 copy of tests
-better by 3% for 4 parallel copies of tests

With flags set to 4131:
-better by 4.3% for 1 copy of tests
-better by 3.8% for 4 parallel copies of tests
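
The percentages above are just the ratio of the two index scores. A minimal
sketch of the arithmetic, using awk for the floating point (pct_gain is a
made-up helper name, and the inputs are the index scores from the tables
above):

```shell
#!/bin/sh
# Hypothetical helper: percentage by which NEW exceeds BASE,
# computed as (NEW / BASE - 1) * 100 and printed with one decimal.
pct_gain() {
    awk -v new="$1" -v base="$2" \
        'BEGIN { printf "%.1f\n", (new / base - 1) * 100 }'
}

# Xen VM index score vs. physical machine index score.
pct_gain 2292.3 2213.6   # flags 4147, 1 copy
pct_gain 5714.3 5548.0   # flags 4147, 4 copies
pct_gain 2308.7 2213.6   # flags 4131, 1 copy
pct_gain 5759.0 5548.0   # flags 4131, 4 copies
```

Depending on rounding, the printed values can differ from my figures by a
tenth of a percent.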

I can understand the similar results in Dhrystone and Whetstone, because of
direct execution.
But I lack the knowledge to understand why the Xen VM does better in the tests
that rely on system calls.

Is it because of VT-x? Can Xen execute system calls "faster" using hardware
support for virtualization than a physical system using "normal" x86
and x86_64 calls?

Can someone provide an explanation?

Regards,

Marko



 

