Re: [Xen-users] Xen VMs and Unixbench: single vs multiple cpu behaviour
On 26 November 2015 at 19:09, Dario Faggioli <dario.faggioli@xxxxxxxxxx> wrote:
> So, quick and dirty: if you change (by `echo >`-ing) the value of
> flags, you'll see a performance boost. I'm quite sure that will be the
> case for UnixBench, and I'm trying to verify whether that is consistent
> with other benchmarks too.
>
> For instance, you can try 4131 or 4147. Remember to do that for all the
> vCPUs:
>
> for f in `seq 0 3`;do echo 4131 > /proc/sys/kernel/sched_domain/cpu$f/domain0/flags ; done;
>
> Basically, what you are doing is altering Linux's load balancing
> behavior, in a way that it interacts better with Xen's scheduler.
>
> The various flags are defined here:
> http://lxr.free-electrons.com/source/include/linux/sched.h#L978
>
> I'll follow up with a more detailed explanation, and with more numbers,
> as soon as practical. If, in the meantime, you're up for playing with
> this a bit, feel free. :-D
>
> Thanks and Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>

I finally found the time for more tests:

Original (flags 4143) benchmark result:

------------------------------------------------------------------------
4 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables      3355.0
Double-Precision Whetstone                 787.6
Execl Throughput                           298.8
File Copy 1024 bufsize 2000 maxblocks     3292.7
File Copy 256 bufsize 500 maxblocks       2078.2
File Copy 4096 bufsize 8000 maxblocks     5516.9
Pipe Throughput                           1855.9
Pipe-based Context Switching               999.9
Process Creation                           254.4
Shell Scripts (1 concurrent)               818.0
Shell Scripts (8 concurrent)              6493.1
System Call Overhead                      2870.2
                                        ========
System Benchmarks Index Score             1564.2

------------------------------------------------------------------------
4 CPUs in system; running 4 parallel copies of tests

Dhrystone 2 using register variables     12668.5
Double-Precision Whetstone                3418.7
Execl Throughput                          5348.0
File Copy 1024 bufsize 2000 maxblocks     3675.8
File Copy 256 bufsize 500 maxblocks       2328.9
File Copy 4096 bufsize 8000 maxblocks     7945.6
Pipe Throughput                           6977.6
Pipe-based Context Switching              3377.5
Process Creation                          3232.4
Shell Scripts (1 concurrent)              7304.0
Shell Scripts (8 concurrent)              8385.8
System Call Overhead                      7684.0
                                        ========
System Benchmarks Index Score             5362.0

*******************************************

Flags 4147 - improvement at both single and multiple concurrent executions:

4 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables      3376.4
Double-Precision Whetstone                 783.2
Execl Throughput                          1553.6
File Copy 1024 bufsize 2000 maxblocks     3298.1
File Copy 256 bufsize 500 maxblocks       2090.5
File Copy 4096 bufsize 8000 maxblocks     5568.7
Pipe Throughput                           1838.6
Pipe-based Context Switching              1008.4
Process Creation                          1605.2
Shell Scripts (1 concurrent)              2829.1
Shell Scripts (8 concurrent)              5537.9  (exception - lower score than original)
System Call Overhead                      2861.6
                                        ========
System Benchmarks Index Score             2292.3  (much better, 46% increase)

------------------------------------------------------------------------
4 CPUs in system; running 4 parallel copies of tests

Dhrystone 2 using register variables     12563.8
Double-Precision Whetstone                3409.7
Execl Throughput                          5318.6
File Copy 1024 bufsize 2000 maxblocks     3736.2
File Copy 256 bufsize 500 maxblocks       2264.9
File Copy 4096 bufsize 8000 maxblocks     7823.5
Pipe Throughput                           6926.8
Pipe-based Context Switching              3343.4
Process Creation                          4982.3
Shell Scripts (1 concurrent)              9962.4
Shell Scripts (8 concurrent)              9171.9
System Call Overhead                      7621.9
                                        ========
System Benchmarks Index Score             5714.3  (6% increase)

*******************************************

Flags 4131 - a little better than 4147

4 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables      3367.9
Double-Precision Whetstone                 783.6
Execl Throughput                          1658.5
File Copy 1024 bufsize 2000 maxblocks     3290.9
File Copy 256 bufsize 500 maxblocks       2091.7
File Copy 4096 bufsize 8000 maxblocks     5621.2
Pipe Throughput                           1856.8
Pipe-based Context Switching              1002.3
Process Creation                          1520.3
Shell Scripts (1 concurrent)              2887.5
Shell Scripts (8 concurrent)              5776.9  (again exception, lower than original test)
System Call Overhead                      2869.3
                                        ========
System Benchmarks Index Score             2308.7  (47% increase)

------------------------------------------------------------------------
4 CPUs in system; running 4 parallel copies of tests

Dhrystone 2 using register variables     12598.0
Double-Precision Whetstone                3416.2
Execl Throughput                          5615.1
File Copy 1024 bufsize 2000 maxblocks     3677.8
File Copy 256 bufsize 500 maxblocks       2309.1
File Copy 4096 bufsize 8000 maxblocks     7899.1
Pipe Throughput                           6915.8
Pipe-based Context Switching              3301.2
Process Creation                          4982.6
Shell Scripts (1 concurrent)             10097.7
Shell Scripts (8 concurrent)              9325.7
System Call Overhead                      7664.9
                                        ========
System Benchmarks Index Score             5759.0  (7% increase)

*******************************************

And now for a surprise - let us look at the test on the physical machine:

------------------------------------------------------------------------
4 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables      3378.3
Double-Precision Whetstone                 779.0
Execl Throughput                          1203.0
File Copy 1024 bufsize 2000 maxblocks     3367.0
File Copy 256 bufsize 500 maxblocks       2124.4
File Copy 4096 bufsize 8000 maxblocks     5670.1
Pipe Throughput                           1938.5
Pipe-based Context Switching               576.7
Process Creation                          1471.6
Shell Scripts (1 concurrent)              3282.3
Shell Scripts (8 concurrent)              3082.4
                                        ========
System Benchmarks Index Score             2213.6

------------------------------------------------------------------------
4 CPUs in system; running 4 parallel copies of tests

Dhrystone 2 using register variables     12702.2
Double-Precision Whetstone                3393.0
Execl Throughput                          5887.6
File Copy 1024 bufsize 2000 maxblocks     3798.9
File Copy 256 bufsize 500 maxblocks       2338.5
File Copy 4096 bufsize 8000 maxblocks     7897.8
Pipe Throughput                           7274.9
Pipe-based Context Switching              4070.8
Process Creation                          3808.9
Shell Scripts (1 concurrent)              7399.3
Shell Scripts (8 concurrent)              8307.8
System Call Overhead                      6889.2
                                        ========
System Benchmarks Index Score             5548.0

Unixbench in a Xen VM is better than on the physical machine.

With flags set to 4147:
- better by 3.5% for 1 copy of tests
- better by 3% for 4 parallel copies of tests

With flags set to 4131:
- better by 4.3% for 1 copy of tests
- better by 3.8% for 4 parallel copies of tests

I can understand the similar results in Dhrystone and Whetstone, because
of direct execution. But I lack the knowledge to understand why Xen is
better at the tests that involve system calls. Is it because of VT-x? Can
Xen execute system calls "faster" using hardware support for
virtualization than a physical system using "normal" x86 and x86_64
calls?

Can someone provide an explanation?

Regards,
Marko

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users
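For anyone following along, it can help to see which bits actually differ between the three flag values tried in this thread (4143, 4147, 4131). The following is only a rough sketch: the bit arithmetic is exact, but the `SD_*` names in `ASSUMED_SD_FLAGS` are an assumption based on the `include/linux/sched.h` of kernels from around that time and should be checked against the header of the kernel actually running in the guest. The helper names (`set_bits`, `diff_flags`, `describe`) are made up for this illustration.

```python
# Sketch: compare sched_domain flag values bit by bit.
# ASSUMPTION: the SD_* names below match the guest kernel's sched.h;
# the numeric bit differences themselves do not depend on these names.
ASSUMED_SD_FLAGS = {
    0x0001: "SD_LOAD_BALANCE",
    0x0002: "SD_BALANCE_NEWIDLE",
    0x0004: "SD_BALANCE_EXEC",
    0x0008: "SD_BALANCE_FORK",
    0x0010: "SD_BALANCE_WAKE",
    0x0020: "SD_WAKE_AFFINE",
    0x1000: "SD_PREFER_SIBLING",
}


def set_bits(value):
    """Return the individual bit masks that are set in value, lowest first."""
    return [1 << i for i in range(value.bit_length()) if (value >> i) & 1]


def diff_flags(old, new):
    """Return (cleared, added): bit masks present only in old / only in new."""
    return set_bits(old & ~new), set_bits(new & ~old)


def describe(bits):
    """Format bit masks as hex, with the assumed name where one is known."""
    if not bits:
        return "none"
    return ", ".join(
        f"{bit:#06x} ({ASSUMED_SD_FLAGS.get(bit, 'unknown')})" for bit in bits
    )


if __name__ == "__main__":
    original = 4143  # value reported as the default in this thread
    for candidate in (4147, 4131):
        cleared, added = diff_flags(original, candidate)
        print(f"{original} -> {candidate}")
        print(f"  cleared: {describe(cleared)}")
        print(f"  added:   {describe(added)}")
```

With the values from this thread, both 4147 and 4131 clear bits 0x4 and 0x8 relative to 4143, and 4147 additionally sets bit 0x10. If the assumed names are right, the cleared bits are the ones that balance tasks at exec and fork time, which would at least be consistent with Execl Throughput and Process Creation being the single-copy tests that moved the most.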