[Xen-devel] RE: [RFC, PATCH 0/24] VMI i386 Linux virtualization interface proposal
>-----Original Message-----
>From: Zachary Amsden [mailto:zach@xxxxxxxxxx]
>Sent: Monday, March 13, 2006 9:58 AM
>To: Linus Torvalds; Linux Kernel Mailing List; Virtualization Mailing
>List; Xen-devel; Andrew Morton; Zach Amsden; Daniel Hecht; Daniel Arai;
>Anne Holler; Pratap Subrahmanyam; Christopher Li; Joshua LeVasseur;
>Chris Wright; Rik Van Riel; Jyothy Reddy; Jack Lo; Kip Macy; Jan
>Beulich; Ky Srinivasan; Wim Coekaerts; Leendert van Doorn; Zach Amsden
>Subject: [RFC, PATCH 0/24] VMI i386 Linux virtualization interface
>proposal
>
>In OLS 2005, we described the work that we have been doing in VMware
>with respect to a common interface for paravirtualization of Linux. We
>shared the general vision in Rik's virtualization BoF.
>
>This note is an update on our further work on the Virtual Machine
>Interface, VMI. The patches provided have been tested on 2.6.16-rc6.
>We are currently recollecting performance information for the new -rc6
>kernel, but expect our numbers to match previous results, which showed
>no impact whatsoever on macro benchmarks, and nearly negligible impact
>on microbenchmarks.

Folks,

I'm a member of the performance team at VMware & I recently did a round of
testing measuring the performance of a set of benchmarks on the following
2 Linux variants, both running natively:

1) 2.6.16-rc6 including VMI + 64MB hole
2) 2.6.16-rc6 not including VMI + no 64MB hole

The intent was to measure the overhead of VMI calls on native runs. Data
was collected on both P4 & Opteron boxes. The workloads used were
dbench/1client, netperf receive+send, UP+SMP kernel compile, lmbench, &
some VMware in-house kernel microbenchmarks. The CPU(s) were pegged for
all workloads except netperf, for which I include CPU utilization
measurements.

Attached please find an html file presenting the benchmark results,
expressed as the ratio of 1) to 2), along with the raw scores given in
brackets. System configurations & benchmark descriptions are given at the
end of the webpage; more details are available on request. Also attached
for reference is an html file giving the width of the 95% confidence
interval around the mean of the scores reported for each benchmark,
expressed as a percentage of the mean.

As you can see on the benchmark results webpage, the VMI-Native & Native
scores for almost all workloads match within the 95% confidence interval.
On the P4, only 4 workloads, all lmbench microbenchmarks (forkproc,
shproc, mmap, pagefault), were outside the interval, & the overheads
(2%, 1%, 2%, 1%, respectively) are low. The Opteron microbenchmark data
was a little more ragged than the P4's in terms of variance, but it
appears that only a few lmbench microbenchmarks (forkproc, execproc,
shproc) were outside their confidence intervals, & they show low
overheads (4%, 3%, 2%, respectively); our in-house segv & divzero
microbenchmarks seemed to show measurable overheads as well (8%, 9%).
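For reference, the attached tables report the width of the 95% confidence
interval around the mean as a percentage of the mean. Below is a minimal
sketch of one way such a figure can be computed; it assumes the normal
approximation (z = 1.96) and treats the reported width as the full
interval width, neither of which is stated in the results themselves, so
it is illustrative only.

/*
 * Sketch: width of a 95% confidence interval around the mean of a set
 * of benchmark scores, expressed as a percentage of the mean.
 * Assumes the normal approximation (z = 1.96); for the small sample
 * counts used in these runs a Student's t critical value would be
 * slightly larger.
 */
#include <math.h>
#include <stdio.h>

static double ci_width_pct(const double *scores, int n)
{
    double sum = 0.0, mean, var = 0.0, se, half;
    int i;

    for (i = 0; i < n; i++)
        sum += scores[i];
    mean = sum / n;

    for (i = 0; i < n; i++)
        var += (scores[i] - mean) * (scores[i] - mean);
    var /= (n - 1);              /* sample variance */

    se = sqrt(var / n);          /* standard error of the mean */
    half = 1.96 * se;            /* 95% half-width, normal approximation */

    return 100.0 * (2.0 * half) / mean;   /* full width as % of mean */
}

int main(void)
{
    /* hypothetical throughput scores from repeated runs of one benchmark */
    double scores[] = { 312.1, 310.8, 311.5, 312.4, 310.9 };
    int n = sizeof(scores) / sizeof(scores[0]);

    printf("95%% CI width: %.1f%% of mean\n", ci_width_pct(scores, n));
    return 0;
}

(Compile with gcc and link against -lm.)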
-Regards,
Anne Holler (anne@xxxxxxxxxx)


2.6.16-rc6 Transparent Paravirtualization Performance Scoreboard
Updated: 03/20/2006 * Contact: Anne Holler (anne@xxxxxxxxxx)
[ITALICS -> the means being compared are within 95% confidence interval width]

Throughput benchmarks -> HIGHER IS BETTER -> Higher ratio is better

                          P4                   Opteron
                   VMI-Native/Native    VMI-Native/Native   Comments
Dbench
  1client           1.00 [312/311]       1.00 [425/425]
Netperf
  Receive           1.00 [948/947]       1.00 [937/937]     CpuUtil: P4(VMI:43%,Ntv:42%); Opteron(VMI:36%,Ntv:34%)
  Send              1.00 [939/939]       1.00 [937/936]     CpuUtil: P4(VMI:25%,Ntv:25%); Opteron(VMI:62%,Ntv:60%)

Latency benchmarks -> LOWER IS BETTER -> Lower ratio is better

                          P4                   Opteron
                   VMI-Native/Native    VMI-Native/Native   Comments
Kernel compile
  UP                1.00 [221/220]       1.00 [131/131]
  SMP/2way          1.00 [117/117]       1.00 [67/67]
Lmbench process time latencies
  null call         1.00 [0.17/0.17]     1.00 [0.08/0.08]
  null i/o          1.00 [0.29/0.29]     0.92 [0.23/0.25]   opteron: wide confidence interval
  stat              0.99 [2.14/2.16]     0.94 [2.25/2.39]   opteron: odd, 1% outside wide confidence interval
  open clos         1.01 [3.00/2.96]     0.98 [3.16/3.24]
  slct TCP          1.00 [8.84/8.83]     0.94 [11.8/12.5]   opteron: wide confidence interval
  sig inst          0.99 [0.68/0.69]     1.09 [0.36/0.33]   opteron: best is 1.03 [0.34/0.33]
  sig hndl          0.99 [2.19/2.21]     1.05 [1.20/1.14]   opteron: best is 1.02 [1.13/1.11]
  fork proc         1.02 [137/134]       1.04 [100/96]
  exec proc         1.02 [536/525]       1.03 [309/301]
  sh proc           1.01 [3204/3169]     1.02 [1551/1528]
Lmbench context switch time latencies
  2p/0K             1.00 [2.84/2.84]     1.14 [0.74/0.65]   opteron: wide confidence interval
  2p/16K            1.01 [2.98/2.95]     0.93 [0.74/0.80]   opteron: wide confidence interval
  2p/64K            1.02 [3.06/3.01]     1.00 [4.19/4.18]
  8p/16K            1.02 [3.31/3.26]     0.97 [1.86/1.91]
  8p/64K            1.01 [30.4/30.0]     1.00 [4.33/4.34]
  16p/16K           0.96 [7.76/8.06]     0.97 [2.03/2.10]
  16p/64K           1.00 [41.5/41.4]     1.00 [15.9/15.9]
Lmbench system latencies
  Mmap              1.02 [6681/6542]     1.00 [3452/3441]
  Prot Fault        1.06 [0.920/0.872]   1.07 [0.197/0.184] p4+opteron: wide confidence interval
  Page Fault        1.01 [2.065/2.050]   1.00 [1.10/1.10]
Kernel Microbenchmarks
  getppid           1.00 [1.70/1.70]     1.00 [0.83/0.83]
  segv              0.99 [7.05/7.09]     1.08 [2.95/2.72]
  forkwaitn         1.02 [3.60/3.54]     1.05 [2.61/2.48]
  divzero           0.99 [5.68/5.73]     1.09 [2.71/2.48]

System Configurations:
  P4:      CPU: 2.4GHz; MEM: 1024MB; DISK: 10K SCSI; Server+Client NICs: Intel e1000 server adapter
  Opteron: CPU: 2.2GHz; MEM: 1024MB; DISK: 10K SCSI; Server+Client NICs: Broadcom NetXtreme BCM5704
  UP kernel used for all workloads except SMP kernel compile

Benchmark Descriptions:
  Dbench: repeat N times until 95% confidence interval 5% around mean; report mean
    version 2.0, run as "time ./dbench -c client_plain.txt 1"
  Netperf: best of 5 runs
    MessageSize:8192 + SocketSize:65536; netperf -H client-ip -l 60 -t TCP_STREAM
  Kernel compile: best of 3 runs
    Build of 2.6.11 kernel w/gcc 4.0.2 via "time make -j 16 bzImage"
  Lmbench: average of best 18 of 30 runs
    version 3.0-a4; obtained from sourceforge
  Kernel microbenchmarks: average of best 3 of 5 runs
    getppid: loop of 10 calls to getppid, repeated 1,000,000 times
    segv: signal of SIGSEGV, repeated 3,000,000 times
    forkwaitn: fork/wait for child to exit, repeated 40,000 times
    divzero: divide by 0 fault, 3,000,000 times
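The kernel microbenchmark descriptions above give only the loop structure,
not the source. As an illustration, here is a minimal, assumed sketch of a
forkwaitn-style loop (fork a child that exits immediately, wait for it,
40,000 times); the actual VMware in-house microbenchmark source is not part
of this post, so details such as timing and reporting are guesses.

/*
 * Sketch of a forkwaitn-style microbenchmark: fork/wait for child to
 * exit, repeated 40,000 times, timing the whole loop.
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    const int iterations = 40000;
    struct timeval start, end;
    double secs;
    int i;

    gettimeofday(&start, NULL);
    for (i = 0; i < iterations; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            exit(1);
        }
        if (pid == 0)
            _exit(0);              /* child: exit immediately */
        waitpid(pid, NULL, 0);     /* parent: reap the child */
    }
    gettimeofday(&end, NULL);

    secs = (end.tv_sec - start.tv_sec) +
           (end.tv_usec - start.tv_usec) / 1e6;
    printf("forkwaitn: %d fork/wait pairs in %.2f s (%.1f us each)\n",
           iterations, secs, secs * 1e6 / iterations);
    return 0;
}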
2.6.16-rc6 Transparent Paravirtualization Performance Confidence Interval Widths
Updated: 03/20/2006 * Contact: Anne Holler (anne@xxxxxxxxxx)
Values are 95% confidence interval width around mean, given in terms of percentage of mean
[BOLD -> confidence interval wider than 5% of mean]

                         P4                      Opteron
                  Native    VMI-Native     Native    VMI-Native
Dbench2.0
  1client          5.0%        1.4%         0.8%        3.6%
Netperf
  Receive          0.1%        0.0%         0.0%        0.0%
  Send             0.6%        1.8%         0.0%        0.0%
Kernel compile
  UP               3.4%        2.6%         2.2%        0.0%
  SMP/2way         2.4%        4.9%         4.3%        4.2%
Lmbench process time latencies
  null call        0.0%        0.0%         0.0%        0.0%
  null i/o         0.0%        0.0%         5.2%       10.8%
  stat             1.0%        1.0%         1.7%        3.2%
  open clos        1.3%        0.7%         2.4%        3.0%
  slct TCP         0.3%        0.3%        19.9%       20.1%
  sig inst         0.3%        0.5%         0.0%        5.5%
  sig hndl         0.4%        0.4%         2.0%        2.0%
  fork proc        0.5%        0.9%         0.8%        1.0%
  exec proc        0.8%        0.9%         1.0%        0.7%
  sh proc          0.1%        0.2%         0.9%        0.4%
Lmbench context switch time latencies
  2p/0K            0.8%        1.8%        16.1%        9.9%
  2p/16K           1.5%        1.8%        10.5%       10.1%
  2p/64K           2.4%        3.0%         1.8%        1.4%
  8p/16K           4.5%        4.2%         2.4%        4.2%
  8p/64K           3.0%        2.8%         1.6%        1.5%
  16p/16K          3.1%        6.7%         2.6%        3.2%
  16p/64K          0.5%        0.5%         2.9%        2.9%
Lmbench system latencies
  Mmap             0.7%        0.3%         2.2%        2.4%
  Prot Fault       7.4%        7.5%        49.4%       38.7%
  Page Fault       0.2%        0.2%         2.4%        2.9%
Kernel Microbenchmarks
  getppid          1.7%        2.9%         3.5%        3.5%
  segv             2.3%        0.7%         1.8%        1.9%
  forkwaitn        0.8%        0.8%         5.3%        2.2%
  divzero          0.9%        1.3%         1.2%        1.1%

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel