
[Xen-devel] RE: [RFC, PATCH 0/24] VMI i386 Linux virtualization interface proposal



>-----Original Message-----
>From: Zachary Amsden [mailto:zach@xxxxxxxxxx]
>Sent: Monday, March 13, 2006 9:58 AM
>To: Linus Torvalds; Linux Kernel Mailing List; Virtualization Mailing
>List; Xen-devel; Andrew Morton; Zach Amsden; Daniel Hecht; Daniel Arai;
>Anne Holler; Pratap Subrahmanyam; Christopher Li; Joshua LeVasseur;
>Chris Wright; Rik Van Riel; Jyothy Reddy; Jack Lo; Kip Macy; Jan
>Beulich; Ky Srinivasan; Wim Coekaerts; Leendert van Doorn; Zach Amsden
>Subject: [RFC, PATCH 0/24] VMI i386 Linux virtualization interface
>proposal

>At OLS 2005, we described the work that we have been doing at VMware
>with respect to a common interface for paravirtualization of Linux.
>We shared the general vision in Rik's virtualization BoF.

>This note is an update on our further work on the Virtual Machine
>Interface, VMI.  The patches provided have been tested on 2.6.16-rc6.
>We are currently re-collecting performance information for the new -rc6
>kernel, but expect our numbers to match previous results, which showed
>no impact whatsoever on macro benchmarks, and nearly negligible impact
>on microbenchmarks.

Folks,

I'm a member of the performance team at VMware & I recently did a
round of testing measuring the performance of a set of benchmarks
on the following two Linux variants, both running natively:
 1) 2.6.16-rc6 including VMI + 64MB hole
 2) 2.6.16-rc6 not including VMI + no 64MB hole
The intent was to measure the overhead of VMI calls on native runs.
Data was collected on both P4 & Opteron boxes.  The workloads used
were dbench/1client, netperf/receive+send, UP+SMP kernel compile,
lmbench, & some VMware in-house kernel microbenchmarks.  The CPU(s)
were pegged for all workloads except netperf, for which I include
CPU utilization measurements.
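
For context on why the native overhead is the interesting number:
VMI routes privileged operations through a call layer that, on native
hardware, resolves to plain native code, so the residual cost is at
most one indirect call per operation (and VMI's inline patching can
remove even that).  Below is a minimal userspace sketch of that cost
structure; the struct & function names are hypothetical illustrations,
not the actual VMI ROM interface.

/* build: gcc -O2 pv_sketch.c -o pv_sketch
 * (older glibc may also need -lrt for clock_gettime) */
#include <stdio.h>
#include <time.h>

/* Hypothetical op table in the style of a paravirt interface. */
struct pv_ops {
    unsigned long (*read_counter)(void);
};

static unsigned long ticks;

/* Native back end: the op body is trivial, so what remains is the
 * indirect call itself. */
static unsigned long native_read_counter(void)
{
    return ++ticks;
}

static struct pv_ops ops = { .read_counter = native_read_counter };

int main(void)
{
    struct timespec t0, t1;
    unsigned long sink = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < 100000000L; i++)
        sink += ops.read_counter();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("%.2f ns/call (sink=%lu)\n", ns / 1e8, sink);
    return 0;
}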

Attached please find an HTML file presenting the benchmark results
collected, expressed as the ratio of 1) to 2), with the raw scores
given in brackets.  System configurations & benchmark descriptions
are given at the end of the webpage; more details are available on
request.  Also attached for reference is an HTML file giving the
width of the 95% confidence interval around the mean of the scores
reported for each benchmark, expressed as a percentage of the mean.
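
The interval widths in the second attachment are the standard
large-sample computation.  A sketch of how such a percentage is
derived (my assumptions: normal approximation with z = 1.96 and
full width rather than half-width; the in-house scripts may use a
Student's t multiplier for these small run counts):

/* build: gcc -O2 ci.c -o ci -lm */
#include <math.h>
#include <stdio.h>

/* 95% CI width as a percentage of the mean; z = 1.96 assumed. */
static double ci95_pct_of_mean(const double *x, int n)
{
    double mean = 0.0, var = 0.0;

    for (int i = 0; i < n; i++)
        mean += x[i];
    mean /= n;

    for (int i = 0; i < n; i++)
        var += (x[i] - mean) * (x[i] - mean);
    var /= (n - 1);                       /* sample variance */

    double half = 1.96 * sqrt(var / n);   /* half-width of the CI */
    return 100.0 * (2.0 * half) / mean;   /* full width, % of mean */
}

int main(void)
{
    /* made-up dbench-like scores, for illustration only */
    double scores[] = { 312, 309, 315, 311, 310 };
    printf("CI width = %.1f%% of mean\n", ci95_pct_of_mean(scores, 5));
    return 0;
}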

As you can see on the benchmark results webpage, the VMI-Native
& Native scores for almost all workloads match within the 95%
confidence interval.  On the P4, only four workloads, all lmbench
microbenchmarks (forkproc, shproc, mmap, pagefault), were outside the
interval, & the overheads (2%, 1%, 2%, 1%, respectively) are low.
The Opteron microbenchmark data was a little more ragged than
the P4's in terms of variance, but it appears that only a few
lmbench microbenchmarks (forkproc, execproc, shproc) were outside
their confidence intervals, and they show low overheads (4%, 3%, 2%,
respectively); our in-house segv & divzero tests seemed to show
measurable overheads as well (8% & 9%).

-Regards, Anne Holler (anne@xxxxxxxxxx)
2.6.16-rc6 Transparent Paravirtualization Performance Scoreboard
Updated: 03/20/2006 * Contact: Anne Holler (anne@xxxxxxxxxx)
[ITALICS -> the means being compared are within the 95% confidence interval width]

Throughput benchmarks -> HIGHER IS BETTER -> Higher ratio is better
                     P4                  Opteron 
                     VMI-Native/Native   VMI-Native/Native   Comments
 Dbench
  1client            1.00 [312/311]      1.00 [425/425]
 Netperf
  Receive            1.00 [948/947]      1.00 [937/937]      CpuUtil:P4(VMI:43%,Ntv:42%);Opteron(VMI:36%,Ntv:34%)
  Send               1.00 [939/939]      1.00 [937/936]      CpuUtil:P4(VMI:25%,Ntv:25%);Opteron(VMI:62%,Ntv:60%)

Latency benchmarks -> LOWER IS BETTER -> Lower ratio is better
                     P4                  Opteron 
                     VMI-Native/Native   VMI-Native/Native   Comments
 Kernel compile
  UP                 1.00 [221/220]      1.00 [131/131]
  SMP/2way           1.00 [117/117]      1.00 [67/67]
 Lmbench process time latencies
  null call          1.00 [0.17/0.17]    1.00 [0.08/0.08]
  null i/o           1.00 [0.29/0.29]    0.92 [0.23/0.25]    opteron: wide confidence interval
  stat               0.99 [2.14/2.16]    0.94 [2.25/2.39]    opteron: odd, 1% outside wide confidence interval
  open clos          1.01 [3.00/2.96]    0.98 [3.16/3.24]
  slct TCP           1.00 [8.84/8.83]    0.94 [11.8/12.5]    opteron: wide confidence interval
  sig inst           0.99 [0.68/0.69]    1.09 [0.36/0.33]    opteron: best is 1.03 [0.34/0.33]
  sig hndl           0.99 [2.19/2.21]    1.05 [1.20/1.14]    opteron: best is 1.02 [1.13/1.11]
  fork proc          1.02 [137/134]      1.04 [100/96]
  exec proc          1.02 [536/525]      1.03 [309/301]
  sh proc            1.01 [3204/3169]    1.02 [1551/1528]
 Lmbench context switch time latencies
  2p/0K              1.00 [2.84/2.84]    1.14 [0.74/0.65]    opteron: wide confidence interval
  2p/16K             1.01 [2.98/2.95]    0.93 [0.74/0.80]    opteron: wide confidence interval
  2p/64K             1.02 [3.06/3.01]    1.00 [4.19/4.18]
  8p/16K             1.02 [3.31/3.26]    0.97 [1.86/1.91]
  8p/64K             1.01 [30.4/30.0]    1.00 [4.33/4.34]
  16p/16K            0.96 [7.76/8.06]    0.97 [2.03/2.10]
  16p/64K            1.00 [41.5/41.4]    1.00 [15.9/15.9]
 Lmbench system latencies
  Mmap               1.02 [6681/6542]    1.00 [3452/3441]
  Prot Fault         1.06 [0.920/0.872]  1.07 [0.197/0.184]  p4+opteron: wide confidence interval
  Page Fault         1.01 [2.065/2.050]  1.00 [1.10/1.10]
 Kernel Microbenchmarks
  getppid            1.00 [1.70/1.70]    1.00 [0.83/0.83]
  segv               0.99 [7.05/7.09]    1.08 [2.95/2.72]
  forkwaitn          1.02 [3.60/3.54]    1.05 [2.61/2.48]
  divzero            0.99 [5.68/5.73]    1.09 [2.71/2.48]

System Configurations:
 P4:      CPU: 2.4GHz; MEM: 1024MB; DISK: 10K SCSI; Server+Client NICs: Intel e1000 server adapter
 Opteron: CPU: 2.2GHz; MEM: 1024MB; DISK: 10K SCSI; Server+Client NICs: Broadcom NetXtreme BCM5704
 UP kernel used for all workloads except SMP kernel compile

Benchmark Descriptions:
 Dbench: repeat N times until the 95% confidence interval is within 5% of the mean; report the mean
  version 2.0 run as "time ./dbench -c client_plain.txt 1"
 Netperf: best of 5 runs
  MessageSize:8192+SocketSize:65536; netperf -H client-ip -l 60 -t TCP_STREAM
 Kernel compile: best of 3 runs
  Build of 2.6.11 kernel w/gcc 4.0.2 via "time make -j 16 bzImage"
 Lmbench: average of best 18 of 30 runs
  version 3.0-a4; obtained from sourceforge
 Kernel microbenchmarks: average of best 3 of 5 runs
  getppid: loop of 10 calls to getppid, repeated 1,000,000 times (see the sketch after this list)
  segv: signal of SIGSEGV, repeated 3,000,000 times
  forkwaitn: fork/wait for child to exit, repeated 40,000 times
  divzero: divide by 0 fault 3,000,000 times
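
The in-house microbenchmark harness is not public; below is a minimal
sketch of the getppid test as described above.  Only the loop shape
(10 calls x 1,000,000 iterations) comes from the description; the
timing & reporting are my assumptions.

/* build: gcc -O2 getppid_bench.c -o getppid_bench
 * (older glibc may also need -lrt for clock_gettime) */
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < 1000000L; i++) {
        /* ten back-to-back system calls per iteration */
        getppid(); getppid(); getppid(); getppid(); getppid();
        getppid(); getppid(); getppid(); getppid(); getppid();
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double sec = (t1.tv_sec - t0.tv_sec)
               + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("10M getppid calls: %.2f s (%.0f ns/call)\n",
           sec, sec * 1e9 / 1e7);
    return 0;
}
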
2.6.16-rc6 Transparent Paravirtualization Performance Confidence Interval Widths
Updated: 03/20/2006 * Contact: Anne Holler (anne@xxxxxxxxxx)
Values are the width of the 95% confidence interval around the mean, expressed as a percentage of the mean
[BOLD -> confidence interval wider than 5% of the mean]

                   P4                  Opteron
                   Native VMI-Native   Native VMI-Native
 Dbench2.0
  1client            5.0%  1.4%          0.8%  3.6%
 Netperf
  Receive            0.1%  0.0%          0.0%  0.0%
  Send               0.6%  1.8%          0.0%  0.0%
 Kernel compile
  UP                 3.4%  2.6%          2.2%  0.0%
  SMP/2way           2.4%  4.9%          4.3%  4.2%
 Lmbench process time latencies
  null call          0.0%  0.0%          0.0%  0.0%
  null i/o           0.0%  0.0%          5.2% 10.8%
  stat               1.0%  1.0%          1.7%  3.2%
  open clos          1.3%  0.7%          2.4%  3.0%
  slct TCP           0.3%  0.3%         19.9% 20.1%
  sig inst           0.3%  0.5%          0.0%  5.5%
  sig hndl           0.4%  0.4%          2.0%  2.0%
  fork proc          0.5%  0.9%          0.8%  1.0%
  exec proc          0.8%  0.9%          1.0%  0.7%
  sh proc            0.1%  0.2%          0.9%  0.4%
 Lmbench context switch time latencies
  2p/0K              0.8%  1.8%         16.1%  9.9%
  2p/16K             1.5%  1.8%         10.5% 10.1%
  2p/64K             2.4%  3.0%          1.8%  1.4%
  8p/16K             4.5%  4.2%          2.4%  4.2%
  8p/64K             3.0%  2.8%          1.6%  1.5%
  16p/16K            3.1%  6.7%          2.6%  3.2%
  16p/64K            0.5%  0.5%          2.9%  2.9%
 Lmbench system latencies
  Mmap               0.7%  0.3%          2.2%  2.4%
  Prot Fault         7.4%  7.5%         49.4% 38.7%
  Page Fault         0.2%  0.2%          2.4%  2.9%
 Kernel Microbenchmarks
  getppid            1.7%  2.9%          3.5%  3.5%
  segv               2.3%  0.7%          1.8%  1.9%
  forkwaitn          0.8%  0.8%          5.3%  2.2%
  divzero            0.9%  1.3%          1.2%  1.1%