
RE: [Xen-users] Xen performance


  • To: anorton@xxxxxxx, xen-users@xxxxxxxxxxxxxxxxxxx
  • From: "Petersson, Mats" <mats.petersson@xxxxxxx>
  • Date: Wed, 12 Oct 2005 10:57:59 +0200
  • Delivery-date: Wed, 12 Oct 2005 08:57:28 +0000
  • List-id: Xen user discussion <xen-users.lists.xensource.com>
  • Thread-index: AcXOpYSbw2+ibV9eRZKCw6sK2oZHowAYpyVQ
  • Thread-topic: [Xen-users] Xen performance

Angela,
 
I'm not sure what you EXPECTED to see. A virtual machine will always be (somewhat) slower than the "real" hardware, because you have an extra software layer for some operations. Basically, this is the price you pay for the extended system functionality you get. It's the same as saying "If I remove the file-system from my operating system, I can read from or write to the disk much quicker than going through the file-system"[1]. You gain some functionality, and you lose some performance.
 
Comments on Byte-Bench:
I can't explain the pipe throughput, because I just don't know anything about how that works.
 
Process creation involves a lot of page-table work, which is definitely a typical situation where the hypervisor (Xen) has to take extra action on top of what's normally done in the OS: each operation that would normally be a trivial write to a page-table entry now becomes a call into Xen to perform that "trivial operation". So instead of a few simple instructions, we now have a software interrupt, a function call and several extra operations just to work out what needs to be done, and only then the actual page-table update. I'd expect this to be an order of magnitude slower than the native operation.
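To make that concrete, here is a rough sketch of the difference between a native page-table-entry write and the paravirtualised path. The mmu_update hypercall (and its ptr/val request structure) is the real interface a Xen-aware kernel uses for page-table writes; everything else in this sketch (the stub body, the addresses) is made up purely for illustration:

    /* Sketch only: contrasts a native page-table-entry write with the
     * paravirtualised path.  The struct mirrors Xen's public mmu_update
     * interface, but the hypercall is stubbed out here for illustration. */
    #include <stdint.h>

    typedef uint64_t pte_t;

    struct mmu_update {
        uint64_t ptr;   /* machine address of the page-table entry */
        uint64_t val;   /* new contents for that entry             */
    };

    /* Stand-in for the real hypercall, which is a software interrupt into
     * Xen; Xen validates the request and performs the write on our behalf. */
    static int HYPERVISOR_mmu_update(struct mmu_update *req, int count,
                                     int *done, uint16_t domid)
    {
        (void)req; (void)domid;
        if (done) *done = count;
        return 0;
    }

    /* Native kernel: updating a PTE during fork() is a single store. */
    static void set_pte_native(pte_t *ptep, pte_t val)
    {
        *ptep = val;
    }

    /* Paravirtualised guest: the same logical operation becomes a trap into
     * Xen, argument checking, and only then the actual page-table write. */
    static int set_pte_xen(uint64_t pte_machine_addr, pte_t val)
    {
        struct mmu_update req = { .ptr = pte_machine_addr, .val = val };
        int done = 0;
        return HYPERVISOR_mmu_update(&req, 1, &done, 0x7FF0 /* DOMID_SELF */);
    }

    int main(void)
    {
        pte_t pte = 0;
        set_pte_native(&pte, 0x1000 | 1);            /* direct write        */
        return set_pte_xen(0x12345000, 0x1000 | 1);  /* same update via Xen */
    }

Process creation does thousands of such updates, so the per-update overhead adds up quickly.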
 
My guess is that the shell scripts aren't slower in themselves, but that several new processes are created within each shell script.
 
 
Comments on lmbench:
read & write are slower - no big surprise. Most likely the reads and writes go to a file, and the DomU's "disk" is commonly emulated through a loopback-mounted file. So you get twice the number of reads: one in Dom0 reading the disk image, and then the data is transferred to DomU through a "read" operation.
 
It's similar for the other file-related operations: they become two-step operations, with Dom0 doing the actual work and then transferring the result to DomU.
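As an aside, which of the two setups you have is visible in the domU config file. A hypothetical fragment (paths invented) might look like the lines below; the 'file:' form goes through Dom0's file system and a loop device, while the 'phy:' form hands a raw partition straight to the guest and skips that extra layer:

    # file-backed image: domU reads become Dom0 reads of the image file
    disk = [ 'file:/var/xen/images/sl3-root.img,sda1,w' ]

    # raw partition exported directly: one layer fewer in Dom0
    # disk = [ 'phy:hda3,sda1,w' ]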
 
Protection fault handling goes through extra steps, as the code enters Xen itself and then has to be passed back to the guest that faulted, so it's expected that these take longer than the same operation on a native OS.
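In case it helps to see it concretely, a protection-fault round trip boils down to something like the sketch below (my own minimal example, not lmbench's code): write to a read-only page, catch the SIGSEGV, make the page writable and let the store retry. On native hardware the fault goes CPU -> kernel -> signal handler; under Xen there is an extra hop through the hypervisor before the guest kernel even sees it.

    /* Minimal protection-fault round trip (illustration only). */
    #include <signal.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static char *page;
    static long pagesize;

    static void on_segv(int sig)
    {
        (void)sig;
        /* Make the page writable so the faulting store can complete. */
        mprotect(page, pagesize, PROT_READ | PROT_WRITE);
    }

    int main(void)
    {
        pagesize = sysconf(_SC_PAGESIZE);
        page = mmap(NULL, pagesize, PROT_READ,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (page == MAP_FAILED) { perror("mmap"); return 1; }

        signal(SIGSEGV, on_segv);

        page[0] = 1;   /* faults; handler fixes protection; store retries */
        printf("survived the protection fault: %d\n", page[0]);
        return 0;
    }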
 
I still have no explanation for the pipe behaviour - in the few minutes I've been working on this answer, I haven't learnt how pipes work ;-)
 
Sockets are probably related to pipes... but I have no real idea how either of them works...
 
fork+<something>: more work is needed in the virtual machine than on the real hardware, as described under process creation above - and a factor of ~2x slower isn't bad at all. Some of these operations also involve file operations, which adds to the already slower path.
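For reference, the fork+execve case amounts to roughly the following (a minimal sketch of my own, not lmbench's source); every page-table set-up for the child, and every copy-on-write fault afterwards, is one of those "trivial operations turned hypercall" on a paravirtualised guest:

    /* Rough equivalent of one "fork+execve" iteration: create a child,
     * have it exec a trivial program, wait for it to exit. */
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();       /* new address space: lots of page-table work */
        if (pid < 0) {
            perror("fork");
            return 1;
        }
        if (pid == 0) {
            /* child: replace ourselves with /bin/true (any tiny program will do) */
            execl("/bin/true", "true", (char *)NULL);
            _exit(127);           /* only reached if execl failed */
        }
        int status;
        waitpid(pid, &status, 0); /* parent waits for the child to finish */
        return 0;
    }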
 
[1] This assumes the file-system is relatively stupid about caching; a modern file-system actually performs a lot of clever caching/optimisation to increase system performance.
 
--
Mats
 


From: xen-users-bounces@xxxxxxxxxxxxxxxxxxx [mailto:xen-users-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Angela Norton
Sent: 11 October 2005 21:51
To: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-users] Xen performance

Hi all,
While doing some benchmarking of Xen, I ran across a couple of performance issues. I am wondering if anyone else has noticed this and whether there is anything I can do to tune the performance.

The setup:
CPU: Athlon XP 2500+ (1826.005 MHz)
RAM: Limited to 256 MB in native and xenU
Disk: Maxtor 6B200P0, ATA DISK drive
Motherboard: ASUS A7VBX-MX SE
Network: tested only the loopback interface.

I have Fedora Core 4 installed as dom0, with Scientific Linux 3.0.7 (RHEL3) installed on a separate partition as the single domU. I installed the FC4 xen rpms (xen-3.0-0.20050912.fc4, kernel-xenU-2.6.12-1.1454_FC4, kernel-xen0-2.6.12-1.1454_FC4) using yum.

I used the following benchmark tools/suites:
bonnie++-1.03a
UnixBench 4.1.0
ab
lmbench 3.0-a5

The areas where I saw the greatest performance hit were in system calls, process creation, and pipe throughput. Here are some selected results:

UnixBench:
============

Scientific Linux 3 Native:
  BYTE UNIX Benchmarks (Version 4.1.0)
  System -- Linux localhost.localdomain 2.4.21-27.0.2.EL #1 Tue Jan 18 20:27:31 CST 2005 i686 athlon i386 GNU/Linux
  Start Benchmark Run: Thu Sep 22 15:23:17 PDT 2005
   2 interactive users.
   15:23:17  up 12 min,  2 users,  load average: 0.03, 0.08, 0.05
  lrwxr-xr-x    1 root     root            4 Sep  9 10:56 /bin/sh -> bash
  /bin/sh: symbolic link to bash
  /dev/hdc11            20161172   5059592  14077440  27% /
<--snip-->
System Call Overhead                     995605.1 lps   (10.0 secs, 10 samples)
Pipe Throughput                          1135376.3 lps   (10.0 secs, 10 samples)
Pipe-based Context Switching             375521.7 lps   (10.0 secs, 10 samples)
Process Creation                           9476.4 lps   (30.0 secs, 3 samples)
Execl Throughput                           2918.3 lps   (29.7 secs, 3 samples)
<--snip-->
                     INDEX VALUES
TEST                                        BASELINE     RESULT      INDEX

Dhrystone 2 using register variables        116700.0  4307104.5      369.1
Double-Precision Whetstone                      55.0      980.4      178.3
Execl Throughput                                43.0     2918.3      678.7
File Copy 1024 bufsize 2000 maxblocks         3960.0   143780.0      363.1
File Copy 256 bufsize 500 maxblocks           1655.0    72156.0      436.0
File Copy 4096 bufsize 8000 maxblocks         5800.0   192427.0      331.8
Pipe Throughput                              12440.0  1135376.3      912.7
Process Creation                               126.0     9476.4      752.1
Shell Scripts (8 concurrent)                     6.0      329.7      549.5
System Call Overhead                         15000.0   995605.1      663.7
                                                                 =========
     FINAL SCORE                                                     475.2

--------------------------------------------

SL3 XenU
  BYTE UNIX Benchmarks (Version 4.1.0)
  System -- Linux localhost.localdomain 2.6.12-1.1454_FC4xenU #1 SMP Fri Sep 9 00:45:34 EDT 2005 i686 athlon i386 GNU/Linux
  Start Benchmark Run: Fri Sep 23 09:08:23 PDT 2005
   1 interactive users.
   09:08:23  up 0 min,  1 user,  load average: 0.95, 0.25, 0.08
  lrwxr-xr-x    1 root     root            4 Sep  9 10:56 /bin/sh -> bash
  /bin/sh: symbolic link to bash
  /dev/sda1             20161172   5058964  14078068  27% /
<--snip-->
System Call Overhead                     969225.3 lps   (10.0 secs, 10 samples)
Pipe Throughput                          619270.7 lps   (10.0 secs, 10 samples)
Pipe-based Context Switching              85183.9 lps   (10.0 secs, 10 samples)
Process Creation                           3014.6 lps   (30.0 secs, 3 samples)
Execl Throughput                           1807.4 lps   (29.9 secs, 3 samples)
<--snip-->
                     INDEX VALUES           
TEST                                        BASELINE     RESULT      INDEX

Dhrystone 2 using register variables        116700.0  4288647.9      367.5
Double-Precision Whetstone                      55.0      976.3      177.5
Execl Throughput                                43.0     1807.4      420.3
File Copy 1024 bufsize 2000 maxblocks         3960.0   143559.0      362.5
File Copy 256 bufsize 500 maxblocks           1655.0    70328.0      424.9
File Copy 4096 bufsize 8000 maxblocks         5800.0   186297.0      321.2
Pipe Throughput                              12440.0   619270.7      497.8
Process Creation                               126.0     3014.6      239.3
Shell Scripts (8 concurrent)                     6.0      188.0      313.3
System Call Overhead                         15000.0   969225.3      646.2
                                                                 =========
     FINAL SCORE                                                     356.0

---------------------------------------------------------------------------------

lmbench Selected Results:
==========================

SL3 Native:
<--snip-->
Simple syscall: 0.1516 microseconds
Simple read: 0.2147 microseconds
Simple write: 0.1817 microseconds
Simple stat: 1.8486 microseconds
Simple fstat: 0.3026 microseconds
Simple open/close: 2.2201 microseconds
<--snip-->
Protection fault: 0.2196 microseconds
Pipe latency: 2.2539 microseconds
AF_UNIX sock stream latency: 4.8221 microseconds
Process fork+exit: 143.7297 microseconds
Process fork+execve: 483.0833 microseconds
Process fork+/bin/sh -c: 1884.0000 microseconds

-------------------------------------------------

SL3 XenU:
<--snip-->
Simple syscall: 0.1671 microseconds
Simple read: 0.4090 microseconds
Simple write: 0.3588 microseconds
Simple stat: 3.5761 microseconds
Simple fstat: 0.5530 microseconds
Simple open/close: 3.9425 microseconds
<--snip-->
Protection fault: 0.5993 microseconds
Pipe latency: 12.1886 microseconds
AF_UNIX sock stream latency: 22.3485 microseconds
Process fork+exit: 365.8667 microseconds
Process fork+execve: 1066.4000 microseconds
Process fork+/bin/sh -c: 3826.0000 microseconds
<--snip-->

-------------------------------------------------------------------------



I can post the full results of these tests if anyone is interested.

Does anyone have any ideas for tuning the performance of the domUs? Are there any configurations that perform better than others?

Thank You,
Angela Norton
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users

 

