[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] [PATCH] Quick path for PIO instructions which cut more than half of the expense

  • To: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • From: "Xiang, Kai" <kai.xiang@xxxxxxxxx>
  • Date: Mon, 22 Dec 2008 18:16:14 +0800
  • Accept-language: en-US
  • Delivery-date: Mon, 22 Dec 2008 02:16:42 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>
  • Thread-index: AclkHlHyULdzldIRRfyHPzvdmtg6NA==
  • Thread-topic: [PATCH] Quick path for PIO instructions which cut more than half of the expense

Hi all:

Happy Holidays to you :)

We found that the PIO instruction path changed in the Xen 3.3 tree compared to 
the earlier Xen 3.1 tree.
We suspect this puts a heavier burden on Xen itself, which hurts PIO performance.

This patch (against c/s 18933) addresses the issue by providing a short path 
for non-string PIO.
To demonstrate how much performance improvement this brings, we ran the 
experiments below:

1) Direct TSC Data from Xentrace
We used a small piece of code, running in a RHEL5 guest, that reads a port 
repetitively, and collected xentrace data at the same time.
The TSC from VMEXIT to blocked_to_runnable (which can be viewed as an 
indicator of the VMEXIT-handling path inside Xen) is cut by ~59% from 2616 cycles.

2) Port IO TSC observed from this piece of code
This includes the response from the QEMU side. We also see about an 18% TSC 
reduction for one simple PIO (from 16112 to 13296).

3) Influence on more realistic workloads:
We tested a Windows 2003 Server guest, using IOmeter to run a disk-bound 
test. The IO pattern is "Default", which uses 67% random read and 33% random 
write with a 2K request size.
To reduce the influence of the file cache, we ran the test 3 times (1 minute 
each) from a fresh boot of both Xen and the guest.

Comparing before and after:
         IO per second (3 runs)     |  average response time, ms (3 runs)
Before:  100.004; 109.447; 110.801  |  9.988;  9.133;  9.022
After:   101.951; 110.893; 114.179  |  9.806;  9.016;  8.756

So we get a 1%~3% IO performance gain while reducing the average response 
time by 2%~3% at the same time.
Considering this is just an ordinary SATA disk and an IO-bound workload, we 
expect more with faster disks and more cached IO.

BTW: I also fixed one wrong comment in the patch.

Looking forward to your feedback,
Thanks in advance.

Best wishes


We have also attached the test code:

And the configuration is as below:
Intel(r) Supermicro Tylersburg-EP Server System 
CPU Info: 
2x quad-core processors, 2.8GHz with 8MB L3 cache (Nehalem)
Disk: Seagate SATA 500GB
Memory Info:
12GB memory (12 x 1GB DDR3 1066MHz) 
Guest configuration:
Memory: 512MB
Device model: stub domain
IOmeter test setup:
Two virtual disks used:
hda for the system and hdb as the test disk for IOmeter.

Attachment: pio_quickpath.patch
Description: pio_quickpath.patch

Attachment: pio.c
Description: pio.c

Xen-devel mailing list


