
[Xen-devel] Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)

tgingold@xxxxxxx wrote:
KVM has a pretty much optimal path from the kernel to userspace.  The
overhead of going to userspace is roughly two syscalls (and we've
measured this overhead).  Yet it makes almost no difference in IO
The path can be split into 2 parts: from trap to ioemu and from ioemu to
real hardware (the return is the same).  ioemu to hardware should be roughly
the same with KVM and Xen.  Is trap to ioemu that different between Xen and

Yup. With KVM, there is no scheduler involvement. qemu does a blocking ioctl to the Linux kernel, and the Linux kernel does a vmrun. Provided the time slice hasn't been exhausted, Linux returns directly to qemu after a vmexit.

Xen uses event channels, which involve domain switches and select()'ing. A lot of the time, the path is pretty optimal. However, quite a bit of the time, you run into worst-case scenarios with the various schedulers and the latency skyrockets.

Honestly I don't know.  Does anyone have figures ?

Yeah, it varies a lot on different hardware.  For reference:

If the round trip to a null int80 syscall is 150 nsec, a round-trip vmexit to userspace in KVM may be 2500 nsec. On bare metal, it may cost 1700 nsec to do a PIO operation to an IDE port, so 2500 really isn't that bad.
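
To put those three figures side by side, here is a small Python sketch. It only restates the numbers quoted above as ratios; the constant and function names are made up for illustration, and the figures themselves vary by hardware.

```python
# Figures quoted above, in nanoseconds. They vary a lot between machines.
NULL_SYSCALL_NS = 150      # round trip of a null int80 syscall
KVM_VMEXIT_NS = 2500       # round trip of a vmexit to userspace in KVM
BARE_METAL_PIO_NS = 1700   # one PIO operation to an IDE port on bare metal


def ratio(a_ns, b_ns):
    """How many times larger a_ns is than b_ns."""
    return a_ns / b_ns


vmexit_vs_syscall = ratio(KVM_VMEXIT_NS, NULL_SYSCALL_NS)
vmexit_vs_pio = ratio(KVM_VMEXIT_NS, BARE_METAL_PIO_NS)

print(f"vmexit vs null syscall:   {vmexit_vs_syscall:.1f}x")
print(f"vmexit vs bare-metal PIO: {vmexit_vs_pio:.2f}x")
```

So the vmexit round trip is expensive relative to a syscall, but it is less than twice the cost of the real PIO operation it stands in for.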

Xen is usually around there too, but every so often it spikes to something awful (100ks of nsecs), and that skews the average cost.
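
A rough illustration of how rare spikes skew the average while leaving the typical case untouched. The spike size (300,000 nsec) and the spike frequency (1 exit in 100) are assumptions picked for the example, not measurements; only the 2500 nsec baseline comes from the figures above.

```python
from statistics import mean, median

TYPICAL_NS = 2500    # baseline, taken from the KVM figure above
SPIKE_NS = 300_000   # hypothetical stand-in for "100ks of nsecs"
SPIKE_EVERY = 100    # hypothetical: 1 exit in 100 hits a bad scheduling case


def sample_latencies(n=10_000):
    """Build a synthetic latency sample with a fixed share of spikes."""
    spikes = n // SPIKE_EVERY
    return [SPIKE_NS] * spikes + [TYPICAL_NS] * (n - spikes)


latencies = sample_latencies()
print("mean  :", mean(latencies), "nsec")
print("median:", median(latencies), "nsec")
```

Under these assumptions the mean more than doubles even though 99% of the exits are as fast as before, which is why an average alone hides what is going on.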

It would be interesting to compare disk (or net) performance between:
* linux
* dom0
* driver domain
* PV-on-HVM drivers
* ioemu

Does such a comparison exist?

Not that I know of.  I've done a lot of benchmarking but not of PV-on-HVM.

Xen can typically get pretty close to native for disk IO.

The big problem with disk emulation isn't IO latency, but the fact that
the IDE emulation can only have one outstanding request at a time.  The
SCSI emulation helps this a lot.
IIRC, a real IDE can only have one outstanding request too (this may have
changed with AHCI).  This is really IIRC :-(

You recall correctly. IDE can only have one type of outstanding DMA request.
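
A back-of-the-envelope sketch of why the number of outstanding requests matters more than per-request latency. It uses Little's law (throughput = requests in flight / latency) as an idealized upper bound. The 5 ms latency and the queue depths are hypothetical, and the model assumes the backend can actually service queued requests in parallel at the same latency.

```python
def max_iops(outstanding, latency_s):
    """Upper bound on requests per second with a fixed number in flight."""
    return outstanding / latency_s


LATENCY_S = 0.005  # hypothetical 5 ms per request

for depth in (1, 4, 16):
    print(f"queue depth {depth:2d}: up to {max_iops(depth, LATENCY_S):.0f} req/s")
```

With a single outstanding request the only way to go faster is to cut latency; with a deeper queue the same latency can sustain proportionally more requests.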

BTW on ia64 there is no REP IN/OUT.  When Windows uses IDE in PIO mode (during
install and crash dump), performance is horrible.  There is a patch which
adds special handling for PIO mode and really improves the data rate.

Ouch :-(  Fortunately, OS's won't use PIO very often.
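
A rough sketch of why PIO without REP IN/OUT hurts so much. It counts only trap overhead and ignores everything else. The assumptions: the IDE data port is accessed 16 bits at a time, each trapped access costs about as much as the 2500 nsec KVM round trip quoted earlier (ia64 Xen numbers will differ), and a string instruction would let the emulator handle a whole sector in one trap.

```python
SECTOR_BYTES = 512
WORD_BYTES = 2    # assumption: 16-bit accesses to the IDE data port
TRAP_NS = 2500    # assumption: borrowed from the KVM round-trip figure


def traps_per_sector(rep_available):
    """One trap per sector with a string op, else one trap per word."""
    return 1 if rep_available else SECTOR_BYTES // WORD_BYTES


def trap_bound_mb_per_s(rep_available):
    """Data rate in MB/s if trap overhead were the only cost."""
    ns_per_sector = traps_per_sector(rep_available) * TRAP_NS
    return SECTOR_BYTES / ns_per_sector * 1e9 / 1e6


print("traps per sector without REP:", traps_per_sector(False))
print(f"trap-bound rate without REP: {trap_bound_mb_per_s(False):.1f} MB/s")
```

Under these assumptions, 256 traps per sector cap the data rate at well under 1 MB/s, which fits the "horrible" description and shows why batching the PIO accesses pays off.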

I don't know what the bottleneck is in network emulation, but I suspect
the number of copies we have in the path has a great deal to do with it.
This reason seems obvious.

There's a lot to like about this sort of approach.  It's not a silver
bullet wrt performance but I think the model is elegant in many ways.
An interesting place to start would be lapic/pit emulation.  Removing
this code from the hypervisor would be pretty useful and there is no
need to address PV-on-HVM issues.
Indeed this is the simpler code to move.  But why would it be useful ?

Removing code from the hypervisor reduces the TCB so it's a win. Having it in firmware within the HVM domain is even better than having it in dom0 too wrt the TCB.

Can you provide more details on how the reflecting works?  Have you
measured the cost of reflection?  Do you just setup a page table that
maps physical memory 1-1 and then reenter the guest?
Yes, disable PG, set up flat mode and reenter the guest.
Cost not yet measured!

That would be very useful to measure. My chief concern would be that disabling PG would be considerably more costly than entering with paging enabled. That may not be the case on VT today, since there are no ASIDs, so it would be useful to test on SVM too.

Does the firmware get loaded as an option ROM or is it a special portion
of guest memory that isn't normally reachable?
IMHO it should come with hvmloader.  No need to make it unreachable.

It would be nice to get rid of hvmloader in the long term IMHO. Any initialization should be done in the BIOS.


Anthony Liguori


Xen-devel mailing list


