
[Xen-devel] Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)

Hi Tristan,

Thanks for posting this.

Tristan Gingold wrote:
Summary: I am proposing a new method to improve hvm IO emulation: the IO
requests are reflected to the domain firmware, which emulates the IO using PV
drivers.  The pros of this method are minor hypervisor modifications, a smooth
transition, performance improvement and convergence with the PV model.


The current IO emulator (ioemu process in dom-0) is a well-known bottleneck
for hvm performance because the path an IO request travels is long and
crosses many rings.

I'm not quite sure that I agree this is the bottleneck. If IO latency were the problem, then a major reduction in IO latency ought to significantly improve performance, right?

KVM has a pretty much optimal path from the kernel to userspace. The overhead of going to userspace is roughly two syscalls (and we've measured this overhead). Yet it makes almost no difference in IO throughput.

The big problem with disk emulation isn't IO latency, but the fact that the IDE emulation can only have one outstanding request at a time. The SCSI emulation helps this a lot.
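A back-of-the-envelope model of this point, as a sketch only. It assumes an idealized device that can service every outstanding request concurrently, and the latency figures in the usage note are made-up illustrations, not measurements from Xen or KVM. The function name and parameters are hypothetical.

```python
def modelled_iops(queue_depth, device_latency_s, emulation_overhead_s):
    """Idealized requests per second for a given number of outstanding requests.

    Assumption: each request costs device latency plus emulation-path
    overhead, and the device services all outstanding requests in parallel.
    """
    if queue_depth < 1:
        raise ValueError("queue_depth must be at least 1")
    per_request_s = device_latency_s + emulation_overhead_s
    if per_request_s <= 0:
        raise ValueError("total per-request latency must be positive")
    return queue_depth / per_request_s
```

With an assumed 5 ms device latency, cutting an assumed 20 us emulation overhead to 5 us at queue depth 1 changes the modelled rate by well under 1%, while allowing 8 outstanding requests multiplies it by 8. That is the shape of the argument above: the depth limit dominates, not the emulation path.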

I don't know what the bottleneck is in network emulation, but I suspect the number of copies we have in the path has a great deal to do with it.

Many ideas to improve the emulation have been proposed.  None of them have
been adopted because their approaches are too disruptive.

Based on my recent firmware experience I'd like to propose a new method.

The principle is rather simple: the hvm domain does all the work.  IO requests
are simply reflected to the domain.  When the hypervisor decodes an IO
request it sends it to the domain using an SMI(x86)/PMI(ia64)-like
interruption.  This reflection saves some registers, puts the parameters (IO
req) into registers and calls the firmware at a defined address using a
defined mode (physical mode should be the best).  The firmware handles the IO
request like ioemu does but uses PV drivers (net, blk, fb...) to access
external resources.  It then resumes the domain execution through a hypercall
which restores the registers and mode.
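The reflect/emulate/resume sequence described above can be modelled in a few lines. This is only a toy sketch, not hypervisor code: every name here (`Vcpu`, `IoRequest`, `reflect_io`, `firmware_handle_io`, `resume_hypercall`, `FIRMWARE_ENTRY`, the register names) is hypothetical, and a plain dict stands in for the PV backend the firmware would really talk to.

```python
from dataclasses import dataclass
from typing import Optional

FIRMWARE_ENTRY = 0xFFFF0000  # hypothetical firmware entry address


@dataclass
class IoRequest:
    port: int
    is_write: bool
    value: int = 0


@dataclass
class Vcpu:
    regs: dict
    mode: str = "virtual"
    saved: Optional[tuple] = None  # (registers, mode) captured at reflection


def reflect_io(vcpu, req):
    """Hypervisor side: save state, pass the request in registers, enter firmware."""
    vcpu.saved = (dict(vcpu.regs), vcpu.mode)
    vcpu.regs.update(r0=req.port, r1=int(req.is_write), r2=req.value,
                     pc=FIRMWARE_ENTRY)
    vcpu.mode = "physical"


def firmware_handle_io(vcpu, backend):
    """Firmware side: emulate the access; `backend` stands in for a PV driver."""
    port, is_write, value = vcpu.regs["r0"], vcpu.regs["r1"], vcpu.regs["r2"]
    if is_write:
        backend[port] = value
        return None
    return backend.get(port, 0xFF)  # unbacked ports read as 0xFF in this toy


def resume_hypercall(vcpu, read_value=None, dest="r0"):
    """Hypercall: restore saved registers and mode, delivering any read result."""
    regs, mode = vcpu.saved
    vcpu.regs, vcpu.mode, vcpu.saved = dict(regs), mode, None
    if read_value is not None:
        vcpu.regs[dest] = read_value
```

A write would go `reflect_io(vcpu, IoRequest(0x1F0, True, 0xAB))`, then `firmware_handle_io(vcpu, backend)`, then `resume_hypercall(vcpu)`, after which the guest sees its original registers and mode again.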

I think there are many pros to this approach:

* the changes in the hypervisor are rather small: only the code to do the
reflection has to be added.  This is a well-known and light mechanism.

* the transition can be smooth: this new method can co-exist in several ways
with the current method.  First it can be used only when enabled.  Then once
the reflection code is added in the hypervisor the firmware can just send the
IO request to ioemu like the hypervisor already does.  The in-domain IO
emulation can be added driver by driver (eg: IDE disk first, then network,
then fb).
This smooth transition is a major advantage for evaluating this new method
early.

* Because all the emulation work is done in the domain the work is accounted
to this domain and not to another domain (dom0 today).  This is good for
management and for security.

* From the hypervisor point of view such an hvm domain looks like a PV domain:
only the creation differs.  This IO emulation method unifies the domain.  This
will simplify save & restore and Xen in general.

* Performance should be improved compared to the current io emulation method:
the path an IO request travels is shorter.  If we want to work on performance
we could later handle some IO requests directly in the hypervisor (I am
thinking of ports or iomem which don't have side effects).

I don't see a lot of cons; the major one is 'porting' ioemu code to
firmware code.  This is the challenge.  But qemu seems to be well structured.
Most of the files might be ported without changes; the core of course has to
be rewritten.  The PV drivers should also be ported.

SMP can first be handled with a global lock, and concurrent accesses may be
allowed later.  This may improve performance compared to ioemu, which is
almost single threaded.
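As a rough illustration of the "global lock first" step, here is a minimal sketch in which every vcpu thread serializes on one lock before touching emulated device state. The class and method names are hypothetical, and the counter merely stands in for real device-model state.

```python
import threading


class FirmwareEmulator:
    """Toy emulator: one global lock guards all emulated device state."""

    def __init__(self):
        self._global_lock = threading.Lock()
        self.handled = 0  # stand-in for device-model state

    def handle_io(self):
        # Any vcpu entering the emulator takes the single global lock, so
        # the read-modify-write below never interleaves with another vcpu.
        with self._global_lock:
            current = self.handled
            self.handled = current + 1
```

Moving to concurrent accesses later would mean replacing the single lock with finer-grained ones, for example one per device model, without changing the callers.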

There's a lot to like about this sort of approach. It's not a silver bullet wrt performance but I think the model is elegant in many ways. An interesting place to start would be lapic/pit emulation. Removing this code from the hypervisor would be pretty useful and there is no need to address PV-on-HVM issues.

Can you provide more details on how the reflection works? Have you measured the cost of reflection? Do you just set up a page table that maps physical memory 1-1 and then reenter the guest?

Does the firmware get loaded as an option ROM or is it a special portion of guest memory that isn't normally reachable?


Anthony Liguori

I don't know yet how to use the PV-on-HVM drivers.  There is currently only
one page to communicate with xenstore.  We can try to share this page
between the firmware and the PV-on-HVM drivers or we may create a second

I thought of this new IO emulation method during my work on EFI gfw for
ia64.  Recently I have looked more deeply into the sources.  I can't see any
showstopper yet.  Unless someone has a strong point against this method I hope
I will be able to work on it shortly (ia64 first - sorry!)

Comments are *very* welcome.


Xen-devel mailing list


