
[Xen-devel] [Xenhackthon] Virtualized APIC registers - virtual interrupt delivery.



During the hackathon we chatted about Intel APIC
virtualization and how it works with the current Linux PVHVM.
Or rather, per my understanding, how it does not. I am trying
to visualize how this would work with a 10Gb NIC that is passed
through to a guest.

These slides (starting at pg 6) give an idea of what it is:
http://www.linuxplumbersconf.org/2012/wp-content/uploads/2012/09/2012-lpc-virt-intel-vt-feat-nakajima.pdf

Pg 9 goes into detail on what this does - APIC reads don't
trap, but writes do cause a VMEXIT. OK, that is similar to how
the PVHVM callback vector + events work. When the NIC raises its vector, it
hits the hypervisor, which sets the right event channel. Then the guest
is interrupted with vector 0xf3 (the callback vector), goes straight to
__xen_evtchn_do_upcall, reads the event channel, and calls the NIC
driver's IRQ handler. If it needs to write (say to send an IPI or mask
another CPU's IRQ) it will do a hypercall and exit (there are optimizations
to skip this if the masking, etc., is done on the local CPU).
So for reads, the PVHVM event channel machinery gives the same benefit.
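
To make that flow concrete, here is a minimal user-space sketch of what
the upcall conceptually does. This is my own simplification - the names,
the single selector word, and the two-level scan are assumptions for
illustration, not the actual __xen_evtchn_do_upcall code:

/*
 * Minimal user-space sketch (not the real Linux code) of what the
 * callback-vector upcall conceptually does: scan a per-vCPU selector
 * word, then the matching word of the pending-event bitmap, and run
 * the handler bound to each pending event channel.
 */
#include <stdint.h>
#include <stdio.h>

#define BITS_PER_WORD 64

static uint64_t evtchn_pending_sel;            /* per-vCPU selector word */
static uint64_t evtchn_pending[BITS_PER_WORD]; /* shared pending bitmap  */

static void nic_irq_handler(unsigned port)
{
    printf("event channel %u -> NIC driver IRQ handler\n", port);
}

/* Guest side of the 0xf3 callback vector, grossly simplified. */
static void evtchn_do_upcall(void)
{
    uint64_t sel = __atomic_exchange_n(&evtchn_pending_sel, 0,
                                       __ATOMIC_ACQ_REL);
    while (sel) {
        unsigned word = __builtin_ctzll(sel);
        sel &= sel - 1;

        uint64_t pend = __atomic_exchange_n(&evtchn_pending[word], 0,
                                            __ATOMIC_ACQ_REL);
        while (pend) {
            unsigned bit = __builtin_ctzll(pend);
            pend &= pend - 1;
            nic_irq_handler(word * BITS_PER_WORD + bit);
        }
    }
}

int main(void)
{
    /* Pretend the hypervisor marked event channel 35 pending. */
    evtchn_pending[0] |= 1ull << 35;
    evtchn_pending_sel |= 1ull << 0;
    evtchn_do_upcall();
    return 0;
}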

The next part is "Virtual-interrupt delivery". Here it says:
"CPU delivers virtual interrupts to guest (including virtual IPIs)."

Not much in the way of detail, but this slide deck:
http://www.linux-kvm.org/wiki/images/7/70/2012-forum-nakajima_apicv.pdf
gives a better idea (pages 7 and 8) and then goes into detail. The
Intel Software Developer's Manual, starting at section 29.1, also covers
it in detail.

Per my understanding, the SVI and RVI tell the CPU (and the hypervisor)
which vector is currently being serviced and which one is coming next.
Those vectors are chosen by the guest OS. It could use vector 0xfa for
a NIC driver and a lower one for IPIs and the like.
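
For reference, RVI and SVI sit together in the 16-bit "guest interrupt
status" VMCS field (RVI in the low byte, SVI in the high byte, per the
SDM). A tiny sketch of that packing, with helper names I made up:

/* Sketch of the 16-bit "guest interrupt status" layout: RVI in the
 * low byte, SVI in the high byte.  Illustrative helpers only, not
 * hypervisor code. */
#include <stdint.h>
#include <stdio.h>

static inline uint8_t gis_rvi(uint16_t gis) { return gis & 0xff; }
static inline uint8_t gis_svi(uint16_t gis) { return gis >> 8; }

static inline uint16_t gis_pack(uint8_t svi, uint8_t rvi)
{
    return ((uint16_t)svi << 8) | rvi;
}

int main(void)
{
    /* e.g. vector 0x80 in service, vector 0xfa requesting delivery */
    uint16_t gis = gis_pack(0x80, 0xfa);
    printf("SVI=0x%02x RVI=0x%02x\n", gis_svi(gis), gis_rvi(gis));
    return 0;
}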

The hypervisor sets up a VISR (a bitmap) of all the vectors that a guest
is allowed to handle without a VMEXIT. In all likelihood it will just
mask out the vectors it is using itself and give the guest free rein over the rest.
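
Such a per-vector bitmap is just 256 bits, one per interrupt vector. A
sketch of the kind of structure I have in mind (illustrative only, the
type and helper names are mine):

/* A 256-entry per-vector bitmap, one bit per interrupt vector, of the
 * sort used for per-vector allow/deny decisions.  Not Xen code. */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct { uint64_t word[4]; } vec_bitmap_t;   /* 256 bits */

static void vec_set(vec_bitmap_t *b, uint8_t vector)
{
    b->word[vector / 64] |= 1ull << (vector % 64);
}

static bool vec_test(const vec_bitmap_t *b, uint8_t vector)
{
    return b->word[vector / 64] & (1ull << (vector % 64));
}

int main(void)
{
    vec_bitmap_t allowed = { { 0 } };
    vec_set(&allowed, 0x80);                  /* guest's NIC vector     */
    printf("0x80 allowed: %d\n", vec_test(&allowed, 0x80));
    printf("0xf3 allowed: %d\n", vec_test(&allowed, 0xf3));
    return 0;
}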

Which means that if this is set higher than the hypervisor's timer
or IPI callback vector, the guest can run unbounded. It would also seem that
this value has to be reset frequently when migrating a guest between pCPUs.
And it would appear that this value is static, meaning the guest only
sets these vectors once and the hypervisor is responsible for managing
the priority of that guest and other guests (say dom0) on the CPU.

For example, we have a guest with a 10Gb NIC and the guest has decided
to use vector 0x80 for it (assume a UP guest). Dom0 has a SAS controller
and is using event channels 30, 31, 32, and 33 (there are only 4 pCPUs).
The hypervisor maps them to vectors 0x58, 0x68, 0x78 and 0x88 and spreads those
vectors across the pCPUs. The guest is running on pCPU1, where there are two
vectors in play - 0x80 and 0x58. The one assigned to the guest wins and dom0's
SAS controller is preempted.
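
To spell out the priority comparison: on x86 the priority class of a
vector is its upper four bits, so 0x80 is class 8 and 0x58 is class 5.
A trivial illustration:

/* Illustration only: the x86 APIC priority class of a vector is its
 * upper four bits, so 0x80 (class 8) outranks 0x58 (class 5). */
#include <stdint.h>
#include <stdio.h>

static unsigned priority_class(uint8_t vector)
{
    return vector >> 4;
}

int main(void)
{
    uint8_t guest_nic = 0x80;   /* guest's choice for its 10Gb NIC  */
    uint8_t dom0_sas  = 0x58;   /* dom0 SAS vector on the same pCPU */

    printf("guest NIC: vector 0x%02x, class %u\n",
           guest_nic, priority_class(guest_nic));
    printf("dom0 SAS:  vector 0x%02x, class %u\n",
           dom0_sas, priority_class(dom0_sas));
    return 0;
}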

The solution for that seems to be some interaction with the
guest when it allocates its vectors, so that they always end up below
the dom0 priority vectors. Or the hypervisor has to dynamically shuffle its
own vectors to be of higher priority.

Or is there a guest vector <-> hypervisor vector lookup table that
the CPU can use? So that the hypervisor can say: vector 0x80 in the
guest actually maps to vector 0x48 in the hypervisor?

Now, the above example assumed a simple HVM Linux kernel that does not
use PV extensions. Currently Linux on HVM will enable the event
system and use one vector for the callback (0xf3). For this to work
where we mix the event callback and a real physical device vector
along with access to the virtual APIC, we would need some way of
knowing which devices (or vectors) should use the event path
and which should not.
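
Purely as a thought experiment (nothing like this exists in Xen or
Linux today, and all the names below are made up), the "knowing which
vector takes which path" part might boil down to something as simple
as a per-vector routing choice:

/* Hypothetical sketch only: a per-vector routing decision for a guest
 * that mixes the PVHVM event-channel callback (vector 0xf3) with real
 * device vectors delivered through the virtual APIC. */
#include <stdio.h>

enum irq_path { PATH_EVENT_CHANNEL, PATH_VIRTUAL_APIC };

static enum irq_path vector_path[256];   /* one routing choice per vector */

static const char *path_name(enum irq_path p)
{
    return p == PATH_EVENT_CHANNEL ? "event channel" : "virtual APIC";
}

int main(void)
{
    vector_path[0xf3] = PATH_EVENT_CHANNEL;  /* PVHVM callback vector      */
    vector_path[0x80] = PATH_VIRTUAL_APIC;   /* passed-through NIC vector  */

    printf("vector 0xf3 -> %s\n", path_name(vector_path[0xf3]));
    printf("vector 0x80 -> %s\n", path_name(vector_path[0x80]));
    return 0;
}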

Am I on the right track?

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

