Re: [Xen-devel] Xen virtual IOMMU high level design doc V2
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@xxxxxxxxxx]
> Sent: Wednesday, October 19, 2016 4:27 AM
>
> > 2) For physical PCI device
> > DMA operations go through the physical IOMMU directly and the IO page
> > table for IOVA->HPA should be loaded into the physical IOMMU. When the
> > guest updates the l2 Page-table pointer field, it provides an IO page
> > table for IOVA->GPA. vIOMMU needs to shadow the l2 translation table,
> > translate GPA->HPA and write the shadow page table (IOVA->HPA) pointer
> > into the l2 Page-table pointer field of the context entry of the
> > physical IOMMU.
> >
> > Now all PCI devices in the same hvm domain share one IO page table
> > (GPA->HPA) in the physical IOMMU driver of Xen. To support l2
> > translation of vIOMMU, the IOMMU driver needs to support multiple
> > address spaces per device entry: using the existing IO page table
> > (GPA->HPA) defaultly and switching to the shadow IO page table
> > (IOVA->HPA) when the l2
>
> defaultly?
>
> > translation function is enabled. These changes will not affect the
> > current P2M logic.
>
> What happens if the guest's IO page tables have incorrect values?
>
> For example, the guest sets up the page tables to cover some section
> of HPA ranges (which are all good and permitted). But then during
> execution the guest kernel decides to muck around with the page tables
> and adds an HPA range that is outside what the guest has been allocated.
>
> What then?

The shadow PTE is controlled by the hypervisor. Whatever IOVA->GPA mapping
is in the guest PTE must be validated (IOVA->GPA->HPA) before being written
into the shadow PTE. So regardless of when the guest mucks with its PTE, the
operation is always trapped and validated. Why do you think there is a
problem?

Also the guest only sees GPAs. All it can operate on are GPA ranges.
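
To make that flow concrete, here is a rough sketch (not actual Xen code;
viommu_p2m_lookup() and shadow_set_entry() are invented names standing in
for the p2m and shadow-IOMMU machinery) of the trap-and-validate step:

/*
 * Rough sketch only.  Every guest write to its l2 IO page table is trapped;
 * the IOVA->GPA mapping it installs is pushed through the p2m (GPA->HPA),
 * and only the resulting IOVA->HPA mapping lands in the shadow table that
 * the physical IOMMU actually walks.
 */
#include <stdint.h>
#include <stdbool.h>
#include <errno.h>

typedef uint64_t gfn_t;   /* guest frame number (GPA >> PAGE_SHIFT)   */
typedef uint64_t mfn_t;   /* machine frame number (HPA >> PAGE_SHIFT) */

/* Hypothetical stand-ins for Xen's p2m and shadow-IOMMU helpers. */
bool viommu_p2m_lookup(gfn_t gfn, mfn_t *mfn, unsigned int *p2m_perm);
int shadow_set_entry(uint64_t iova, mfn_t mfn, unsigned int perm);

/* Called when a trapped guest PTE write installs IOVA -> guest_gfn. */
int shadow_l2_pte_update(uint64_t iova, gfn_t guest_gfn,
                         unsigned int guest_perm)
{
    mfn_t mfn;
    unsigned int p2m_perm;

    /* The guest can only name GPAs; reject anything the p2m doesn't map. */
    if ( !viommu_p2m_lookup(guest_gfn, &mfn, &p2m_perm) )
        return -EINVAL;

    /* Never grant the device more access than the p2m allows. */
    if ( guest_perm & ~p2m_perm )
        return -EPERM;

    /* Install the validated IOVA->HPA mapping in the shadow table. */
    return shadow_set_entry(iova, mfn, guest_perm);
}

The physical IOMMU only ever walks entries produced by this path, never the
guest's own table.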
> > 3.3 Interrupt remapping
> > Interrupts from virtual devices and physical devices will be delivered
> > to the vlapic from the vIOAPIC and vMSI. Interrupt remapping hooks need
> > to be added in vmsi_deliver() and ioapic_deliver() to find the target
> > vlapic according to the interrupt remapping table.
> >
> >
> > 3.4 l1 translation
> > When nested translation is enabled, any address generated by l1
> > translation is used as the input address for nesting with l2
> > translation. The physical IOMMU needs to enable both l1 and l2
> > translation in nested translation mode (GVA->GPA->HPA) for the
> > passthrough device.
> >
> > The VT-d context entry points to the guest l1 translation table, which
> > will be nest-translated by the l2 translation table and so can be
> > directly linked to the context entry of the physical IOMMU.
>
> I think this means that the shared_ept will be disabled?
>
> What about different versions of contexts? Say the V1 is exposed
> to the guest but the hardware supports V2? Are there any flags that have
> swapped positions? Or is it pretty backwards compatible?

Yes, backwards compatible.

> > 3.5 Implementation consideration
> > The VT-d spec doesn't define a capability bit for the l2 translation.
> > Architecturally there is no way to tell the guest that the l2
> > translation capability is not available. The Linux Intel IOMMU driver
> > assumes l2 translation is always available when VT-d exists, and fails
> > to load without l2 translation support even if interrupt remapping and
> > l1 translation are available. So it needs to enable l2 translation first
>
> I am lost on that sentence. Are you saying that it tries to load
> the IOVA and if that fails.. then it keeps on going? What is the result
> of this? That you can't do IOVA (so can't use vfio?)

It's about VT-d capability. VT-d supports both 1st-level and 2nd-level
translation, however only the 1st-level translation can be optionally
reported through a capability bit. There is no capability bit to say a
version doesn't support 2nd-level translation. The implication is that, as
long as a vIOMMU is exposed, the guest IOMMU driver always assumes IOVA
capability is available through 2nd-level translation. So we can first
emulate a vIOMMU with only 2nd-level capability, and then extend it to
support 1st-level and interrupt remapping, but we cannot go in the reverse
direction. I think Tianyu's point is more to describe the enabling sequence
based on this fact. :-)

> > 4.1 Qemu vIOMMU framework
> > Qemu has a framework to create a virtual IOMMU (e.g. virtual Intel VT-d
> > and AMD IOMMU) and report it in the guest ACPI table. So on the Xen
> > side, a dummy xen-vIOMMU wrapper is required to connect with the actual
> > vIOMMU in Xen, especially for l2 translation of virtual PCI devices,
> > because the emulation of virtual PCI devices is in Qemu. Qemu's vIOMMU
> > framework provides a callback to deal with l2 translation when DMA
> > operations of virtual PCI devices happen.
>
> You say AMD and Intel. This sounds quite OS agnostic. Does it mean you
> could expose a vIOMMU to a guest and actually use the AMD IOMMU
> in the hypervisor?

Did you mean "expose an Intel vIOMMU to the guest and then use the physical
AMD IOMMU in the hypervisor"? I hadn't thought about this, but what's the
value of doing so? :-)
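
Back to the callback point in 4.1: here is a purely illustrative sketch
(this is not the real Qemu vIOMMU API; xen_viommu_l2_lookup() and the hook
signature are invented) of how the dummy xen-vIOMMU wrapper would sit in the
DMA path of an emulated PCI device:

/*
 * Illustrative only.  The idea from 4.1: the xen-vIOMMU wrapper registers a
 * translate hook with Qemu's vIOMMU framework, and every DMA issued by an
 * emulated PCI device goes through it, so the l2 (IOVA->GPA) lookup is
 * applied before the access reaches guest memory.  xen_viommu_l2_lookup()
 * stands in for whatever call ends up querying the vIOMMU in Xen.
 */
#include <stdint.h>
#include <stdbool.h>

typedef uint64_t dma_addr_t;   /* IOVA as emitted by the emulated device */
typedef uint64_t hwaddr;       /* guest physical address                 */

/* Hypothetical call into Xen: IOVA->GPA lookup in the guest's l2 table. */
bool xen_viommu_l2_lookup(uint16_t bdf, dma_addr_t iova,
                          bool is_write, hwaddr *gpa);

static bool l2_enabled;        /* set once the guest enables DMA remapping */

/* Hook invoked for each DMA access of an emulated PCI device (bdf). */
bool xen_viommu_translate(uint16_t bdf, dma_addr_t iova,
                          bool is_write, hwaddr *gpa)
{
    if (!l2_enabled) {
        /* Remapping disabled: device addresses are already GPAs. */
        *gpa = iova;
        return true;
    }
    /* Remapping enabled: the access faults if the l2 lookup fails. */
    return xen_viommu_l2_lookup(bdf, iova, is_write, gpa);
}

The actual registration would of course go through whatever callback Qemu's
vIOMMU framework provides, as described in 4.1.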
Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel