
Re: [Xen-devel] Xen virtual IOMMU high level design doc



>>> On 17.08.16 at 14:05, <tianyu.lan@xxxxxxxxx> wrote:
> 1 Motivation for Xen vIOMMU
> =============================================================================
> 1.1 Enable more than 255 vcpu support
> HPC virtualization requires support for more than 255 vcpus in a
> single VM to meet parallel computing requirements. Supporting more
> than 255 vcpus requires the interrupt remapping capability to be
> present on the vIOMMU, so that interrupts can be delivered to vcpus
> with APIC IDs above 255; otherwise a Linux guest fails to boot with
> more than 255 vcpus.

I continue to question this as a valid motivation at this point in
time, for the reasons Andrew has been explaining.

> 2. Xen vIOMMU Architecture
> =============================================================================
> 
> * vIOMMU will be inside the Xen hypervisor, for the following reasons:
>       1) Avoid round trips between Qemu and the Xen hypervisor
>       2) Ease of integration with the rest of the hypervisor
>       3) HVMlite/PVH doesn't use Qemu
> * Dummy xen-vIOMMU in Qemu as a wrapper of the new hypercall to
> create/destroy the vIOMMU in the hypervisor and deal with virtual PCI
> devices' second-level translation.

How does the create/destroy part of this match up with 3) right
ahead of it?

> 3 Xen hypervisor
> ==========================================================================
> 
> 3.1 New hypercall XEN_SYSCTL_viommu_op
> 1) Definition of "struct xen_sysctl_viommu_op" as the new hypercall
> parameter.
> 
> struct xen_sysctl_viommu_op {
>       u32 cmd;
>       u32 domid;
>       union {
>               struct {
>                       u32 capabilities;
>               } query_capabilities;
>               struct {
>                       u32 capabilities;
>                       u64 base_address;
>               } create_iommu;
>               struct {
>                       u8  bus;
>                       u8  devfn;

Please can we avoid introducing any new interfaces without a segment/
domain value, even if for now it'll always be zero?
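
Purely for illustration, i.e. something like this in the sub-structure
(placement and type just to show the idea):

    u16 seg;    /* PCI segment/domain; always 0 for the time being */
    u8  bus;
    u8  devfn;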

>                       u64 iova;
>                       u64 translated_addr;
>                       u64 addr_mask; /* Translation page size */
>                       IOMMUAccessFlags permission;
>               } 2th_level_translation;

I suppose "translated_addr" is an output here, but for the following
fields this already isn't clear. Please add IN and OUT annotations for
clarity.

Also, may I suggest to name this "l2_translation"? (But there are
other implementation specific things to be considered here, which
I guess don't belong into a design doc discussion.)
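
To make both points concrete, the sub-structure might then end up
looking roughly like the below (note that whether addr_mask and
permission really are outputs is only a guess on my part, hence the
request for annotations):

    struct {
        /* IN */
        u16 seg;                   /* PCI segment/domain, 0 for now */
        u8  bus;
        u8  devfn;
        u64 iova;
        /* OUT */
        u64 translated_addr;
        u64 addr_mask;             /* translation page size */
        IOMMUAccessFlags permission;
    } l2_translation;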

> };
> 
> typedef enum {
>       IOMMU_NONE = 0,
>       IOMMU_RO   = 1,
>       IOMMU_WO   = 2,
>       IOMMU_RW   = 3,
> } IOMMUAccessFlags;
> 
> 
> Definition of VIOMMU subops:
> #define XEN_SYSCTL_viommu_query_capability            0
> #define XEN_SYSCTL_viommu_create                      1
> #define XEN_SYSCTL_viommu_destroy                     2
> #define XEN_SYSCTL_viommu_dma_translation_for_vpdev   3
> 
> Definition of VIOMMU capabilities
> #define XEN_VIOMMU_CAPABILITY_1nd_level_translation   (1 << 0)
> #define XEN_VIOMMU_CAPABILITY_2nd_level_translation   (1 << 1)

l1 and l2 respectively again, please.
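
I.e.:

#define XEN_VIOMMU_CAPABILITY_l1_translation   (1 << 0)
#define XEN_VIOMMU_CAPABILITY_l2_translation   (1 << 1)

And, purely to illustrate how the dummy xen-vIOMMU in Qemu might fill
in the structure for a create request (this assumes the union ends up
being named "u"; domid and viommu_base are placeholders, and the libxc
plumbing actually issuing the sysctl is intentionally left out):

    struct xen_sysctl_viommu_op op = {
        .cmd   = XEN_SYSCTL_viommu_create,
        .domid = domid,                                /* placeholder */
    };

    op.u.create_iommu.capabilities = XEN_VIOMMU_CAPABILITY_l2_translation;
    op.u.create_iommu.base_address = viommu_base;      /* placeholder */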

> 3.3 Interrupt remapping
> Interrupts from virtual devices and physical devices will be delivered
> to the vlapic from the vIOAPIC and vMSI. Interrupt remapping hooks need
> to be added in vmsi_deliver() and ioapic_deliver() to find the target
> vlapic according to the interrupt remapping table. The following
> diagram shows the logic.

Missing diagram or stale sentence?
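
Independent of that, to check my understanding of the hook you describe
for vmsi_deliver() / ioapic_deliver(), is it roughly the below? (All
identifiers in this sketch are made up, purely to illustrate the idea.)

    struct viommu_ir_entry ire;                         /* made-up type */

    /* Translate the programmed (remappable) values via the IR table. */
    if ( viommu_intremap_lookup(d, remap_index, &ire) ) /* made-up hook */
    {
        vector        = ire.vector;
        dest          = ire.dest_id;
        dest_mode     = ire.dest_mode;
        delivery_mode = ire.delivery_mode;
    }

    /* The existing vlapic lookup/injection then uses these values. */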

> 3.5 Implementation consideration
> The Linux Intel IOMMU driver will fail to load without second-level
> translation support, even if interrupt remapping and first-level
> translation are available. This means second-level translation needs
> to be enabled before the other functions.

Is there a reason for this? I.e. do they unconditionally need that
functionality?
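
If they do, I'd expect the create path to enforce that dependency
explicitly rather than leaving it to the tool stack. A purely
illustrative sketch, using the l1/l2 capability names suggested above
and again assuming the union gets named "u":

    /* In the handler for XEN_SYSCTL_viommu_create. */
    uint32_t caps = op->u.create_iommu.capabilities;

    /*
     * Per the statement above, the Linux intel-iommu driver won't load
     * without second-level translation, so refuse configurations
     * lacking it.
     */
    if ( !(caps & XEN_VIOMMU_CAPABILITY_l2_translation) )
        return -EINVAL;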

Jan

