
Re: [Xen-devel] Xen virtual IOMMU high level design doc



Hi Jan:
        Sorry for the late response. Thanks a lot for your comments.

On 2016-08-25 19:11, Jan Beulich wrote:
>>>> On 17.08.16 at 14:05, <tianyu.lan@xxxxxxxxx> wrote:
>> 1 Motivation for Xen vIOMMU
>> ===============================================================================
>> 1.1 Enable more than 255 vcpu support
>> HPC virtualization requires more than 255 vcpus support in a single VM
>> to meet parallel computing requirement. More than 255 vcpus support
>> requires interrupt remapping capability present on vIOMMU to deliver
>> interrupts to vcpus with ID >255. Otherwise a Linux guest fails to boot
>> with >255 vcpus if interrupt remapping is absent.
> 
> I continue to question this as a valid motivation at this point in
> time, for the reasons Andrew has been explaining.

If we want to support Linux guests with >255 vcpus, interrupt remapping
is necessary.

The Linux commit that introduced x2apic and IR mode states that IR is
a prerequisite for enabling x2apic mode in the CPU:
https://lwn.net/Articles/289881/

So far we are not sure how other OSes behave. We may check Windows guest
behavior on KVM later; there is still a bug when running a Windows guest
with the IR function on KVM.


> 
>> 2. Xen vIOMMU Architecture
>> ================================================================================
>>
>> * vIOMMU will be inside Xen hypervisor for following factors
>>      1) Avoid round trips between Qemu and Xen hypervisor
>>      2) Ease of integration with the rest of the hypervisor
>>      3) HVMlite/PVH doesn't use Qemu
>> * Dummy xen-vIOMMU in Qemu as a wrapper of the new hypercall to
>> create/destroy the vIOMMU in the hypervisor and deal with virtual PCI
>> devices' 2nd level translation.
> 
> How does the create/destroy part of this match up with 3) right
> ahead of it?

The create/destroy hypercalls will work for both HVM and HVMlite.
I assume HVMlite has a toolstack (e.g. libxl) that can call the new
hypercalls to create or destroy the virtual IOMMU in the hypervisor.

> 
>> 3 Xen hypervisor
>> ==========================================================================
>>
>> 3.1 New hypercall XEN_SYSCTL_viommu_op
>> 1) Definition of "struct xen_sysctl_viommu_op" as new hypercall parameter.
>>
>> struct xen_sysctl_viommu_op {
>>      u32 cmd;
>>      u32 domid;
>>      union {
>>              struct {
>>                      u32 capabilities;
>>              } query_capabilities;
>>              struct {
>>                      u32 capabilities;
>>                      u64 base_address;
>>              } create_iommu;
>>              struct {
>>                      u8  bus;
>>                      u8  devfn;
> 
> Please can we avoid introducing any new interfaces without segment/
> domain value, even if for now it'll be always zero?

Sure. Will add a segment field.

> 
>>                      u64 iova;
>>                      u64 translated_addr;
>>                      u64 addr_mask; /* Translation page size */
>>                      IOMMUAccessFlags permission;
>>              } 2th_level_translation;
> 
> I suppose "translated_addr" is an output here, but for the following
> fields this already isn't clear. Please add IN and OUT annotations for
> clarity.
> 
> Also, may I suggest to name this "l2_translation"? (But there are
> other implementation specific things to be considered here, which
> I guess don't belong into a design doc discussion.)

How about this?
        struct {
            /* IN parameters. */
            u8  segment;
            u8  bus;
            u8  devfn;
            u64 iova;
            /* OUT parameters. */
            u64 translated_addr;
            u64 addr_mask; /* Translation page size */
            IOMMUAccessFlags permission;
        } l2_translation;
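
As a rough illustration of the annotated layout, the following
self-contained C sketch spells out the sub-structure with fixed-width
types and shows how a caller might consume the OUT fields. The struct
name viommu_l2_translation and both helpers are hypothetical. The sketch
also assumes that addr_mask holds the low offset bits of the page (e.g.
0xfff for 4K), as in Qemu's IOMMUTLBEntry; the proposal itself only says
that it encodes the translation page size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Access flags, with the values from the proposal. */
typedef enum {
    IOMMU_NONE = 0,
    IOMMU_RO   = 1,
    IOMMU_WO   = 2,
    IOMMU_RW   = 3,
} IOMMUAccessFlags;

/* Hypothetical stand-alone version of the l2_translation sub-structure. */
struct viommu_l2_translation {
    /* IN parameters. */
    uint8_t  segment;
    uint8_t  bus;
    uint8_t  devfn;
    uint64_t iova;
    /* OUT parameters. */
    uint64_t translated_addr;
    uint64_t addr_mask;           /* Translation page size */
    IOMMUAccessFlags permission;
};

/*
 * Hypothetical helper: combine the translated page address with the
 * page offset of the original iova. Assumes addr_mask has the low
 * offset bits set (0xfff for a 4K page).
 */
static uint64_t viommu_l2_target_addr(const struct viommu_l2_translation *t)
{
    return (t->translated_addr & ~t->addr_mask) | (t->iova & t->addr_mask);
}

/*
 * Hypothetical helper: RO and WO are distinct bits and RW is their OR,
 * so a plain bit test tells whether the wanted access is permitted.
 */
static bool viommu_access_allowed(IOMMUAccessFlags granted,
                                  IOMMUAccessFlags wanted)
{
    return ((unsigned int)granted & (unsigned int)wanted) ==
           (unsigned int)wanted;
}
```

With these assumptions, a device model would fill the IN fields, issue
the hypercall, and then check viommu_access_allowed() before using the
address returned by viommu_l2_target_addr().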

> 
>> };
>>
>> typedef enum {
>>      IOMMU_NONE = 0,
>>      IOMMU_RO   = 1,
>>      IOMMU_WO   = 2,
>>      IOMMU_RW   = 3,
>> } IOMMUAccessFlags;
>>
>>
>> Definition of VIOMMU subops:
>> #define XEN_SYSCTL_viommu_query_capability           0
>> #define XEN_SYSCTL_viommu_create                     1
>> #define XEN_SYSCTL_viommu_destroy                    2
>> #define XEN_SYSCTL_viommu_dma_translation_for_vpdev  3
>>
>> Definition of VIOMMU capabilities
>> #define XEN_VIOMMU_CAPABILITY_1nd_level_translation  (1 << 0)
>> #define XEN_VIOMMU_CAPABILITY_2nd_level_translation  (1 << 1)
> 
> l1 and l2 respectively again, please.

Will update.
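
For illustration only, the renamed capability bits might look like the
sketch below. The exact macro spellings are an assumption (the updated
names have not been posted yet), and viommu_caps_supported() is a
hypothetical helper showing how a toolstack could compare the result of
a capability query with the capabilities it wants to request at creation
time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed spellings after the l1/l2 rename; not final. */
#define XEN_VIOMMU_CAPABILITY_l1_translation  (1u << 0)
#define XEN_VIOMMU_CAPABILITY_l2_translation  (1u << 1)

/*
 * Hypothetical helper: true only if every requested capability bit is
 * also present in the set reported by the hypervisor.
 */
static bool viommu_caps_supported(uint32_t available, uint32_t requested)
{
    return (requested & ~available) == 0;
}
```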

> 
>> 3.3 Interrupt remapping
>> Interrupts from virtual devices and physical devices will be delivered
>> to vlapic from vIOAPIC and vMSI. It needs to add interrupt remapping
>> hooks in the vmsi_deliver() and ioapic_deliver() to find target vlapic
>> according interrupt remapping table. The following diagram shows the logic.
> 
> Missing diagram or stale sentence?

Sorry, that is a stale sentence. I moved the diagram to section 2.2,
"Interrupt remapping overview".

> 
>> 3.5 Implementation consideration
>> The Linux Intel IOMMU driver will fail to load without 2nd level
>> translation support even if interrupt remapping and 1st level
>> translation are available. This means 2nd level translation needs to
>> be enabled first, before other functions.
> 
> Is there a reason for this? I.e. do they unconditionally need that
> functionality?

Yes, the Linux Intel IOMMU driver unconditionally needs l2 translation.
The driver checks whether there is a valid SAGAW (Supported Adjusted
Guest Address Widths) value while initializing the IOMMU data structure
and returns an error if there is none.
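
The check can be sketched roughly as follows. This is a simplified
illustration modeled on my reading of the driver, not its actual code:
pick_agaw() and agaw_to_width() are names chosen here, and the sketch
assumes SAGAW sits in bits 12:8 of the VT-d capability register, with
bit n standing for an address width of 30 + 9 * n bits.

```c
#include <stdint.h>

/* SAGAW field: bits 12:8 of the VT-d capability register. */
#define CAP_SAGAW(cap)  ((unsigned int)(((cap) >> 8) & 0x1f))

/*
 * Return the largest supported AGAW index that does not exceed
 * max_agaw (0..4), or -1 if the vIOMMU advertises no usable width.
 * A vIOMMU without l2 translation would report SAGAW == 0, hit the
 * -1 case, and make the guest driver give up on that IOMMU.
 */
static int pick_agaw(uint64_t cap, int max_agaw)
{
    unsigned int sagaw = CAP_SAGAW(cap);

    for (int agaw = max_agaw; agaw >= 0; agaw--) {
        if (sagaw & (1u << agaw))
            return agaw;
    }
    return -1;
}

/* AGAW index to address width in bits: 30, 39, 48, ... */
static int agaw_to_width(int agaw)
{
    return 30 + 9 * agaw;
}
```

So a virtual capability register has to expose at least one SAGAW bit,
for example bit 2 for a 48-bit, 4-level page table, before the guest
driver will continue with interrupt remapping setup.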

-- 
Best regards
Tianyu Lan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
