
Re: [Xen-devel] [PATCH v4 01/28] Xen/doc: Add Xen virtual IOMMU doc



On Fri, Feb 09, 2018 at 12:54:11PM +0000, Roger Pau Monné wrote:
>On Fri, Nov 17, 2017 at 02:22:08PM +0800, Chao Gao wrote:
>> From: Lan Tianyu <tianyu.lan@xxxxxxxxx>
>> 
>> This patch is to add Xen virtual IOMMU doc to introduce motivation,
>> framework, vIOMMU hypercall and xl configuration.
>> 
>> Signed-off-by: Lan Tianyu <tianyu.lan@xxxxxxxxx>
>> Signed-off-by: Chao Gao <chao.gao@xxxxxxxxx>
>> ---
>>  docs/misc/viommu.txt | 120 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 120 insertions(+)
>>  create mode 100644 docs/misc/viommu.txt
>> 
>> diff --git a/docs/misc/viommu.txt b/docs/misc/viommu.txt
>> new file mode 100644
>> index 0000000..472d2b5
>> --- /dev/null
>> +++ b/docs/misc/viommu.txt
>> @@ -0,0 +1,120 @@
>> +Xen virtual IOMMU
>> +
>> +Motivation
>> +==========
>> +Enable support for more than 128 vCPUs
>> +
>> +Current HPC cloud services require VMs with a high number of vCPUs in
>> +order to achieve high performance in parallel computing.
>> +
>> +To support more than 128 vCPUs, x2APIC mode in the guest is necessary
>> +because the legacy APIC (xAPIC) only supports 8-bit APIC IDs. The APIC ID
>> +used by Xen is CPU ID * 2 (i.e. CPU 127 has APIC ID 254, the last one
>> +available in xAPIC mode), so at most 128 vCPUs can be supported. x2APIC
>> +mode supports 32-bit APIC IDs, but it requires the interrupt remapping
>> +functionality of a vIOMMU if the guest wishes to route interrupts to all
>> +available vCPUs.
>> +
>> +PCI MSI/IOAPIC interrupt messages can only carry an 8-bit APIC ID, which
>> +cannot address CPUs with an APIC ID above 254. Interrupt remapping
>> +supports 32-bit APIC IDs and so is necessary to support more than 128
>> +vCPUs.
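A tiny standalone illustration of the arithmetic above, assuming only the
APIC ID = vCPU ID * 2 assignment described in the document (nothing
Xen-specific):

    /*
     * With APIC ID = vCPU ID * 2, an 8-bit xAPIC ID runs out at vCPU 127
     * (APIC ID 254); a 32-bit x2APIC ID does not.
     */
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        for (unsigned int vcpu = 126; vcpu <= 129; vcpu++) {
            uint32_t apic_id = vcpu * 2;
            printf("vCPU %3u -> APIC ID %3u (%s in 8-bit xAPIC)\n", vcpu,
                   apic_id, apic_id <= 254 ? "representable"
                                           : "not representable");
        }
        return 0;
    }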
>> +
>> +vIOMMU Architecture
>> +===================
>> +The vIOMMU device model is implemented inside the Xen hypervisor for the
>> +following reasons:
>> +    1) Avoid round trips between QEMU and the Xen hypervisor
>> +    2) Ease of integration with the rest of the hypervisor
>> +    3) PVH guests don't use QEMU
>> +
>> +* Interrupt remapping overview.
>> +Interrupts from virtual devices and physical devices are delivered to the
>> +vLAPIC via the vIOAPIC and vMSI; the vIOMMU remaps each interrupt during
>> +this procedure.
>> +
>> ++---------------------------------------------------+
>> +|Qemu                       |VM                     |
>> +|                           | +----------------+    |
>> +|                           | |  Device driver |    |
>> +|                           | +--------+-------+    |
>> +|                           |          ^            |
>> +|       +----------------+  | +--------+-------+    |
>> +|       | Virtual device |  | |  IRQ subsystem |    |
>> +|       +-------+--------+  | +--------+-------+    |
>> +|               |           |          ^            |
>> +|               |           |          |            |
>> ++---------------------------+-----------------------+
>> +|hypervisor     |                      | VIRQ       |
>> +|               |            +---------+--------+   |
>> +|               |            |      vLAPIC      |   |
>> +|               |VIRQ        +---------+--------+   |
>> +|               |                      ^            |
>> +|               |                      |            |
>> +|               |            +---------+--------+   |
>> +|               |            |      vIOMMU      |   |
>> +|               |            +---------+--------+   |
>> +|               |                      ^            |
>> +|               |                      |            |
>> +|               |            +---------+--------+   |
>> +|               |            |   vIOAPIC/vMSI   |   |
>> +|               |            +----+----+--------+   |
>> +|               |                 ^    ^            |
>> +|               +-----------------+    |            |
>> +|                                      |            |
>> ++---------------------------------------------------+
>> +HW                                     |IRQ
>> +                                +-------------------+
>> +                                |   PCI Device      |
>> +                                +-------------------+
>> +
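Conceptually, the remapping step in the middle of the diagram boils down to
something like the following heavily simplified, hypothetical sketch. It is
not the VT-d IRTE layout nor Xen's actual implementation; it only shows that
the lookup resolves to a destination that can be a 32-bit x2APIC ID:

    /*
     * Hypothetical sketch: the vIOAPIC/vMSI layer hands the vIOMMU a
     * remappable interrupt index, and the vIOMMU looks up the destination
     * (possibly a 32-bit x2APIC ID) before injecting it into the vLAPIC.
     */
    #include <stdbool.h>
    #include <stdint.h>

    struct remap_entry {
        bool     present;
        uint32_t dest_apic_id;   /* 32-bit destination, beyond xAPIC's 8 bits */
        uint8_t  vector;
    };

    #define NR_REMAP_ENTRIES 256

    static struct remap_entry remap_table[NR_REMAP_ENTRIES];

    /* Returns 0 on success, -1 if the entry is missing (a remapping fault). */
    static int viommu_remap_irq(unsigned int index, uint32_t *dest, uint8_t *vec)
    {
        if (index >= NR_REMAP_ENTRIES || !remap_table[index].present)
            return -1;

        *dest = remap_table[index].dest_apic_id;
        *vec  = remap_table[index].vector;
        return 0;
    }

    int main(void)
    {
        uint32_t dest;
        uint8_t vec;

        /* Install one entry whose destination exceeds the 8-bit xAPIC range. */
        remap_table[42] = (struct remap_entry){ .present = true,
                                                .dest_apic_id = 300,
                                                .vector = 0x40 };
        return viommu_remap_irq(42, &dest, &vec) == 0 ? 0 : 1;
    }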
>> +
>> +vIOMMU hypercall
>> +================
>> +Introduce a new domctl hypercall, "xen_domctl_viommu_op", to create vIOMMU
>> +instances in the hypervisor. The vIOMMU instance is destroyed when the
>> +domain is destroyed.
>> +
>> +* vIOMMU hypercall parameter structure
>> +
>> +/* vIOMMU type - specify vendor vIOMMU device model */
>> +#define VIOMMU_TYPE_INTEL_VTD              0
>> +
>> +/* vIOMMU capabilities */
>> +#define VIOMMU_CAP_IRQ_REMAPPING  (1u << 0)
>> +
>> +struct xen_domctl_viommu_op {
>> +    uint32_t cmd;
>> +#define XEN_DOMCTL_viommu_create          0
>> +    union {
>> +        struct {
>> +            /* IN - vIOMMU type  */
>> +            uint8_t type;
>> +            /* IN - MMIO base address of vIOMMU. */
>> +            uint64_t base_address;
>> +            /* IN - Capabilities with which we want to create */
>> +            uint64_t capabilities;
>> +            /* OUT - vIOMMU identity */
>> +            uint32_t id;
>> +        } create;
>> +    } u;
>> +};
>> +
>> +- XEN_DOMCTL_viommu_create
>> +    Create a vIOMMU device with the given type, capabilities and MMIO base
>> +address. The hypervisor allocates a viommu_id for the new vIOMMU instance
>> +and returns it to the caller. The vIOMMU device model in the hypervisor
>> +should check whether it can support the requested capabilities and return
>> +an error if not.
>> +
>> +The vIOMMU domctl and the vIOMMU option in the configuration file are
>> +designed with multi-vIOMMU support for a single VM in mind (e.g. the
>> +create operation includes a vIOMMU id), but the current implementation
>> +only supports one vIOMMU per VM.
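For illustration, a minimal standalone sketch of how a toolstack caller might
populate the proposed xen_domctl_viommu_op before issuing the domctl. The
struct and constants are copied from the document above; the MMIO base value
is purely illustrative, and the actual hypercall plumbing (e.g. a libxc
wrapper) is not shown:

    #include <stdint.h>
    #include <stdio.h>

    /* Copied from the proposed interface above. */
    #define VIOMMU_TYPE_INTEL_VTD     0
    #define VIOMMU_CAP_IRQ_REMAPPING  (1u << 0)
    #define XEN_DOMCTL_viommu_create  0

    struct xen_domctl_viommu_op {
        uint32_t cmd;
        union {
            struct {
                uint8_t  type;          /* IN  - vIOMMU type */
                uint64_t base_address;  /* IN  - MMIO base address of vIOMMU */
                uint64_t capabilities;  /* IN  - requested capabilities */
                uint32_t id;            /* OUT - vIOMMU identity */
            } create;
        } u;
    };

    int main(void)
    {
        struct xen_domctl_viommu_op op = {
            .cmd = XEN_DOMCTL_viommu_create,
            .u.create = {
                .type         = VIOMMU_TYPE_INTEL_VTD,
                .base_address = 0xfed90000ULL,   /* illustrative MMIO base */
                .capabilities = VIOMMU_CAP_IRQ_REMAPPING,
            },
        };

        /* ... the toolstack would submit 'op' via the domctl here; on
         * success the hypervisor fills in op.u.create.id. ... */

        printf("would create vIOMMU type %u at 0x%llx, caps 0x%llx\n",
               (unsigned)op.u.create.type,
               (unsigned long long)op.u.create.base_address,
               (unsigned long long)op.u.create.capabilities);
        return 0;
    }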
>> +
>> +xl x86 vIOMMU configuration
>> +============================
>> +viommu = [
>> +    'type=intel_vtd,intremap=1',
>> +    ...
>> +]
>> +
>> +"type" - Specify vIOMMU device model type. Currently only supports Intel vtd
>> +device model.
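As an illustration only, a guest configuration fragment using the proposed
syntax might look like the following; 'viommu' is the option proposed in this
series (not one available in a released xl), and the other settings are
arbitrary:

    name   = "hpc-guest"
    vcpus  = 256
    memory = 8192
    viommu = [ 'type=intel_vtd,intremap=1' ]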
>
>Although I see the point in being able to specify the vIOMMU type, is
>this really helpful from an admin PoV?
>
>What would happen for example if you try to add an Intel vIOMMU to a
>guest running on an AMD CPU? I guess the guest OSes would be quite
>surprised about that...
>
>I think the most common way to use this option would be:
>
>viommu = [
>    'intremap=1',
>    ...
>]

Agreed.

>
>And vIOMMUs should automatically be added to guests with > 128 vCPUs?
>IIRC Linux requires a vIOMMU in order to run with > 128 vCPUs (which
>is quite arbitrary, but anyway...).

I think Linux will only use 128 CPUs in this case on bare metal.
Considering that a well-behaved VM shouldn't have such a weird
configuration -- more than 128 vCPUs but no vIOMMU -- adding a vIOMMU
automatically when needed is fine with me.
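A rough sketch of what such a toolstack default could look like; the
structure and field names below are illustrative stand-ins, not existing
libxl code:

    #include <stdbool.h>
    #include <stdint.h>

    struct guest_config {           /* illustrative stand-in for the guest config */
        uint32_t max_vcpus;
        bool     viommu_present;
        bool     viommu_intremap;
    };

    static void maybe_add_default_viommu(struct guest_config *cfg)
    {
        /*
         * x2APIC delivery to all vCPUs (and hence a vIOMMU with interrupt
         * remapping) is only needed once the guest has more than 128 vCPUs.
         */
        if (cfg->max_vcpus > 128 && !cfg->viommu_present) {
            cfg->viommu_present  = true;
            cfg->viommu_intremap = true;
        }
    }

    int main(void)
    {
        struct guest_config cfg = { .max_vcpus = 256 };

        maybe_add_default_viommu(&cfg);
        return cfg.viommu_present ? 0 : 1;
    }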

Thanks
Chao
