Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices
- To: Rahul Singh <Rahul.Singh@xxxxxxx>
- From: Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>
- Date: Fri, 28 Oct 2022 18:26:18 +0300
- Cc: Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Xen developer discussion <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Bertrand Marquis <Bertrand.Marquis@xxxxxxx>, Michal Orzel <Michal.Orzel@xxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Juergen Gross <jgross@xxxxxxxx>
- Delivery-date: Fri, 28 Oct 2022 15:26:51 +0000
- List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
Hi Oleksandr,
Hello Rahul
[sorry for the possible format issues]
> On 26 Oct 2022, at 7:23 pm, Oleksandr Tyshchenko <olekstysh@xxxxxxxxx> wrote:
>
>
>
> On Wed, Oct 26, 2022 at 8:18 PM Michal Orzel <michal.orzel@xxxxxxx> wrote:
> Hi Rahul,
>
>
> Hello all
>
> [sorry for the possible format issues]
>
>
> On 26/10/2022 16:33, Rahul Singh wrote:
> >
> >
> > Hi Julien,
> >
> >> On 26 Oct 2022, at 2:36 pm, Julien Grall <julien@xxxxxxx> wrote:
> >>
> >>
> >>
> >> On 26/10/2022 14:17, Rahul Singh wrote:
> >>> Hi All,
> >>
> >> Hi Rahul,
> >>
> >>> At Arm, we started to implement the POC to support 2 levels of page tables/nested translation in SMMUv3.
> >>> To support nested translation for guest OS Xen needs to expose the virtual IOMMU. If we passthrough the
> >>> device to the guest that is behind an IOMMU and virtual IOMMU is enabled for the guest there is a need to
> >>> add IOMMU binding for the device in the passthrough node as per [1]. This email is to get an agreement on
> >>> how to add the IOMMU binding for guest OS.
> >>> Before I explain how to add the IOMMU binding, let me give a brief overview of how we will add support for virtual
> >>> IOMMU on Arm. In order to implement virtual IOMMU, Xen needs SMMUv3 nested translation support. SMMUv3 hardware
> >>> supports two stages of translation. Each stage of translation can be independently enabled. An incoming address is logically
> >>> translated from VA to IPA in stage 1, then the IPA is input to stage 2 which translates the IPA to the output PA. Stage 1 is
> >>> intended to be used by a software entity (guest OS) to provide isolation or translation to buffers within the entity, for example,
> >>> DMA isolation within an OS. Stage 2 is intended to be available in systems supporting the Virtualization Extensions and is
> >>> intended to virtualize device DMA to guest VM address spaces. When both stage 1 and stage 2 are enabled, the translation
> >>> configuration is called nesting.
> >>> Stage 1 translation support is required to provide isolation between different devices within the guest OS. XEN already supports
> >>> Stage 2 translation but there is no support for Stage 1 translation for guests. We will add support for guests to configure
> >>> the Stage 1 translation via virtual IOMMU. XEN will emulate the SMMU hardware and expose the virtual SMMU to the guest.
> >>> Guest can use the native SMMU driver to configure the stage 1 translation. When the guest configures the SMMU for Stage 1,
> >>> XEN will trap the access and configure the hardware accordingly.
> >>> Now back to the question of how we can add the IOMMU binding between the virtual IOMMU and the master devices so that
> >>> guests can configure the IOMMU correctly. The solution that I am suggesting is as below:
> >>> For dom0, while handling the DT node (handle_node()), Xen will replace the phandle in the "iommus" property with the virtual
> >>> IOMMU node phandle.
> >> Below, you said that each IOMMU may have a different ID space. So shouldn't we expose one vIOMMU per pIOMMU? If not, how do you expect the user to specify the mapping?
> >
> > Yes you are right we need to create one vIOMMU per pIOMMU for dom0. This also helps in the ACPI case
> > where we don’t need to modify the tables to delete the pIOMMU entries and create one vIOMMU.
> > In this case, there is no need to replace the phandle, as Xen creates the vIOMMU with the same pIOMMU
> > phandle and same base address.
> >
> > For domU guests one vIOMMU per guest will be created.
> >
> >>
> >>> For domU guests, when passing through the device to the guest as per [2], add the below property in the partial device tree
> >>> node that is required to describe the generic device tree binding for IOMMUs and their master(s):
> >>> "iommus = < &magic_phandle 0xvMasterID>"
> >>> • magic_phandle will be the phandle ( vIOMMU phandle in xl) that will be documented so that the user can set that in partial DT node (0xfdea).
> >>
> >> Does this mean only one IOMMU will be supported in the guest?
> >
> > Yes.
> >
> >>
> >>> • vMasterID will be the virtual master ID that the user will provide.
> >>> The partial device tree will look like this:
> >>> /dts-v1/;
> >>> / {
> >>>     /* #*cells are here to keep DTC happy */
> >>>     #address-cells = <2>;
> >>>     #size-cells = <2>;
> >>>     aliases {
> >>>         net = &mac0;
> >>>     };
> >>>     passthrough {
> >>>         compatible = "simple-bus";
> >>>         ranges;
> >>>         #address-cells = <2>;
> >>>         #size-cells = <2>;
> >>>         mac0: ethernet@10000000 {
> >>>             compatible = "calxeda,hb-xgmac";
> >>>             reg = <0 0x10000000 0 0x1000>;
> >>>             interrupts = <0 80 4 0 81 4 0 82 4>;
> >>>             iommus = <0xfdea 0x01>;
> >>>         };
> >>>     };
> >>> };
> >>> In xl.cfg we need to define a new option to inform Xen about the vMasterID to pMasterID mapping and to which IOMMU device
> >>> the master device is connected so that Xen can configure the right IOMMU. This is required if the system has devices that have
> >>> the same master ID but behind a different IOMMU.
> >>
> >> In xl.cfg, we already pass the device-tree node path to passthrough. So Xen should already have all the information about the IOMMU and Master-ID. So it doesn't seem necessary for Device-Tree.
> >>
> >> For ACPI, I would have expected the information to be found in the IOREQ.
> >>
> >> So can you add more context why this is necessary for everyone?
> >
> > We have information for IOMMU and Master-ID but we don’t have information for linking vMaster-ID to pMaster-ID.
> > The device tree node will be used to assign the device to the guest and configure the Stage-2 translation. Guest will use the
> > vMaster-ID to configure the vIOMMU during boot. Xen needs information to link vMaster-ID to pMaster-ID to configure
> > the corresponding pIOMMU. As I mentioned, we need the vMaster-ID in case a system has 2 identical Master-IDs,
> > each one connected to a different SMMU, and both are assigned to the guest.
>
> I think the proposed solution would work and I would just like to clear some issues.
>
> Please correct me if I'm wrong:
>
> In the xl config file we already need to specify dtdev to point to the device path in host dtb.
> In the partial device tree we specify the vMasterId as well as magic phandle.
> Isn't it that we already have all the information necessary without the need for iommu_devid_map?
> For me it looks like the partial dtb provides vMasterID and dtdev provides pMasterID as well as physical phandle to SMMU.
>
> Having said that, I can also understand that specifying everything in one place using iommu_devid_map can be easier
> and reduces the need for device tree parsing.
>
> Apart from that, what is the reason of exposing only one vSMMU to guest instead of one vSMMU per pSMMU?
> In the latter solution, the whole issue with handling devices with the same stream ID but belonging to different SMMUs
> would be gone. It would also result in a more natural way of the device tree look. Normally a guest would see
> e.g. both SMMUs and exposing only one can be misleading.
>
> I also have the same question. From earlier answers as I understand it is going to be identity vSMMU <-> pSMMU mappings for Dom0, so why diverge for DomU?
>
> Also I am thinking how this solution would work for IPMMU-VMSA Gen3(Gen4), which also supports two stages of translation, so the nested translation could be possible in general, although there might be some pitfalls
> (yes, I understand that code to emulate access to control registers would be different in comparison with SMMUv3, but some other code could be common).
Yes, we will try to make the code common so that other vIOMMUs can be implemented easily.
>
>
>
>
>
> >>
> >>> iommu_devid_map = [ "PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS", "PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS" ]
> >>> • PMASTER_ID is the physical master ID of the device from the physical DT.
> >>> • VMASTER_ID is the virtual master ID that the user will configure in the partial device tree.
> >>> • IOMMU_BASE_ADDRESS is the base address of the physical IOMMU device to which this device is connected.
>
>
> If iommu_devid_map is a way to go, I have a question, would this configuration cover the following cases?
> 1. Device has several stream IDs
Yes, in that case the user needs to create a mapping for each stream ID. For example, if the device has stream IDs 0x10, 0x20 and 0x30,
iommu_devid_map will be:
iommu_devid_map = ["0x10@0x01,0x40000000", "0x20@0x02,0x40000000", "0x30@0x03,0x40000000"]
Here 0x40000000 is the physical IOMMU base address.
> 2. Several devices share the stream ID (or several stream IDs)
Let's take an example of two devices:
Device 1: 0x10
Device 2: 0x10
iommu_devid_map = ["0x10@0x1,0x40000000", "0x10@0x2,0x40000000"]
Xen will create a data structure that includes the vStreamID, pMasterID and IOMMU base address.
With the help of this three-element tuple we will be able to find the right physical IOMMU.
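To make the lookup concrete, here is a minimal Python sketch of how such entries could be parsed into a table keyed by the virtual ID. It is only an illustration, not Xen or libxl code: the names DevidMapping, parse_entry and build_lookup are hypothetical, and it assumes that an entry without "@VMASTER_ID" reuses the physical ID as the virtual one and that virtual IDs are unique within a guest (there is a single vIOMMU per domU).

```python
from typing import Dict, Iterable, NamedTuple


class DevidMapping(NamedTuple):
    """One parsed iommu_devid_map entry (hypothetical structure)."""
    pmaster_id: int   # physical master/stream ID from the host DT
    vmaster_id: int   # virtual master ID used in the partial DT
    iommu_base: int   # base address of the physical IOMMU


def parse_entry(entry: str) -> DevidMapping:
    """Parse one "PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS" string."""
    ids, base = entry.split(",")
    pmaster, sep, vmaster = ids.partition("@")
    pmaster_id = int(pmaster, 0)
    # Assumption: a missing @VMASTER_ID means virtual ID == physical ID.
    vmaster_id = int(vmaster, 0) if sep else pmaster_id
    return DevidMapping(pmaster_id, vmaster_id, int(base, 0))


def build_lookup(entries: Iterable[str]) -> Dict[int, DevidMapping]:
    """Index the mappings by virtual ID, rejecting duplicate virtual IDs."""
    table: Dict[int, DevidMapping] = {}
    for entry in entries:
        mapping = parse_entry(entry)
        if mapping.vmaster_id in table:
            raise ValueError(f"duplicate vMasterID {mapping.vmaster_id:#x}")
        table[mapping.vmaster_id] = mapping
    return table


# The two-device example: both devices share physical stream ID 0x10.
lookup = build_lookup(["0x10@0x1,0x40000000", "0x10@0x2,0x40000000"])
```

When the guest programs virtual ID 0x2 into the vIOMMU, lookup[0x2] gives the physical ID 0x10 and the IOMMU at 0x40000000, which is the information needed to pick the right physical IOMMU.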
Thanks for the clarification, I see that iommu_devid_map is able to describe various combinations, which is good. But the user should be very careful when filling in iommu_devid_map, especially when dealing with a system that has many IOMMUs and devices with many stream IDs, as it would be easy to make a mistake in that case. As a real example, if I want to describe 5 DMA controllers assigned to the guest where each has 16 uTLBs (the equivalent of stream IDs), I would need to add 80 entries (quite a lot) to iommu_devid_map, specifying VMASTER_ID for each entry (as uTLBs are not unique across the system):

https://github.com/torvalds/linux/blob/master/arch/arm64/boot/dts/renesas/r8a77951.dtsi#L1042
https://github.com/torvalds/linux/blob/master/arch/arm64/boot/dts/renesas/r8a77951.dtsi#L1084
https://github.com/torvalds/linux/blob/master/arch/arm64/boot/dts/renesas/r8a77951.dtsi#L1126
https://github.com/torvalds/linux/blob/master/arch/arm64/boot/dts/renesas/r8a77951.dtsi#L2450
https://github.com/torvalds/linux/blob/master/arch/arm64/boot/dts/renesas/r8a77951.dtsi#L2492

So I agree in general with what has been said earlier in this thread: it is *better* to avoid user interaction and teach the toolstack to do this automatically. At the same time, I understand this might be quite difficult to implement, etc.
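As a rough illustration of what "doing this automatically" could mean for the 5 x 16 case, the Python sketch below generates the entries from a per-device description instead of having the user write them by hand. It is only a sketch under stated assumptions: generate_devid_map is a hypothetical helper, not an existing xl/libxl function, and the IOMMU base addresses and uTLB numbers are placeholders rather than the real R-Car values.

```python
from typing import Iterable, List, Tuple


def generate_devid_map(
    devices: Iterable[Tuple[int, Iterable[int]]],
    first_vmaster_id: int = 1,
) -> List[str]:
    """Build iommu_devid_map entries, allocating virtual IDs sequentially.

    Each device is given as (iommu_base, physical_ids). Virtual IDs are
    unique across all devices even when physical IDs repeat.
    """
    entries: List[str] = []
    vmaster_id = first_vmaster_id
    for iommu_base, pmaster_ids in devices:
        for pmaster_id in pmaster_ids:
            entries.append(f"{pmaster_id:#x}@{vmaster_id:#x},{iommu_base:#x}")
            vmaster_id += 1
    return entries


# Placeholder description: 5 DMA controllers, each with 16 uTLBs numbered
# 0..15, each behind an IOMMU at a made-up base address.
devices = [(0x10000000 + i * 0x10000, range(16)) for i in range(5)]
entries = generate_devid_map(devices)
```

With this input the helper produces the 80 entries in one go; in a real toolstack the (iommu_base, physical_ids) pairs would have to come from the host device tree node named by dtdev rather than from a hand-written list.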
Regards,
Rahul