
Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices


  • To: Julien Grall <julien@xxxxxxx>
  • From: Rahul Singh <Rahul.Singh@xxxxxxx>
  • Date: Wed, 26 Oct 2022 14:33:56 +0000
  • Cc: Xen developer discussion <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Bertrand Marquis <Bertrand.Marquis@xxxxxxx>, Michal Orzel <Michal.Orzel@xxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Juergen Gross <jgross@xxxxxxxx>
  • Delivery-date: Wed, 26 Oct 2022 14:34:28 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi Julien,

> On 26 Oct 2022, at 2:36 pm, Julien Grall <julien@xxxxxxx> wrote:
> 
> 
> 
> On 26/10/2022 14:17, Rahul Singh wrote:
>> Hi All,
> 
> Hi Rahul,
> 
>> At Arm, we started to implement the POC to support 2 levels of page 
>> tables/nested translation in SMMUv3.
>> To support nested translation for guest OS Xen needs to expose the virtual 
>> IOMMU. If we passthrough the
>> device to the guest that is behind an IOMMU and virtual IOMMU is enabled for 
>> the guest there is a need to
>> add IOMMU binding for the device in the passthrough node as per [1]. This 
>> email is to get an agreement on
>> how to add the IOMMU binding for guest OS.
>> Before I will explain how to add the IOMMU binding let me give a brief 
>> overview of how we will add support for virtual
>> IOMMU on Arm. In order to implement virtual IOMMU Xen needs SMMUv3 nested 
>> translation support. SMMUv3 hardware
>> supports two stages of translation. Each stage of translation can be 
>> independently enabled. An incoming address is logically
>> translated from VA to IPA in stage 1, then the IPA is input to stage 2 which 
>> translates the IPA to the output PA. Stage 1 is
>> intended to be used by a software entity( Guest OS) to provide isolation or 
>> translation to buffers within the entity, for example,
>> DMA isolation within an OS. Stage 2 is intended to be available in systems 
>> supporting the Virtualization Extensions and is
>> intended to virtualize device DMA to guest VM address spaces. When both 
>> stage 1 and stage 2 are enabled, the translation
>> configuration is called nesting.
>> Stage 1 translation support is required to provide isolation between 
>> different devices within the guest OS. XEN already supports
>> Stage 2 translation but there is no support for Stage 1 translation for 
>> guests. We will add support for guests to configure
>> the Stage 1 translation via virtual IOMMU. XEN will emulate the SMMU hardware 
>> and expose the virtual SMMU to the guest.
>> Guest can use the native SMMU driver to configure the stage 1 translation. 
>> When the guest configures the SMMU for Stage 1,
>> XEN will trap the access and configure the hardware accordingly.
>> Now back to the question of how we can add the IOMMU binding between the 
>> virtual IOMMU and the master devices so that
>> guests can configure the IOMMU correctly. The solution that I am suggesting 
>> is as below:
>> For dom0, while handling the DT node(handle_node()) Xen will replace the 
>> phandle in the "iommus" property with the virtual
>> IOMMU node phandle.
> Below, you said that each IOMMUs may have a different ID space. So shouldn't 
> we expose one vIOMMU per pIOMMU? If not, how do you expect the user to 
> specify the mapping?

Yes, you are right: we need to create one vIOMMU per pIOMMU for dom0. This also
helps in the ACPI case, where we don't need to modify the tables to delete the
pIOMMU entries and create a single vIOMMU.
In this case there is no need to replace the phandle, as Xen creates each vIOMMU
with the same phandle and the same base address as the corresponding pIOMMU.

For domU guests, one vIOMMU per guest will be created.

> 
>> For domU guests, when passing through the device to the guest as per [2], add 
>> the below property in the partial device tree
>> node that is required to describe the generic device tree binding for IOMMUs 
>> and their master(s)
>> "iommus = <&magic_phandle 0xvMasterID>"
>>      • magic_phandle will be the phandle ( vIOMMU phandle in xl)  that will 
>> be documented so that the user can set that in partial DT node (0xfdea).
> 
> Does this mean only one IOMMU will be supported in the guest?

Yes.

> 
>>      • vMasterID will be the virtual master ID that the user will provide.
>> The partial device tree will look like this:
>> /dts-v1/;
>>  / {
>>     /* #*cells are here to keep DTC happy */
>>     #address-cells = <2>;
>>     #size-cells = <2>;
>>       aliases {
>>         net = &mac0;
>>     };
>>       passthrough {
>>         compatible = "simple-bus";
>>         ranges;
>>         #address-cells = <2>;
>>         #size-cells = <2>;
>>         mac0: ethernet@10000000 {
>>             compatible = "calxeda,hb-xgmac";
>>             reg = <0 0x10000000 0 0x1000>;
>>             interrupts = <0 80 4  0 81 4  0 82 4>;
>>            iommus = <0xfdea 0x01>;
>>         };
>>     };
>> };
>>  In xl.cfg we need to define a new option to inform Xen about the vMasterId to 
>> pMasterId mapping and to which IOMMU device
>> the master device is connected so that Xen can configure the right IOMMU. 
>> This is required if the system has devices that have
>> the same master ID but are behind different IOMMUs.
> 
> In xl.cfg, we already pass the device-tree node path to passthrough. So Xen 
> should already have all the information about the IOMMU and Master-ID. So it 
> doesn't seem necessary for Device-Tree.
> 
> For ACPI, I would have expected the information to be found in the IOREQ.
> 
> So can you add more context why this is necessary for everyone?

We have the information for the IOMMU and Master-ID, but we don't have the
information for linking the vMaster-ID to the pMaster-ID.
The device tree node will be used to assign the device to the guest and
configure the Stage-2 translation. The guest will use the vMaster-ID to
configure the vIOMMU during boot. Xen needs information to link the vMaster-ID
to the pMaster-ID in order to configure the corresponding pIOMMU. As I
mentioned, we need the vMaster-ID in case a system has two devices with
identical Master-IDs, each connected to a different SMMU, and both assigned to
the guest.
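
To make the ambiguity concrete, here is a minimal Python sketch (not Xen code).
The table layout and the names VMASTER_MAP and resolve are hypothetical, and the
second SMMU base address 0x5f000000 is an invented value used only for
illustration. It just shows that a per-guest table keyed by vMaster-ID can tell
apart two devices that share a pMaster-ID behind different SMMUs.

```python
# Hypothetical per-guest table: vMaster-ID -> (physical SMMU base, pMaster-ID).
# Both devices use pMaster-ID 0x10, but they sit behind different SMMUs.
VMASTER_MAP = {
    0x1: (0x4F000000, 0x10),
    0x2: (0x5F000000, 0x10),  # second SMMU base is an invented example value
}


def resolve(vmaster_id):
    """Return (pIOMMU base address, pMaster-ID) for a guest-visible vMaster-ID."""
    try:
        return VMASTER_MAP[vmaster_id]
    except KeyError:
        raise ValueError(f"no device assigned with vMaster-ID {vmaster_id:#x}")


# The guest programs vMaster-ID 0x2 in the vIOMMU; the lookup tells us which
# physical SMMU and which physical master ID would have to be configured.
smmu_base, pmaster_id = resolve(0x2)
print(hex(smmu_base), hex(pmaster_id))
```

A lookup keyed only by pMaster-ID could not distinguish these two entries, which
is why the vMaster-ID is needed in addition to the device tree node path.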

> 
>>  iommu_devid_map = [ "PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS" , 
>> "PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS"]
>>      • PMASTER_ID is the physical master ID of the device from the physical 
>> DT.
>>      • VMASTER_ID is the virtual master Id that the user will configure in 
>> the partial device tree.
>>      • IOMMU_BASE_ADDRESS is the base address of the physical IOMMU device 
>> to which this device is connected.
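
As an illustration of the entry format above, here is a minimal Python sketch of
a parser. It is not the xl implementation; the function name
parse_devid_map_entry and the returned dictionary keys are hypothetical, and the
fallback used when @VMASTER_ID is omitted (virtual ID equals physical ID) is an
assumption, since the proposal only marks that part as optional.

```python
def parse_devid_map_entry(entry):
    """Parse 'PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS' into a dict of ints."""
    ids, sep, base = entry.partition(",")
    if not sep or not base.strip():
        raise ValueError(f"missing IOMMU base address in {entry!r}")
    pmaster, at, vmaster = ids.partition("@")
    # Base 0 lets int() accept both decimal ("16") and hex ("0x10") input.
    pmaster_id = int(pmaster.strip(), 0)
    # Assumption: without @VMASTER_ID the virtual ID equals the physical one.
    vmaster_id = int(vmaster.strip(), 0) if at else pmaster_id
    return {
        "pmaster_id": pmaster_id,
        "vmaster_id": vmaster_id,
        "iommu_base": int(base.strip(), 0),
    }


# Values taken from the example in this thread: pMaster-ID 0x10,
# vMaster-ID 0x1, physical SMMU at 0x4f000000.
print(parse_devid_map_entry("0x10@0x1,0x4f000000"))
```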
> 
> Below you give an example for Platform device. How would that fit in the 
> context of PCI passthrough?

In the PCI passthrough case, xl will create the "iommu-map" property in the
vPCI host bridge node with a phandle to the vIOMMU node.
The vSMMUv3 node will be created by xl.
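
For reference, the generic "iommu-map" binding describes each entry as
(rid-base, IOMMU phandle, iommu-base, length): a PCI requester ID in
[rid-base, rid-base + length) maps to master ID iommu-base + (rid - rid-base).
A minimal Python sketch of that lookup follows. The function name
rid_to_master_id and the entry values are illustrative only; the example simply
reuses 0xfdea as the vIOMMU phandle and is not what xl will necessarily
generate.

```python
def rid_to_master_id(iommu_map, rid):
    """Translate a PCI requester ID using iommu-map style entries.

    Each entry is (rid_base, iommu_phandle, iommu_base, length).
    Returns (iommu_phandle, master_id), or None if no entry covers the RID.
    """
    for rid_base, phandle, iommu_base, length in iommu_map:
        if rid_base <= rid < rid_base + length:
            return phandle, iommu_base + (rid - rid_base)
    return None


# Illustrative values: the whole RID space mapped 1:1 onto the vIOMMU.
viommu_map = [(0x0, 0xFDEA, 0x0, 0x10000)]

# RID for bus 0, device 1, function 0 is (0 << 8) | (1 << 3) | 0 = 0x8.
print(rid_to_master_id(viommu_map, 0x8))
```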

> 
>>  Example: Let's say the user wants to assign the below physical device in DT 
>> to the guest.
>>  iommu@4f000000 {
>>                 compatible = "arm,smmu-v3";
>>                      interrupts = <0x00 0xe4 0xf04>;
>>                 interrupt-parent = <0x01>;
>>                 #iommu-cells = <0x01>;
>>                 interrupt-names = "combined";
>>                 reg = <0x00 0x4f000000 0x00 0x40000>;
>>                 phandle = <0xfdeb>;
>>                 name = "iommu";
>> };
> 
> So I guess this node will be written by Xen. How will you handle the case where 
> there are extra properties to be added (e.g. dma-coherent)?

In this example, this is the physical IOMMU node. The vIOMMU node will be
created by xl during guest creation.
> 
>>  test@10000000 {
>>      compatible = "viommu-test";
>>      iommus = <0xfdeb 0x10>;
> 
> I am a bit confused. Here you use 0xfdeb for the phandle but below...

Here 0xfdeb is the physical IOMMU node phandle...
> 
>>      interrupts = <0x00 0xff 0x04>;
>>      reg = <0x00 0x10000000 0x00 0x1000>;
>>      name = "viommu-test";
>> };
>>  The partial Device tree node will be like this:
>>  / {
>>     /* #*cells are here to keep DTC happy */
>>     #address-cells = <2>;
>>     #size-cells = <2>;
>>       passthrough {
>>         compatible = "simple-bus";
>>         ranges;
>>         #address-cells = <2>;
>>         #size-cells = <2>;
>>      test@10000000 {
>>              compatible = "viommu-test";
>>              reg = <0 0x10000000 0 0x1000>;
>>              interrupts = <0 80 4  0 81 4  0 82 4>;
>>              iommus = <0xfdea 0x01>;
> 
> ... you use 0xfdea. Does this mean 'xl' will rewrite the phandle?

...but here the user has to set the "iommus" property with the magic phandle,
as explained earlier. 0xfdea is the magic phandle.
 
Regards,
Rahul

 

