
Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices


  • To: Michal Orzel <michal.orzel@xxxxxxx>
  • From: Rahul Singh <Rahul.Singh@xxxxxxx>
  • Date: Thu, 27 Oct 2022 16:12:21 +0000
  • Cc: Julien Grall <julien@xxxxxxx>, Xen developer discussion <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Bertrand Marquis <Bertrand.Marquis@xxxxxxx>, Michal Orzel <Michal.Orzel@xxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Juergen Gross <jgross@xxxxxxxx>
  • Delivery-date: Thu, 27 Oct 2022 16:12:55 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi Michal,

> On 26 Oct 2022, at 6:17 pm, Michal Orzel <michal.orzel@xxxxxxx> wrote:
> 
> Hi Rahul,
> 
> On 26/10/2022 16:33, Rahul Singh wrote:
>> 
>> 
>> Hi Julien,
>> 
>>> On 26 Oct 2022, at 2:36 pm, Julien Grall <julien@xxxxxxx> wrote:
>>> 
>>> 
>>> 
>>> On 26/10/2022 14:17, Rahul Singh wrote:
>>>> Hi All,
>>> 
>>> Hi Rahul,
>>> 
>>>> At Arm, we started to implement the POC to support 2 levels of page 
>>>> tables/nested translation in SMMUv3.
>>>> To support nested translation for guest OS Xen needs to expose the virtual 
>>>> IOMMU. If we passthrough the
>>>> device to the guest that is behind an IOMMU and virtual IOMMU is enabled 
>>>> for the guest there is a need to
>>>> add IOMMU binding for the device in the passthrough node as per [1]. This 
>>>> email is to get an agreement on
>>>> how to add the IOMMU binding for guest OS.
>>>> Before I explain how to add the IOMMU binding, let me give a brief
>>>> overview of how we will add support for virtual
>>>> IOMMU on Arm. In order to implement virtual IOMMU, Xen needs SMMUv3 nested
>>>> translation support. SMMUv3 hardware
>>>> supports two stages of translation. Each stage of translation can be 
>>>> independently enabled. An incoming address is logically
>>>> translated from VA to IPA in stage 1, then the IPA is input to stage 2 
>>>> which translates the IPA to the output PA. Stage 1 is
>>>> intended to be used by a software entity( Guest OS) to provide isolation 
>>>> or translation to buffers within the entity, for example,
>>>> DMA isolation within an OS. Stage 2 is intended to be available in systems 
>>>> supporting the Virtualization Extensions and is
>>>> intended to virtualize device DMA to guest VM address spaces. When both 
>>>> stage 1 and stage 2 are enabled, the translation
>>>> configuration is called nesting.
>>>> Stage 1 translation support is required to provide isolation between 
>>>> different devices within the guest OS. XEN already supports
>>>> Stage 2 translation but there is no support for Stage 1 translation for 
>>>> guests. We will add support for guests to configure
>>>> the Stage 1 translation via virtual IOMMU. XEN will emulate the SMMU
>>>> hardware and expose the virtual SMMU to the guest.
>>>> Guest can use the native SMMU driver to configure the stage 1 translation. 
>>>> When the guest configures the SMMU for Stage 1,
>>>> XEN will trap the access and configure the hardware accordingly.
>>>> Now back to the question of how we can add the IOMMU binding between the 
>>>> virtual IOMMU and the master devices so that
>>>> guests can configure the IOMMU correctly. The solution that I am 
>>>> suggesting is as below:
>>>> For dom0, while handling the DT node(handle_node()) Xen will replace the 
>>>> phandle in the "iommus" property with the virtual
>>>> IOMMU node phandle.
>>> Below, you said that each IOMMUs may have a different ID space. So 
>>> shouldn't we expose one vIOMMU per pIOMMU? If not, how do you expect the 
>>> user to specify the mapping?
>> 
>> Yes you are right we need to create one vIOMMU per pIOMMU for dom0. This 
>> also helps in the ACPI case
>> where we don’t need to modify the tables to delete the pIOMMU entries and 
>> create one vIOMMU.
>> In this case, there is no need to replace the phandle, as Xen creates the vIOMMU with
>> the same pIOMMU
>> phandle and the same base address.
>> 
>> For domU guests one vIOMMU per guest will be created.
>> 
>>> 
>>>> For domU guests, when passing a device through to the guest as per [2], add
>>>> the below property in the partial device tree
>>>> node that is required to describe the generic device tree binding for
>>>> IOMMUs and their master(s):
>>>> "iommus = < &magic_phandle 0xvMasterID>;"
>>>>     • magic_phandle will be the phandle ( vIOMMU phandle in xl)  that will 
>>>> be documented so that the user can set that in partial DT node (0xfdea).
>>> 
>>> Does this mean only one IOMMU will be supported in the guest?
>> 
>> Yes.
>> 
>>> 
>>>>     • vMasterID will be the virtual master ID that the user will provide.
>>>> The partial device tree will look like this:
>>>> /dts-v1/;
>>>> / {
>>>>    /* #*cells are here to keep DTC happy */
>>>>    #address-cells = <2>;
>>>>    #size-cells = <2>;
>>>>      aliases {
>>>>        net = &mac0;
>>>>    };
>>>>      passthrough {
>>>>        compatible = "simple-bus";
>>>>        ranges;
>>>>        #address-cells = <2>;
>>>>        #size-cells = <2>;
>>>>        mac0: ethernet@10000000 {
>>>>            compatible = "calxeda,hb-xgmac";
>>>>            reg = <0 0x10000000 0 0x1000>;
>>>>            interrupts = <0 80 4  0 81 4  0 82 4>;
>>>>           iommus = <0xfdea 0x01>;
>>>>        };
>>>>    };
>>>> };
>>>> In xl.cfg we need to define a new option to inform Xen about the vMasterId to
>>>> pMasterId mapping and to which IOMMU device
>>>> the master device is connected so that Xen can configure the right IOMMU.
>>>> This is required if the system has devices that have
>>>> the same master ID but behind a different IOMMU.
>>> 
>>> In xl.cfg, we already pass the device-tree node path to passthrough. So Xen 
>>> should already have all the information about the IOMMU and Master-ID. So 
>>> it doesn't seem necessary for Device-Tree.
>>> 
>>> For ACPI, I would have expected the information to be found in the IOREQ.
>>> 
>>> So can you add more context why this is necessary for everyone?
>> 
>> We have information for IOMMU and Master-ID but we don’t have information 
>> for linking vMaster-ID to pMaster-ID.
>> The device tree node will be used to assign the device to the guest and 
>> configure the Stage-2 translation. Guest will use the
>> vMaster-ID to configure the vIOMMU during boot. Xen needs information to 
>> link vMaster-ID to pMaster-ID to configure
>> the corresponding pIOMMU. As I mentioned, we need the vMaster-ID in case a system
>> has 2 identical Master-IDs,
>> each one connected to a different SMMU and assigned to the guest.
> 
> I think the proposed solution would work and I would just like to clear some 
> issues.
> 
> Please correct me if I'm wrong:
> 
> In the xl config file we already need to specify dtdev to point to the device 
> path in host dtb.
> In the partial device tree we specify the vMasterId as well as magic phandle.
> Isn't it that we already have all the information necessary without the need 
> for iommu_devid_map?
> For me it looks like the partial dtb provides vMasterID and dtdev provides 
> pMasterID as well as physical phandle to SMMU.
> 
> Having said that, I can also understand that specifying everything in one 
> place using iommu_devid_map can be easier
> and reduces the need for device tree parsing.
> 
> Apart from that, what is the reason of exposing only one vSMMU to guest 
> instead of one vSMMU per pSMMU?
> In the latter solution, the whole issue with handling devices with the same 
> stream ID but belonging to different SMMUs
> would be gone. It would also result in a more natural way of the device tree 
> look. Normally a guest would see
> e.g. both SMMUs and exposing only one can be misleading.

Please see my reply to Julien in the other email for the answer to the
above question.
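
To illustrate the vMaster-ID to pMaster-ID linking discussed in the quoted
text above, here is a minimal sketch in C. It is only an illustration under
stated assumptions, not the POC code: the struct, the function name, the
example table and the SMMU node paths are all hypothetical, and nothing here
is an existing Xen interface. It shows why the link is needed when two
devices carry the same physical master ID behind different SMMUs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical entry: one per passthrough device behind the guest vIOMMU. */
struct viommu_devid_map {
    uint32_t vmaster_id;      /* ID the guest programs into the vIOMMU */
    const char *piommu_path;  /* host DT path of the physical IOMMU (made up) */
    uint32_t pmaster_id;      /* ID the device uses on that physical IOMMU */
};

/*
 * Two devices share physical master ID 0x10 but sit behind different
 * SMMUs, so the guest-visible IDs must differ. Values are illustrative.
 */
const struct viommu_devid_map example_map[] = {
    { 0x01, "/smmu@4f000000", 0x10 },
    { 0x02, "/smmu@5f000000", 0x10 },
};

/* Return the entry for a guest-visible master ID, or NULL if unknown. */
const struct viommu_devid_map *
viommu_lookup(const struct viommu_devid_map *map, size_t n,
              uint32_t vmaster_id)
{
    assert(map != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (map[i].vmaster_id == vmaster_id)
            return &map[i];
    }
    return NULL;
}
```

With such a table, a trapped guest access that names vMaster-ID 0x02 can be
resolved to the second SMMU and physical master ID 0x10, while an unknown
vMaster-ID is rejected.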

Regards,
Rahul

 

