
Re: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices


  • To: Stefano Stabellini <sstabellini@xxxxxxxxxx>
  • From: Bertrand Marquis <Bertrand.Marquis@xxxxxxx>
  • Date: Mon, 31 Oct 2022 13:26:44 +0000
  • Accept-language: en-GB, en-US
  • Cc: Julien Grall <julien@xxxxxxx>, Rahul Singh <Rahul.Singh@xxxxxxx>, Xen developer discussion <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Michal Orzel <Michal.Orzel@xxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Juergen Gross <jgross@xxxxxxxx>
  • Delivery-date: Mon, 31 Oct 2022 13:27:22 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: Proposal for virtual IOMMU binding b/w vIOMMU and passthrough devices

Hi All,

> On 30 Oct 2022, at 21:14, Stefano Stabellini <sstabellini@xxxxxxxxxx> wrote:
> 
> On Sun, 30 Oct 2022, Julien Grall wrote:
>> Hi Stefano,
>> 
>> On 30/10/2022 14:23, Stefano Stabellini wrote:
>>> On Fri, 28 Oct 2022, Julien Grall wrote:
>>>> On 28/10/2022 14:13, Bertrand Marquis wrote:
>>>>>> On 28 Oct 2022, at 14:06, Julien Grall <julien@xxxxxxx> wrote:
>>>>>> 
>>>>>> Hi Rahul,
>>>>>> 
>>>>>> On 28/10/2022 13:54, Rahul Singh wrote:
>>>>>>>>>>>> For ACPI, I would have expected the information to be found in
>>>>>>>>>>>> the IOREQ.
>>>>>>>>>>>> 
>>>>>>>>>>>> So can you add more context why this is necessary for everyone?
>>>>>>>>>>> We have information for IOMMU and Master-ID but we don’t have
>>>>>>>>>>> information for linking vMaster-ID to pMaster-ID.
>>>>>>>>>> 
>>>>>>>>>> I am confused. Below, you are making the virtual master ID
>>>>>>>>>> optional. So shouldn't this be mandatory if you really need the
>>>>>>>>>> mapping with the virtual ID?
>>>>>>>>> vMasterID is optional if the user knows pMasterID is unique on the
>>>>>>>>> system. But if pMasterID is not unique then the user needs to
>>>>>>>>> provide the vMasterID.
>>>>>>>> 
>>>>>>>> So the expectation is the user will be able to know that the
>>>>>>>> pMasterID is unique. This may be easy with a couple of SMMUs, but
>>>>>>>> if you have 50+ (as suggested above), this will become a pain on
>>>>>>>> larger systems.
>>>>>>>> 
>>>>>>>> IMHO, it would be much better if we can detect that in libxl (see
>>>>>>>> below).
>>>>>>> We can make the vMasterID compulsory to avoid complexity in libxl to
>>>>>>> solve this
>>>>>> 
>>>>>> In general, complexity in libxl is not too much of a problem.
>>> 
>>> I agree with this and also I agree with Julien's other statement:
>>> 
>>> "I am strongly in favor of libxl to modify it if it greatly improves the
>>> user experience."
>>> 
>>> I am always in favor of reducing complexity for the user as they
>>> typically can't deal with tricky details such as MasterIDs. In general,
>>> I think we need more automation with our tooling.
>>> 
>>> However, it might not be as simple as adding support for automatically
>>> generating IDs in libxl because we have 2 additional cases to support:
>>> 1) dom0less
>>> 2) statically built guests
>>> 
>>> For 1) we would need the same support also in Xen? Which means more
>>> complexity in Xen.
>> Xen will need to parse the device-tree to find the mapping. So I am not
>> entirely convinced there will be more complexity needed other than
>> requiring a bitmap to know which vMasterID has been allocated.
>> 
>> That said, you would still need one to validate the input provided by the
>> user. So overall maybe there will be no added complexity?
>> 
>>> 
>>> 2) are guests like Zephyr that consume a device tree at
>>> build time instead of runtime. These guests are built specifically for a
>>> given environment and it is not a problem to rebuild them for every Xen
>>> release.
>>> 
>>> However I think it is going to be a problem if we have to run libxl to
>>> get the device tree needed for the Zephyr build. That is because it
>>> means that the Zephyr build system would have to learn how to compile
>>> (or crosscompile) libxl in order to retrieve the data needed for its
>>> input. Even for systems based on Yocto (which already knows how to
>>> build libxl), this would cause issues because of the internal
>>> dependencies it would introduce.
>> 
>> That would not be very different to how this works today for Zephyr.
>> They need libxl to generate the guest DT.
>> 
>> That said, I agree this is a bit of a pain...
> 
> Yeah..
> 
> 
>>> So I think the automatic generation might be best done in another tool.
>> It sounds like what you want is creating something similar to libacpi but for
>> Device-Tree. That should work with some caveats.
> 
> Yes, something like that. We have a framework for reading, editing and
> generating Device Tree: Lopper https://github.com/devicetree-org/lopper
> 
> It is mostly targeted at build time but it could also be invoked on
> target at runtime.
> 
> 
>>> I think we need something like a script that takes a partial device tree
>>> as input and provides a more detailed partial device tree as output with
>>> the generated IDs.
>> 
>> AFAICT, having the partial device-tree is not enough. You also need the real
>> DT to figure out the pMaster-ID.
>> 
>>> 
>>> If we did it that way, we could call the script from libxl, but also we
>>> could call it separately from ImageBuilder for dom0less and Zephyr/Yocto
>>> could also call it.
>>> 
>>> Basically we make it easier for everyone to use it. The only price to
>>> pay is that it will be a bit less efficient for xl guests (one more
>>> script to fork and exec) but I think is a good compromise.
>> 
>> We would need a hypercall to retrieve the host Device-Tree. But that would
>> not be too difficult to add.
> 
> Good point
> 
> 
>>> I think this is a great idea, I only suggest that we move the automatic
>>> generation out of libxl (a separate stand-alone script), in another
>>> place that can be more easily reused by multiple projects and different
>>> use-cases.
>> 
>> If we use the concept of libacpi, we may not need to have a stand-alone
>> script. It could be directly linked in libxl or any other tools.
> 
> I don't feel strongly whether it should be a library, a script or
> something else. My only point is that it should be easy to use both at
> build time (e.g. Yocto/Zephyr/ImageBuilder/Lopper) and runtime
> (xl/libxl).
> 
> We have already a partial DTB generator as a Lopper "lop" (a Lopper
> plugin). Probably using Lopper would be the easiest way to implement it,
> and the "lop" could be under xen.git (it doesn't have to reside under
> the lopper repository).
> 
> But if we wanted a library that would be OK too. The issue with libxl is
> not much that it is a library but that it is complex to build and has
> many dependencies (it can only be built from the top level ./configure
> and make).
> 
> Ideally this would be something quick that can be easily invoked as the
> first step of an external third-party build process.

I think we are making this problem a lot too complex, and I am not sure
that all this complexity is required.

For now, we could make the assumption that a master ID is unique and never
reused on a system. Linux currently makes this assumption to simplify
the code. We also have not found any hardware where a master ID is reused.
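
To make the assumption checkable, here is a minimal sketch of how a tool
could verify it before relying on it. The function name and the input
layout (device name mapped to an SMMU name and a master ID) are hypothetical
and only for illustration; in practice the data would come from parsing the
host device tree.

```python
from collections import defaultdict


def find_reused_master_ids(devices):
    """Return the master IDs that are used by more than one device.

    `devices` is a hypothetical mapping: device name -> (smmu name, master ID).
    The result maps each reused master ID to a sorted list of
    (smmu name, device name) pairs. An empty result means the
    "master ID is unique on the system" assumption holds.
    """
    users = defaultdict(list)
    for name, (smmu, master_id) in devices.items():
        users[master_id].append((smmu, name))
    return {mid: sorted(u) for mid, u in users.items() if len(u) > 1}
```

If the result is not empty, the tools could refuse the configuration and
report the conflicting devices instead of guessing a mapping.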

It would mean that the user would just need to keep the stream-id property
in the device tree and replace the link to the SMMU with a fake phandle. The
tools could then add the vIOMMU node and fix all phandles in the device tree
to properly point to it. In practice the user can simply copy the whole device
node with the stream-id properties and just replace the phandle with 0x0.
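
As a rough illustration of the fixup the tools would do, here is a minimal
sketch. It assumes the partial device tree has already been parsed into a
plain dictionary and that the stream ID is carried in an `iommus` property
as (phandle, stream ID) pairs; the function name, the `viommu` node name and
the data layout are all hypothetical, not an existing libxl or Lopper API.

```python
PLACEHOLDER_PHANDLE = 0x0  # fake phandle written by the user


def attach_viommu(nodes, viommu_phandle):
    """Return a copy of `nodes` with placeholder phandles pointing at the vIOMMU.

    `nodes` maps a node name to its properties. Stream IDs are left
    untouched; only entries whose phandle is the placeholder are rewritten.
    A (hypothetical) `viommu` node carrying the new phandle is added.
    """
    fixed = {}
    for name, props in nodes.items():
        new_props = dict(props)
        if "iommus" in props:
            new_props["iommus"] = [
                (viommu_phandle if ph == PLACEHOLDER_PHANDLE else ph, sid)
                for ph, sid in props["iommus"]
            ]
        fixed[name] = new_props
    fixed["viommu"] = {"phandle": viommu_phandle, "#iommu-cells": 1}
    return fixed
```

The input is not modified, so the same partial device tree could be reused
by libxl at runtime or by a build-time tool.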

This will make the first implementation a lot simpler and avoid adding
hypercalls or too much magic in the tools for now.
It will also give us more time to check whether we need more complex use
cases and how they could be configured.

What do you think?

Cheers
Bertrand