
Re: RFC: PCI devices passthrough on Arm design proposal

  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Bertrand Marquis <Bertrand.Marquis@xxxxxxx>
  • Date: Sat, 18 Jul 2020 09:49:43 +0000
  • Accept-language: en-GB, en-US
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, nd <nd@xxxxxxx>, Rahul Singh <Rahul.Singh@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien.grall.oss@xxxxxxxxx>
  • Delivery-date: Sat, 18 Jul 2020 09:50:20 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: RFC: PCI devices passthrough on Arm design proposal

> On 17 Jul 2020, at 17:55, Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
> On Fri, Jul 17, 2020 at 03:21:57PM +0000, Bertrand Marquis wrote:
>>> On 17 Jul 2020, at 16:31, Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
>>> On Fri, Jul 17, 2020 at 01:22:19PM +0000, Bertrand Marquis wrote:
>>>>> On 17 Jul 2020, at 13:16, Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
>>>>>> * ACS capability is disabled for Arm as of now, because after
>>>>>> enabling it devices are not accessible.
>>>>>> * The Dom0less implementation will require Xen itself to be able to
>>>>>> discover the PCI devices (without depending on Dom0 to declare them
>>>>>> to Xen).
>>>>> I assume the firmware will properly initialize the host bridge and
>>>>> configure the resources for each device, so that Xen just has to walk
>>>>> the PCI space and find the devices.
>>>>> TBH that would be my preferred method, because then you can get rid of
>>>>> the hypercall.
>>>>> Is there any way for Xen to know whether the host bridge is properly
>>>>> set up and thus the PCI bus can be scanned?
>>>>> That way Arm could do something similar to x86, where Xen will scan
>>>>> the bus and discover devices, but you could still provide the
>>>>> hypercall in case the bus cannot be scanned by Xen (because it hasn't
>>>>> been set up).
>>>> The idea is definitely to rely by default on the firmware doing this
>>>> properly.
>>>> I am not sure whether a proper enumeration can be detected reliably in all
>>>> cases, so it would make sense to rely on Dom0 enumeration when a Xen
>>>> command line argument is passed, as explained in one of Rahul’s mails.
>>> I assume Linux somehow knows when it needs to initialize the PCI root
>>> complex before attempting to access the bus. Would it be possible to
>>> add this logic to Xen so it can figure out on its own whether it's
>>> safe to scan the PCI bus or whether it needs to wait for the hardware
>>> domain to report the devices present?
>> That might be possible, but it will still require a command line argument
>> to force Xen to let the hardware domain do the initialization in case Xen
>> detection does not work properly.
>> In the case where there is a Dom0, I would rather expect that we let it do
>> the initialization all the time, unless the user tells Xen through a
>> command line argument that the current one is correct and shall be used.
> FTR, on x86 we let dom0 enumerate and probe the PCI devices as it
> feels like, but vPCI traps have already been set for all the detected
> devices, and vPCI already supports letting dom0 size the BARs, or even
> change their position (theoretically, I haven't seen a dom0 change the
> position of the BARs yet).
> So on Arm you could also let dom0 do all of this, the question is
> whether vPCI traps could be set earlier (when dom0 is created) if the
> PCI bus has been initialized and can be scanned.
> I have no idea however how bare metal Linux on Arm figures out the
> state of the PCI bus, or if it's something that's passed on the DT, or
> signaled somehow from the firmware/bootloader.

We will definitely check this, and we will try to keep the same behaviour as
x86 unless that turns out to be impossible. I do not see why we could not set
the vPCI traps earlier and simply relay the writes to the hardware, while
detecting whether the BARs are changed.
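
As a rough illustration of the check I mean (a sketch only, not code from
our series: `ecam_decode()` and `write_touches_bar()` are hypothetical names,
and the only things assumed are the standard ECAM offset layout and the
BAR0..BAR5 range 0x10-0x27 of a type 0 header), the trap handler could decode
the access and flag writes that hit a BAR before relaying them:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for a decoded ECAM access (illustration only). */
struct ecam_addr {
    uint8_t bus;   /* bits 27:20 of the offset into the ECAM window */
    uint8_t dev;   /* bits 19:15 */
    uint8_t fn;    /* bits 14:12 */
    uint16_t reg;  /* bits 11:0, byte offset into the config space */
};

/* Split an offset into the ECAM window into bus/device/function/register. */
static struct ecam_addr ecam_decode(uint32_t offset)
{
    struct ecam_addr a = {
        .bus = (offset >> 20) & 0xff,
        .dev = (offset >> 15) & 0x1f,
        .fn  = (offset >> 12) & 0x7,
        .reg = offset & 0xfff,
    };
    return a;
}

/* BAR0..BAR5 of a type 0 header occupy config space bytes 0x10..0x27. */
#define PCI_BAR_FIRST 0x10
#define PCI_BAR_END   0x28

/* Return true if a write of `size` bytes at `reg` overlaps any BAR. */
static bool write_touches_bar(uint16_t reg, unsigned int size)
{
    assert(size == 1 || size == 2 || size == 4);
    return reg < PCI_BAR_END && reg + size > PCI_BAR_FIRST;
}
```

Writes for which `write_touches_bar()` returns false could be relayed
unchanged; the others are where the BAR change detection would hook in.
Bridges (type 1 headers) only have two BARs, so they would need their own
range.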

>>>>> This should be limited to read-only accesses in order to be safe.
>>>>> Emulating a PCI bridge in Xen using vPCI shouldn't be that
>>>>> complicated, so you could likely replace the real bridges with
>>>>> emulated ones. Or even provide a fake topology to the guest using an
>>>>> emulated bridge.
>>>> Just showing all bridges and keeping the hardware topology is the simplest
>>>> solution for now. But maybe showing a different topology and only fake
>>>> bridges could make sense and be implemented in the future.
>>> Ack. I've also heard rumors of Xen on Arm people being very interested
>>> in VirtIO support, in which case you might expose both fully emulated
>>> VirtIO devices and PCI passthrough devices on the PCI bus, so it would
>>> be good to spend some time thinking how those will fit together.
>>> Will you allocate a separate segment unused by hardware to expose the
>>> fully emulated PCI devices (VirtIO)?
>>> Will OSes support having several segments?
>>> If not you likely need to have emulated bridges so that you can adjust
>>> the bridge window accordingly to fit the passthrough and the emulated
>>> MMIO space, and likely be able to expose passthrough devices using a
>>> different topology than the host one.
>> Honestly this is not something we considered. I was more thinking that
>> this use case would be handled by creating another VPCI bus dedicated
>> to that kind of device instead of mixing physical and virtual devices.
> Just mentioning it and your plans when guests might also have fully
> emulated devices on the PCI bus would be relevant I think.

We will add this.

> Anyway, I don't think it's something mandatory here, as from a guest
> PoV how we expose PCI devices shouldn't matter that much, as long as
> it's done in a spec compliant way.
> So you can start with this approach if it's easier, I just wanted to
> make sure you have in mind that at some point Arm guests might also
> require fully emulated PCI devices so that you don't paint yourselves
> in a corner.

That is definitely not something we had thought of. Thanks for the remark,
we need to keep this in mind.

>>>>>> # Emulated PCI device tree node in libxl:
>>>>>> Libxl is creating a virtual PCI device tree node in the device tree
>>>>>> to enable the guest OS to discover the virtual PCI during guest
>>>>>> boot. We introduced the new config option [vpci="pci_ecam"] for
>>>>>> guests. When this config option is enabled in a guest configuration,
>>>>>> a PCI device tree node will be created in the guest device tree.
>>>>>> A new area has been reserved in the arm guest physical map at which
>>>>>> the VPCI bus is declared in the device tree (reg and ranges
>>>>>> parameters of the node). A trap handler for the PCI ECAM access from
>>>>>> guest has been registered at the defined address and redirects
>>>>>> requests to the VPCI driver in Xen.
>>>>> Can't you deduce the requirement of such DT node based on the presence
>>>>> of a 'pci=' option in the same config file?
>>>>> Also I wouldn't discard that in the future you might want to use
>>>>> different emulators for different devices, so it might be helpful to
>>>>> introduce something like:
>>>>> pci = [ '08:00.0,backend=vpci', '09:00.0,backend=xenpt', 
>>>>> '0a:00.0,backend=qemu', ... ]
>>>>> For the time being Arm will require backend=vpci for all the passed
>>>>> through devices, but I wouldn't rule out this changing in the future.
>>>> We need it for the case where no device is declared in the config file
>>>> and the user wants to add devices using xl later. In this case we must
>>>> have the DT node for it to work.
>>> There's a passthrough xl.cfg option for that already, so that if you
>>> don't want to add any PCI passthrough devices at creation time but
>>> rather hotplug them you can set:
>>> passthrough=enabled
>>> And it should set up the domain to be prepared to support hot
>>> passthrough, including the IOMMU [0].
>> Isn’t this option covering more than PCI passthrough?
>> Lots of Arm platforms do not have a PCI bus at all, so for those
>> creating a VPCI bus would be pointless. But you might need to
>> activate this to pass devices which are not on the PCI bus.
> Well, you can check whether the host has PCI support and decide
> whether to attach a virtual PCI bus to the guest or not?
> Setting passthrough=enabled should prepare the guest to handle
> passthrough, in whatever form is supported by the host IMO.

True, we could just say that we create a PCI bus if the host has one and
passthrough is activated.
But given your point about virtual devices, we might even need one for guests
on hardware without PCI support :-)
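
Put as code, the rule would look something like the sketch below. This is
only an illustration of the decision discussed here: the structure, its
field names and `guest_needs_vpci_bus()` are hypothetical and do not exist
in libxl or Xen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what is known when the guest is created. */
struct vpci_bus_inputs {
    bool host_has_pci;          /* the host has at least one PCI host bridge */
    bool passthrough_enabled;   /* passthrough is activated for this guest */
    bool has_emulated_pci_devs; /* e.g. emulated VirtIO devices on a PCI bus */
};

/* Decide whether a virtual PCI bus should be exposed to the guest. */
static bool guest_needs_vpci_bus(const struct vpci_bus_inputs *in)
{
    assert(in != NULL);

    /* Emulated PCI devices need a bus even if the host has no PCI at all. */
    if (in->has_emulated_pci_devs)
        return true;

    /* Otherwise only create the bus when passthrough can actually use it. */
    return in->host_has_pci && in->passthrough_enabled;
}
```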

>>>>>> Limitation:
>>>>>> * Need to avoid the “iomem” and “irq” guest config
>>>>>> options and map the IOMEM region and IRQ at the same time when
>>>>>> device is assigned to the guest using the “pci” guest config options
>>>>>> when xl creates the domain.
>>>>>> * Emulated BAR values on the VPCI bus should reflect the IOMEM mapped
>>>>>> address.
>>>>> It was my understanding that you would identity map the BAR into the
>>>>> domU stage-2 translation, and that changes by the guest won't be
>>>>> allowed.
>>>> In fact this is not possible and we have to remap at a different
>>>> address: the guest physical memory map is fixed by Xen on Arm, so we must
>>>> follow the same design. Otherwise this would only work if the BARs point
>>>> to an unused address, and on Juno for example they conflict with the
>>>> guest RAM address.
>>> This was not clear from my reading of the document, could you please
>>> clarify on the next version that the guest physical memory map is
>>> always the same, and that BARs from PCI devices cannot be identity
>>> mapped to the stage-2 translation and instead are relocated somewhere
>>> else?
>> We will.
>>> I'm then confused about what you do with bridge windows, do you also
>>> trap and adjust them to report a different IOMEM region?
>> Yes, this is what we will have to do so that the regions reflect the VPCI
>> mappings and not the hardware ones.
>>> Above you mentioned that read-only access was given to bridge
>>> registers, but I guess some are also emulated in order to report
>>> matching IOMEM regions?
>> Yes, that’s correct. We will clarify this in the next version.
> If you have to go this route for domUs, it might be easier to just
> fake a PCI host bridge and place all the devices there even with
> different SBDF addresses. Having to replicate all the bridges on the
> physical PCI bus and fixing up their MMIO windows seems much more
> complicated than just faking/emulating a single bridge?

That is definitely something we have to dig into further. The whole problem
of PCI enumeration and BAR value assignment in Xen might be pushed to either
Dom0 or the firmware, but we might in fact find ourselves with exactly the
same problem on the VPCI bus.
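
To make the VPCI side of that problem concrete, here is a minimal sketch of
assigning guest addresses to BARs inside a window reserved for the VPCI bus.
It is an assumption-laden illustration, not a proposal for the actual code:
`struct vpci_window` and `vpci_place_bar()` are hypothetical, and the only
PCI fact relied on is that memory BARs are power-of-two sized and naturally
aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical guest MMIO window reserved for the VPCI bus. */
struct vpci_window {
    uint64_t base;  /* first guest physical address of the window */
    uint64_t size;  /* window length in bytes */
    uint64_t next;  /* offset of the first byte not yet handed out */
};

/*
 * Reserve a naturally aligned guest address for a BAR of bar_size bytes.
 * Returns false, leaving the window untouched, if the BAR does not fit.
 */
static bool vpci_place_bar(struct vpci_window *w, uint64_t bar_size,
                           uint64_t *guest_addr)
{
    /* BAR sizes must be a non-zero power of two. */
    assert(bar_size != 0 && (bar_size & (bar_size - 1)) == 0);

    /* Round the next free address up to a multiple of the BAR size. */
    uint64_t addr = (w->base + w->next + bar_size - 1) & ~(bar_size - 1);
    uint64_t end_off = addr - w->base + bar_size;

    if (end_off > w->size)
        return false;

    w->next = end_off;
    *guest_addr = addr;
    return true;
}
```

The emulated BAR value on the VPCI bus would then be the returned guest
address rather than the hardware one, and an emulated bridge window would
have to cover whatever range ends up being handed out.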


> Roger.


