Xen project Mailing List

Re: RFC: PCI devices passthrough on Arm design proposal

To: Bertrand Marquis <Bertrand.Marquis@xxxxxxx>

From: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Mon, 20 Jul 2020 10:45:05 +0200

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, nd <nd@xxxxxxx>, Rahul Singh <Rahul.Singh@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien.grall.oss@xxxxxxxxx>

Delivery-date: Mon, 20 Jul 2020 08:45:19 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Sat, Jul 18, 2020 at 09:49:43AM +0000, Bertrand Marquis wrote:
>
>
> > On 17 Jul 2020, at 17:55, Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
> >
> > On Fri, Jul 17, 2020 at 03:21:57PM +0000, Bertrand Marquis wrote:
> >>> On 17 Jul 2020, at 16:31, Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
> >>> On Fri, Jul 17, 2020 at 01:22:19PM +0000, Bertrand Marquis wrote:
> >>>>> On 17 Jul 2020, at 13:16, Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
> >>>>>> # Emulated PCI device tree node in libxl:
> >>>>>>
> >>>>>> Libxl is creating a virtual PCI device tree node in the device tree
> >>>>>> to enable the guest OS to discover the virtual PCI during guest
> >>>>>> boot. We introduced the new config option [vpci="pci_ecam"] for
> >>>>>> guests. When this config option is enabled in a guest configuration,
> >>>>>> a PCI device tree node will be created in the guest device tree.
> >>>>>>
> >>>>>> A new area has been reserved in the arm guest physical map at which
> >>>>>> the VPCI bus is declared in the device tree (reg and ranges
> >>>>>> parameters of the node). A trap handler for the PCI ECAM access from
> >>>>>> guest has been registered at the defined address and redirects
> >>>>>> requests to the VPCI driver in Xen.
> >>>>>
> >>>>> Can't you deduce the requirement of such DT node based on the presence
> >>>>> of a 'pci=' option in the same config file?
> >>>>>
> >>>>> Also I wouldn't discard that in the future you might want to use
> >>>>> different emulators for different devices, so it might be helpful to
> >>>>> introduce something like:
> >>>>>
> >>>>> pci = [ '08:00.0,backend=vpci', '09:00.0,backend=xenpt',
> >>>>>         '0a:00.0,backend=qemu', ... ]
> >>>>>
> >>>>> For the time being Arm will require backend=vpci for all the passed
> >>>>> through devices, but I wouldn't rule out this changing in the future.
> >>>>
> >>>> We need it for the case where no device is declared in the config file
> >>>> and the user wants to add devices using xl later. In this case we must
> >>>> have the DT node for it to work.
> >>>
> >>> There's a passthrough xl.cfg option for that already, so that if you
> >>> don't want to add any PCI passthrough devices at creation time but
> >>> rather hotplug them you can set:
> >>>
> >>> passthrough=enabled
> >>>
> >>> And it should setup the domain to be prepared to support hot
> >>> passthrough, including the IOMMU [0].
> >>
> >> Isn’t this option covering more then PCI passthrough ?
> >>
> >> Lots of Arm platform do not have a PCI bus at all, so for those
> >> creating a VPCI bus would be pointless. But you might need to
> >> activate this to pass devices which are not on the PCI bus.
> >
> > Well, you can check whether the host has PCI support and decide
> > whether to attach a virtual PCI bus to the guest or not?
> >
> > Setting passthrough=enabled should prepare the guest to handle
> > passthrough, in whatever form is supported by the host IMO.
>
> True, we could just say that we create a PCI bus if the host has one and
> passthrough is activated.
> But with virtual device point, we might even need one on guest without
> PCI support on the hardware :-)

Sure, but at that point you might want to consider unconditionally
adding an emulated PCI bus to guests anyway.

You will always have time to add new options to xl, but I would start
by trying to make use of the existing ones.

Are you planning to add the logic in Xen to enable hot-plug of devices
right away?

If the implementation hasn't been considered yet I wouldn't mind
leaving all this for later and just focusing on non-hotplug
passthrough using pci = [ ... ] for the time being.

> >
> >>>>>> Limitation:
> >>>>>> * Need to avoid the “iomem” and “irq” guest config
> >>>>>> options and map the IOMEM region and IRQ at the same time when
> >>>>>> device is assigned to the guest using the “pci” guest config options
> >>>>>> when xl creates the domain.
> >>>>>> * Emulated BAR values on the VPCI bus should reflect the IOMEM mapped
> >>>>>> address.
> >>>>>
> >>>>> It was my understanding that you would identity map the BAR into the
> >>>>> domU stage-2 translation, and that changes by the guest won't be
> >>>>> allowed.
> >>>>
> >>>> In fact this is not possible to do and we have to remap at a different
> >>>> address because the guest physical mapping is fixed by Xen on Arm so we
> >>>> must follow the same design otherwise this would only work if the BARs
> >>>> are pointing to an address unused and on Juno this is for example
> >>>> conflicting with the guest RAM address.
> >>>
> >>> This was not clear from my reading of the document, could you please
> >>> clarify on the next version that the guest physical memory map is
> >>> always the same, and that BARs from PCI devices cannot be identity
> >>> mapped to the stage-2 translation and instead are relocated somewhere
> >>> else?
> >>
> >> We will.
> >>
> >>>
> >>> I'm then confused about what you do with bridge windows, do you also
> >>> trap and adjust them to report a different IOMEM region?
> >>
> >> Yes this is what we will have to do so that the regions reflect the VPCI
> >> mappings and not the hardware one.
> >>
> >>>
> >>> Above you mentioned that read-only access was given to bridge
> >>> registers, but I guess some are also emulated in order to report
> >>> matching IOMEM regions?
> >>
> >> yes that’s exact. We will clear this in the next version.
> >
> > If you have to go this route for domUs, it might be easier to just
> > fake a PCI host bridge and place all the devices there even with
> > different SBDF addresses. Having to replicate all the bridges on the
> > physical PCI bus and fixing up it's MMIO windows seems much more
> > complicated than just faking/emulating a single bridge?
>
> That’s definitely something we have to dig more on. The whole problematic
> of PCI enumeration and BAR value assignation in Xen might be pushed to
> either Dom0 or the firmware but we might in fact find ourself with exactly the
> same problem on the VPCI bus.

Not really, in order for Xen to do passthrough to a guest it must know
the SBDF of a device, the resources it's using and the memory map of
the guest, or else passthrough can't be done. At that point Xen has
the whole picture and can decide where the resources of the device
should appear on the stage-2 translation, and hence the IOMEM windows
required on the bridge(s).

What I'm trying to say is that I'm not convinced that exposing all the
host PCI bridges with adjusted IOMEM windows is easier than just
completely faking (and emulating) a PCI bridge inside of Xen.

Roger.
