Xen project Mailing List

Re: PCI devices passthrough on Arm design proposal

To: Rahul Singh <Rahul.Singh@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>

From: Julien Grall <julien@xxxxxxx>

Date: Fri, 17 Jul 2020 14:50:56 +0100

Cc: nd <nd@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Julien Grall <julien.grall.oss@xxxxxxxxx>

Delivery-date: Fri, 17 Jul 2020 13:51:06 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

(Resending to the correct ML)

On 17/07/2020 14:23, Julien Grall wrote:



On 16/07/2020 18:02, Rahul Singh wrote:

Hello All,

Hi,

Following up on the discussion on PCI passthrough support on ARM that we had at the XEN summit, we are submitting a Request For Comments (RFC) and a design proposal for PCI passthrough support on ARM. Feel free to give your feedback.
The following describes the high-level design proposal for PCI passthrough support and how the different modules within the system interact with each other to assign a particular PCI device to the guest.

There was an attempt a few years ago to get a design document for PCI passthrough (see [1]). I would suggest having a look at the thread, as I think it would help to have an overview of all the components (e.g. MSI controllers...) even if they will not be implemented at the beginning.

# Title:

PCI devices passthrough on Arm design proposal

# Problem statement:
On ARM there is no support for assigning a PCI device to a guest. PCI device passthrough gives guests full access to some PCI devices: the devices appear and behave as if they were physically attached to the guest operating system, and the passthrough provides full isolation of the PCI devices.
The goal of this work is also to support the Dom0Less configuration, so the PCI backend/frontend drivers used on x86 shall not be used on Arm. It will use the existing VPCI concept from x86 and implement the virtual PCI bus through IO emulation, so that only assigned devices are visible to the guest and the guest can use the standard PCI driver.
Only Dom0 and XEN will have access to the real PCI bus; the guest will have direct access to the assigned device itself. IOMEM memory will be mapped to the guest and interrupts will be redirected to the guest. The SMMU has to be configured correctly for DMA transactions to work.
## Current state: Draft version

# Proposer(s): Rahul Singh, Bertrand Marquis

# Proposal:
This section describes the different subsystems needed to support PCI device passthrough and how these subsystems interact with each other to assign a device to the guest.
# PCI Terminology:
Host Bridge: The host bridge allows the PCI devices to talk to the rest of the computer.
ECAM: ECAM (Enhanced Configuration Access Mechanism) is a mechanism developed to allow PCIe to access configuration space. The space available per function is 4KB.
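ECAM gives each function 4KB of configuration space. As a result, the bus, device, function and register numbers occupy fixed bit positions in the offset from the start of a host bridge's ECAM window. Below is a minimal sketch of that layout. The helper name `ecam_offset` is hypothetical and is not taken from the Xen code base.

```c
#include <stdint.h>

/*
 * Hypothetical helper: byte offset of a configuration register inside an
 * ECAM window. Each function has 4KB, each device 8 functions (32KB) and
 * each bus 32 devices (1MB).
 */
static inline uint64_t ecam_offset(uint8_t bus, uint8_t dev, uint8_t func,
                                   uint16_t reg)
{
    return ((uint64_t)bus << 20) |          /* 1MB per bus */
           ((uint64_t)(dev & 0x1f) << 15) | /* 32KB per device */
           ((uint64_t)(func & 0x7) << 12) | /* 4KB per function */
           (uint64_t)(reg & 0xfff);         /* register within the 4KB */
}
```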
# Discovering PCI Host Bridge in XEN:
In order to support PCI passthrough, XEN should be aware of all the PCI host bridges available on the system and should be able to access the PCI configuration space. Only ECAM configuration access is supported as of now. During boot, XEN will read the “reg” property of the PCI device tree node and will map the ECAM space into XEN memory using the “ioremap_nocache()” function.
If there is more than one segment on the system, XEN will read the “linux,pci-domain” property from the device tree node and configure the host bridge segment number accordingly. All the PCI device tree nodes should have the “linux,pci-domain” property so that there are no conflicts. During hardware domain boot, Linux will also use the same “linux,pci-domain” property and assign the domain number to the host bridge.

AFAICT, "linux,pci-domain" is not a mandatory option and is mostly tied to Linux. What would happen with other OSes?

But I would rather avoid mandating that users modify their device tree in order to support PCI passthrough. It would be better for Xen to assign the number if it is not present.
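The fallback suggested above can be sketched as follows. Bridges that carry "linux,pci-domain" keep that number. Every other bridge gets the lowest number that no bridge has claimed. The function, the `NO_DOMAIN_PROP` marker and the array-based input are hypothetical. The sketch also does not settle how such numbers would be communicated to an OS that does not read the property.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NO_DOMAIN_PROP (-1) /* hypothetical marker: property absent */

/*
 * Hypothetical sketch: dt_domain[i] is the "linux,pci-domain" value of
 * host bridge i, or NO_DOMAIN_PROP. Fills seg[i] with the segment to use.
 * Returns false if two bridges claim the same number explicitly.
 */
static bool assign_segments(const int *dt_domain, size_t n, uint16_t *seg)
{
    size_t i, j;
    uint16_t next = 0;

    /* Reject duplicate explicit numbers. */
    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            if (dt_domain[i] != NO_DOMAIN_PROP && dt_domain[i] == dt_domain[j])
                return false;

    for (i = 0; i < n; i++) {
        if (dt_domain[i] != NO_DOMAIN_PROP) {
            seg[i] = (uint16_t)dt_domain[i];
            continue;
        }
        /* Skip numbers already claimed through the property. */
        for (;; next++) {
            bool used = false;
            for (j = 0; j < n; j++)
                if (dt_domain[j] == next)
                    used = true;
            if (!used)
                break;
        }
        seg[i] = next++; /* never hand out the same number twice */
    }
    return true;
}
```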

When Dom0 tries to access the PCI config space of a device, XEN will find the corresponding host bridge based on the segment number and access the config space assigned to that bridge.
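The lookup described above can be sketched as follows. Xen searches the host bridges for one that matches the segment and decodes the bus, then computes the address inside that bridge's mapped ECAM window. The `struct host_bridge` layout and `cfg_addr` are hypothetical. The sketch assumes the ECAM window starts at the bridge's first bus number.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-host-bridge record built from the device tree. */
struct host_bridge {
    uint16_t segment;  /* PCI segment (domain) number */
    uint8_t bus_start; /* first bus decoded by this bridge */
    uint8_t bus_end;   /* last bus decoded by this bridge */
    uintptr_t ecam_va; /* address of the ioremap'ed ECAM window */
};

/*
 * Return the address of a config register, or 0 if no host bridge covers
 * (seg, bus). A mapped ECAM window is never at address 0.
 */
static uintptr_t cfg_addr(const struct host_bridge *hb, size_t n,
                          uint16_t seg, uint8_t bus, uint8_t devfn,
                          uint16_t reg)
{
    for (size_t i = 0; i < n; i++) {
        if (hb[i].segment != seg || bus < hb[i].bus_start ||
            bus > hb[i].bus_end)
            continue;
        return hb[i].ecam_va +
               ((uintptr_t)(bus - hb[i].bus_start) << 20) +
               ((uintptr_t)devfn << 12) + (uintptr_t)(reg & 0xfff);
    }
    return 0;
}
```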
Limitation:
* Only PCI ECAM configuration space access is supported.
* Only device tree binding is supported as of now; ACPI is not supported.

We want to differentiate the high-level design from the actual implementation. While you may not yet implement ACPI, we still need to keep it in mind to avoid incompatibilities in the long term.

* The PCI host bridge access code needs to be ported to XEN to access the configuration space (the generic one works, but lots of platforms will require some specific code or quirks).
# Discovering PCI devices:
PCI/PCIe enumeration is the process of detecting the devices connected to a host. It is the responsibility of the hardware domain or boot firmware to do the PCI enumeration and to configure the BARs, PCI capabilities, and MSI/MSI-X.
PCI/PCIe enumeration in XEN is not feasible for the configuration part, as it would require a lot of code inside Xen, which in turn would require a lot of maintenance. In addition, many platforms require some quirks in that part of the PCI code, which would greatly increase Xen's complexity. Once the hardware domain has enumerated a device, it will report it to XEN via the hypercall below.
#define PHYSDEVOP_pci_device_add        25
struct physdev_pci_device_add {
    uint16_t seg;
    uint8_t bus;
    uint8_t devfn;
    uint32_t flags;
    struct {
        uint8_t bus;
        uint8_t devfn;
    } physfn;
    /*
     * Optional parameters array.
     * First element ([0]) is PXM domain associated with the device (if
     * XEN_PCI_DEV_PXM is set)
     */
    uint32_t optarr[XEN_FLEX_ARRAY_DIM];
};
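The sketch below shows how the hardware domain might fill this argument for an ordinary function with no optional parameters. `devfn` packs the 5-bit device (slot) number and the 3-bit function number into one byte. The struct is a local copy that uses a C99 flexible array in place of `XEN_FLEX_ARRAY_DIM`. `PCI_DEVFN` is redefined locally, and `fill_device_add` is hypothetical. The hypercall invocation itself is not shown.

```c
#include <stdint.h>
#include <string.h>

/* Local copy of the hypercall argument, with a C99 flexible array. */
struct physdev_pci_device_add {
    uint16_t seg;
    uint8_t bus;
    uint8_t devfn;
    uint32_t flags;
    struct {
        uint8_t bus;
        uint8_t devfn;
    } physfn;
    uint32_t optarr[];
};

/* devfn = 5-bit device (slot) number and 3-bit function number. */
#define PCI_DEVFN(slot, func) \
    ((uint8_t)((((slot) & 0x1f) << 3) | ((func) & 0x07)))

/* Hypothetical helper: argument for a plain (non-SR-IOV) function. */
static void fill_device_add(struct physdev_pci_device_add *add, uint16_t seg,
                            uint8_t bus, uint8_t slot, uint8_t func)
{
    memset(add, 0, sizeof(*add));
    add->seg = seg;
    add->bus = bus;
    add->devfn = PCI_DEVFN(slot, func);
    /* flags stay 0, so neither physfn nor optarr[0] is meaningful. */
}
```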
As the hypercall argument contains the PCI segment number, XEN will find the host bridge corresponding to this segment number and access the PCI config space through it. At this stage the host bridge is fully initialized, so there will be no issue accessing the config space.
XEN will add the PCI devices to the linked list maintained in XEN using the function pci_add_device(). XEN will be aware of all the PCI devices on the system, and all the devices will be added to the hardware domain.
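The bookkeeping described above can be pictured with a much simplified list. Every reported device is recorded once and starts out owned by the hardware domain. This is not Xen's `struct pci_dev` or the real `pci_add_device()`. The record layout, the `sketch_*` functions and the assumption that the hardware domain is domain 0 are all hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define HWDOM_ID 0 /* assumption: the hardware domain is domain 0 */

/* Hypothetical, simplified device record (not Xen's struct pci_dev). */
struct sketch_pci_dev {
    uint16_t seg;
    uint8_t bus;
    uint8_t devfn;
    uint16_t owner; /* id of the domain owning the device */
    struct sketch_pci_dev *next;
};

/* Look up a device by segment, bus and devfn; NULL if unknown. */
static struct sketch_pci_dev *sketch_find_dev(struct sketch_pci_dev *head,
                                              uint16_t seg, uint8_t bus,
                                              uint8_t devfn)
{
    for (; head != NULL; head = head->next)
        if (head->seg == seg && head->bus == bus && head->devfn == devfn)
            return head;
    return NULL;
}

/* Record a device reported by the hardware domain: 0, -EEXIST or -ENOMEM. */
static int sketch_add_dev(struct sketch_pci_dev **head, uint16_t seg,
                          uint8_t bus, uint8_t devfn)
{
    struct sketch_pci_dev *pdev;

    if (sketch_find_dev(*head, seg, bus, devfn) != NULL)
        return -EEXIST;
    pdev = malloc(sizeof(*pdev));
    if (pdev == NULL)
        return -ENOMEM;
    pdev->seg = seg;
    pdev->bus = bus;
    pdev->devfn = devfn;
    pdev->owner = HWDOM_ID; /* new devices belong to the hardware domain */
    pdev->next = *head;     /* push at the front of the list */
    *head = pdev;
    return 0;
}
```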

I understand this is what x86 does. However, may I ask why we would want it for Arm?

Limitations:
* When PCI devices are added to XEN, the MSI capability is not initialized inside XEN; MSI is not supported as of now.
* The ACS capability is disabled for ARM as of now, because devices are not accessible after enabling it.


I am not sure I understand this. Can you expand?

* A Dom0Less implementation will require the capability inside Xen to discover the PCI devices (without depending on Dom0 to declare them to Xen).
# Enable the existing x86 virtual PCI support for ARM:
The existing VPCI support available for x86 is adapted for Arm. When a device is added to XEN via the hypercall “PHYSDEVOP_pci_device_add”, a VPCI handler for config space accesses is added to the PCI device to emulate it.
An MMIO trap handler for the PCI ECAM space is registered in XEN. When the guest tries to access the PCI config space, XEN traps the access and emulates the read/write using VPCI rather than the real PCI hardware.
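When the trap handler fires, it only knows the faulting guest physical address. The first step is therefore to turn that address into a bus/device/function and a register offset before calling into VPCI. Below is a minimal sketch. The proposal does not give the guest ECAM base, so the window base and size are parameters. `struct ecam_access` and `decode_ecam` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded form of a trapped guest ECAM access. */
struct ecam_access {
    uint8_t bus;
    uint8_t dev;
    uint8_t func;
    uint16_t reg;
};

/*
 * Decode a guest physical address that faulted in the virtual ECAM window
 * [base, base + size). Returns false if the address is outside the window.
 */
static bool decode_ecam(uint64_t base, uint64_t size, uint64_t gpa,
                        struct ecam_access *acc)
{
    uint64_t off;

    if (gpa < base || gpa - base >= size)
        return false;
    off = gpa - base;
    acc->bus = (uint8_t)((off >> 20) & 0xff);
    acc->dev = (uint8_t)((off >> 15) & 0x1f);
    acc->func = (uint8_t)((off >> 12) & 0x7);
    acc->reg = (uint16_t)(off & 0xfff);
    return true;
}
```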
Limitation:
* No handler is registered for the MSI configuration.
* Only legacy interrupts are supported and tested as of now; MSI is neither implemented nor tested.

IIRC, legacy interrupts may be shared between two PCI devices. How do you plan to handle this on Arm?

# Assign the device to the guest:
A PCI device is assigned from the hardware domain to the guest with the guest config option below. When the xl tool creates the domain, the PCI devices are assigned to the guest VPCI bus.

Above, you suggest that devices will be assigned to the hardware domain at boot. I am assuming this also means that all the interrupts/MMIOs will be routed/mapped. Is that correct?


If so, can you provide a rough sketch how assign/deassign will work?

    pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ...]
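Each PCI_SPEC_STRING starts with a BDF of the form "[DDDD:]BB:DD.F" in hex, and options such as "@VSLOT" or ",KEY=VALUE" may follow it. Below is a minimal sketch of extracting the BDF with `sscanf`, assuming that format. `parse_bdf` is hypothetical. The sketch does not handle the "*" function wildcard or the trailing options.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse the "[DDDD:]BB:DD.F" part of a PCI_SPEC_STRING (hex fields).
 * Whatever follows the BDF ("@VSLOT", ",KEY=VALUE") is left to the caller.
 */
static bool parse_bdf(const char *s, uint16_t *seg, uint8_t *bus,
                      uint8_t *dev, uint8_t *func)
{
    unsigned int d = 0, b, v, f;
    int n = -1;

    /* Full form with the segment first, then the short form. */
    if (sscanf(s, "%x:%x:%x.%x%n", &d, &b, &v, &f, &n) != 4 || n < 0) {
        d = 0; /* the segment defaults to 0 when omitted */
        n = -1;
        if (sscanf(s, "%x:%x.%x%n", &b, &v, &f, &n) != 3 || n < 0)
            return false;
    }

    /* The BDF must be followed by the end of string, '@' or ','. */
    if (s[n] != '\0' && s[n] != '@' && s[n] != ',')
        return false;
    if (d > 0xffff || b > 0xff || v > 0x1f || f > 0x7)
        return false;

    *seg = (uint16_t)d;
    *bus = (uint8_t)b;
    *dev = (uint8_t)v;
    *func = (uint8_t)f;
    return true;
}
```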
The guest will only be able to access the assigned devices and see the bridges. The guest will not be able to access or see the devices that are not assigned to it.
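One conventional way to hide a device from a guest is to make config reads of a non-assigned device return all ones, which a PCI driver treats as an empty slot, and to drop writes to it. Below is a minimal sketch of that filter in front of the VPCI handlers. `struct assigned_dev`, `is_assigned` and `guest_cfg_access` are hypothetical, and bridges are not handled here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of a device assigned to the guest. */
struct assigned_dev {
    uint8_t bus;
    uint8_t devfn;
};

static bool is_assigned(const struct assigned_dev *devs, size_t n,
                        uint8_t bus, uint8_t devfn)
{
    for (size_t i = 0; i < n; i++)
        if (devs[i].bus == bus && devs[i].devfn == devfn)
            return true;
    return false;
}

/*
 * Returns true if the access must be forwarded to the VPCI handlers.
 * Otherwise the access is completed here: reads return all ones and
 * writes are silently dropped.
 */
static bool guest_cfg_access(const struct assigned_dev *devs, size_t n,
                             uint8_t bus, uint8_t devfn, bool is_write,
                             uint32_t *val)
{
    if (is_assigned(devs, n, bus, devfn))
        return true;
    if (!is_write)
        *val = 0xffffffffu; /* "no device here" */
    return false;
}
```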
Limitation:
* As of now, all the bridges on the PCI bus are seen by the guest on the VPCI bus.

Why do you want to expose all the bridges to a guest? Does this mean that the BDF should always match between the host and the guest?

# Emulated PCI device tree node in libxl:
Libxl creates a virtual PCI device tree node in the device tree so that the guest OS can discover the virtual PCI bus during guest boot. We introduced the new config option [vpci="pci_ecam"] for guests. When this config option is enabled in a guest configuration, a PCI device tree node will be created in the guest device tree.
A new area has been reserved in the Arm guest physical map, and the VPCI bus is declared at that address in the device tree (the reg and ranges properties of the node). A trap handler for PCI ECAM accesses from the guest has been registered at the defined address and redirects requests to the VPCI driver in Xen.
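The proposal does not state the base or size of the reserved guest area. One constraint does follow from ECAM itself: each bus needs 1MB, so a window covering all 256 buses needs 256MB. The sketch below computes that size and splits 64-bit values into the two 32-bit cells of a "reg" entry. It assumes `#address-cells` and `#size-cells` are both 2 for the node's parent. Both helper names are hypothetical.

```c
#include <stdint.h>

/*
 * Size of an ECAM window decoding buses [bus_start, bus_end]: each bus
 * spans 32 devices * 8 functions * 4KB = 1MB.
 */
static uint64_t ecam_window_size(unsigned int bus_start, unsigned int bus_end)
{
    return (uint64_t)(bus_end - bus_start + 1) << 20;
}

/*
 * Split a 64-bit address or size into two "reg" cells, assuming
 * #address-cells = #size-cells = 2. Cells are in host byte order here;
 * an FDT writer has to store them big-endian.
 */
static void to_dt_cells(uint64_t v, uint32_t cells[2])
{
    cells[0] = (uint32_t)(v >> 32); /* most significant cell first */
    cells[1] = (uint32_t)v;
}
```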
Limitation:
* Only one PCI device tree node is supported as of now.

# BAR value and IOMEM mapping:
The Linux guest will do the PCI enumeration based on the areas reserved for ECAM and IOMEM ranges in the VPCI device tree node. Once a PCI device is assigned to the guest, XEN will map the guest PCI IOMEM region to the real physical IOMEM region only for the assigned devices.
As of now, we have not modified the existing VPCI code to map the guest PCI IOMEM region to the real physical IOMEM region. Instead, we used the existing guest “iomem” config option to map the region.
For example:
    Guest reserved IOMEM region: 0x04020000
    Real physical IOMEM region:  0x50000000
    IOMEM size:                  128MB
    iomem config will be:        iomem = ["0x50000,0x8000@0x4020"]
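The numbers in the iomem entry above are the byte addresses and size expressed in 4KB page frames: 0x50000000 >> 12 = 0x50000, 128MB / 4KB = 0x8000 pages, and 0x04020000 >> 12 = 0x4020. The sketch below makes that conversion explicit. It assumes 4KB pages and that all inputs are page aligned. `format_iomem` and `PAGE_SHIFT_4K` are hypothetical names.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SHIFT_4K 12 /* assumption: iomem values are 4KB page frames */

/*
 * Format one "IOMEM_START,NUM_PAGES@GFN" entry (hex page frames) from
 * byte addresses and a byte size. Returns the snprintf() result, or -1
 * if any input is not 4KB aligned.
 */
static int format_iomem(char *buf, size_t len, uint64_t host_addr,
                        uint64_t size, uint64_t guest_addr)
{
    const uint64_t mask = (UINT64_C(1) << PAGE_SHIFT_4K) - 1;

    if ((host_addr | size | guest_addr) & mask)
        return -1;
    return snprintf(buf, len, "0x%" PRIx64 ",0x%" PRIx64 "@0x%" PRIx64,
                    host_addr >> PAGE_SHIFT_4K, size >> PAGE_SHIFT_4K,
                    guest_addr >> PAGE_SHIFT_4K);
}
```

With the values from the example, `format_iomem(buf, sizeof(buf), 0x50000000, 128ULL << 20, 0x04020000)` produces `0x50000,0x8000@0x4020`.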
There is no need to map the ECAM space, as XEN already has access to it. XEN traps ECAM accesses from the guest and performs the read/write on the VPCI bus.
IOMEM accesses will not be trapped; the guest will directly access the IOMEM region of the assigned device via stage-2 translation.
In the same way, we mapped the assigned devices' IRQs to the guest using the config option below.
    irqs = [ NUMBER, NUMBER, ... ]

Limitation:
* Need to avoid the “iomem” and “irqs” guest config options and instead map the IOMEM region and IRQ when the device is assigned to the guest through the “pci” guest config option, at the time xl creates the domain.
* Emulated BAR values on the VPCI bus should reflect the IOMEM mapped address.
* The x86 mapping code should be ported to Arm so that the stage-2 translation is adapted when the guest modifies the BAR register values (to map the address requested by the guest for a specific IOMEM to the address actually contained in the real BAR register of the corresponding device).
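The last two points can be illustrated with one emulated 32-bit memory BAR. The guest sees only its own address. Writing all ones is the standard sizing probe and reads back the size mask. Any guest address inside the BAR translates to the same offset in the real BAR, which is what the stage-2 mapping has to follow. The `struct vbar` and `vbar_*` names are hypothetical, and the actual stage-2 (re)mapping is not shown. A real implementation would presumably also remap only while memory decoding is enabled in the command register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state of one emulated 32-bit, non-prefetchable memory BAR. */
struct vbar {
    uint32_t size;       /* power of two, at least 16 bytes */
    uint32_t guest_addr; /* address last programmed by the guest */
    uint32_t host_addr;  /* address held in the device's real BAR */
};

/* Value the guest reads; type bits [3:0] are 0 for this kind of BAR. */
static uint32_t vbar_read(const struct vbar *b)
{
    return b->guest_addr & ~(b->size - 1);
}

/* Guest write; address bits below the BAR size are read-only zero. */
static void vbar_write(struct vbar *b, uint32_t val)
{
    b->guest_addr = val & ~(b->size - 1);
}

/* Host address that stage-2 should map gaddr to; false if outside. */
static bool vbar_translate(const struct vbar *b, uint64_t gaddr,
                           uint64_t *haddr)
{
    if (gaddr < b->guest_addr || gaddr - b->guest_addr >= b->size)
        return false;
    *haddr = b->host_addr + (gaddr - b->guest_addr);
    return true;
}
```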
# SMMU configuration for guest:
When assigning PCI devices to a guest, the SMMU configuration should be updated to remove access to the hardware domain memory and to add access to the guest memory with the proper address translation, so that the device can do DMA operations from and to the guest memory only.
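Reconfiguring the SMMU for a device first requires its StreamID. The proposal does not say how Xen obtains it. On device-tree systems, one common source is the PCI host bridge's "iommu-map" property, which maps ranges of Requester IDs (RID = bus << 8 | devfn) to StreamIDs. The sketch below shows that translation under this assumption. The optional "iommu-map-mask" is modelled as a parameter that is all ones when absent. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One iommu-map entry: <rid-base &smmu iommu-base length>. */
struct iommu_map_entry {
    uint32_t rid_base;
    uint32_t iommu_base;
    uint32_t length;
};

/* PCI Requester ID of a function. */
static uint16_t pci_rid(uint8_t bus, uint8_t devfn)
{
    return (uint16_t)((bus << 8) | devfn);
}

/* Translate a Requester ID into a StreamID; false if no entry matches. */
static bool rid_to_sid(const struct iommu_map_entry *map, size_t n,
                       uint32_t mask, uint16_t rid, uint32_t *sid)
{
    uint32_t r = rid & mask;

    for (size_t i = 0; i < n; i++) {
        if (r >= map[i].rid_base && r - map[i].rid_base < map[i].length) {
            *sid = map[i].iommu_base + (r - map[i].rid_base);
            return true;
        }
    }
    return false;
}
```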


There are a few more questions to answer here:

- When a guest is destroyed, who will be the owner of the PCI devices? Depending on the answer, how do you make sure the device is quiescent?
- Is there any memory access that can bypass the IOMMU (e.g. doorbell)?


Cheers,

[1] https://lists.xenproject.org/archives/html/xen-devel/2017-05/msg02520.html

-- Julien Grall
