
RFC: PCI devices passthrough on Arm design proposal


  • To: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Rahul Singh <Rahul.Singh@xxxxxxx>
  • Date: Thu, 16 Jul 2020 17:10:05 +0000
  • Cc: nd <nd@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Julien Grall <julien.grall.oss@xxxxxxxxx>
  • Delivery-date: Thu, 16 Jul 2020 17:10:40 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: PCI devices passthrough on Arm design proposal

Hello All,

Following up on the discussion on PCI passthrough support on ARM that we had at
the Xen Summit, we are submitting a Request For Comments and a design proposal
for PCI passthrough support on ARM. Feel free to give your feedback.

The following describes the high-level design proposal for PCI passthrough
support and how the different modules within the system interact with each
other to assign a particular PCI device to a guest.

# Title:

PCI devices passthrough on Arm design proposal

# Problem statement:

On ARM there is currently no support for assigning a PCI device to a guest.
PCI device passthrough allows guests to have full access to some PCI devices.
Passed-through PCI devices appear and behave as if they were physically
attached to the guest operating system, while remaining fully isolated from
the rest of the system.

A goal of this work is to also support Dom0less configurations, so the PCI
backend/frontend drivers used on x86 will not be used on Arm. Instead, the
design reuses the existing VPCI concept from x86 and implements a virtual PCI
bus through I/O emulation, such that only assigned devices are visible to the
guest and the guest can use a standard PCI driver.

Only Dom0 and Xen will have access to the real PCI bus; a guest will have
direct access to the assigned device itself. IOMEM regions will be mapped into
the guest and interrupts will be redirected to the guest. The SMMU has to be
configured correctly so that DMA transactions can only target the guest's own
memory.

## Current state: Draft version

# Proposer(s): Rahul Singh, Bertrand Marquis

# Proposal:

This section describes the different subsystems needed to support PCI device
passthrough and how these subsystems interact with each other to assign a
device to a guest.

# PCI Terminology:

Host Bridge: The host bridge allows the PCI devices to talk to the rest of the
computer.
ECAM: ECAM (Enhanced Configuration Access Mechanism) is a mechanism developed
to access the PCIe configuration space. The space available per function is
4KB.

# Discovering PCI Host Bridge in XEN:

In order to support PCI passthrough, XEN should be aware of all the PCI host
bridges available on the system and should be able to access the PCI
configuration space. Only ECAM configuration space access is supported as of
now. During boot, XEN will read the "reg" property of the PCI host bridge
device tree node and map the ECAM space into XEN memory using the
"ioremap_nocache()" function.

If there is more than one segment on the system, XEN will read the
"linux,pci-domain" property from the device tree node and configure the host
bridge segment number accordingly. All the PCI device tree nodes should have
the "linux,pci-domain" property so that there will be no conflicts. During
hardware domain boot, Linux will also use the same "linux,pci-domain" property
and assign the same domain number to the host bridge.
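
A minimal sketch of this boot-time discovery, assuming an illustrative
pci_host_bridge structure and pci_host_map_ecam() helper
(dt_device_get_address(), dt_property_read_u32() and ioremap_nocache() are
existing Xen helpers):

/* Illustrative structure: one entry per host bridge found in the DT. */
struct pci_host_bridge {
    uint16_t segment;       /* PCI segment (domain) number */
    uint64_t ecam_base;     /* physical base of the ECAM window ("reg") */
    uint64_t ecam_size;     /* size of the ECAM window */
    void __iomem *ecam;     /* virtual mapping used by Xen */
};

/* Discover one host bridge from its device tree node. */
static int pci_host_map_ecam(const struct dt_device_node *node,
                             struct pci_host_bridge *bridge)
{
    uint32_t segment;

    /* "reg" gives the physical address and size of the ECAM window. */
    if ( dt_device_get_address(node, 0, &bridge->ecam_base,
                               &bridge->ecam_size) )
        return -ENODEV;

    /* "linux,pci-domain" gives the segment number shared with Dom0. */
    if ( !dt_property_read_u32(node, "linux,pci-domain", &segment) )
        return -EINVAL;
    bridge->segment = segment;

    bridge->ecam = ioremap_nocache(bridge->ecam_base, bridge->ecam_size);
    return bridge->ecam ? 0 : -ENOMEM;
}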

When Dom0 tries to access the PCI config space of a device, XEN will find the
corresponding host bridge based on the segment number and access the config
space assigned to that bridge.
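
The ECAM layout makes this lookup straightforward: the offset of a register is
a fixed function of bus, device/function and register number. A hedged helper
on top of the pci_host_bridge sketch above:

/* Standard ECAM offset: bus at bit 20, devfn at bit 12, 4KB per function. */
static void __iomem *pci_ecam_reg(const struct pci_host_bridge *bridge,
                                  uint8_t bus, uint8_t devfn, uint16_t reg)
{
    return bridge->ecam + (((uint32_t)bus << 20) |
                           ((uint32_t)devfn << 12) | (reg & 0xfff));
}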

Limitation:
* Only PCI ECAM configuration space access is supported.
* Only device tree bindings are supported as of now; ACPI is not supported.
* The PCI host bridge access code needs to be ported to XEN to access the
configuration space (a generic one works but lots of platforms will require
some specific code or quirks).

# Discovering PCI devices:

PCI-PCIe enumeration is the process of detecting the devices connected to a
host. It is the responsibility of the hardware domain or the boot firmware to
do the PCI enumeration and configure the BARs, PCI capabilities, and MSI/MSI-X
configuration.

Doing the configuration part of PCI-PCIe enumeration inside XEN is not
feasible, as it would require a lot of code inside Xen which would in turn
require a lot of maintenance. Added to this, many platforms require quirks in
that part of the PCI code, which would greatly increase Xen's complexity. Once
the hardware domain enumerates a device, it will communicate it to XEN via the
below hypercall.

#define PHYSDEVOP_pci_device_add        25
struct physdev_pci_device_add {
    uint16_t seg;
    uint8_t bus;
    uint8_t devfn;
    uint32_t flags;
    struct {
        uint8_t bus;
        uint8_t devfn;
    } physfn;
    /*
     * Optional parameters array.
     * First element ([0]) is PXM domain associated with the device
     * (if XEN_PCI_DEV_PXM is set).
     */
    uint32_t optarr[XEN_FLEX_ARRAY_DIM];
};
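
For illustration, a hedged sketch of how the hardware domain kernel could
invoke this hypercall once it has enumerated a device (HYPERVISOR_physdev_op()
and PCI_DEVFN() are existing Linux interfaces; the seg/bus/devfn values are
made up):

/* Report one enumerated device to Xen (Linux, hardware domain side). */
static int report_device_to_xen(void)
{
    struct physdev_pci_device_add add = {
        .seg   = 0,               /* segment from "linux,pci-domain" */
        .bus   = 1,               /* bus number assigned during enumeration */
        .devfn = PCI_DEVFN(0, 0), /* device 0, function 0 */
        .flags = 0,               /* no PXM or virtual-function information */
    };

    return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_add, &add);
}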

As the hypercall argument has the PCI segment number, XEN will access the PCI
config space based on this segment number and find the host bridge
corresponding to it. At this stage the host bridge is fully initialized, so
there will be no issue accessing the config space.

XEN will add the PCI devices to the linked list maintained in XEN using the
function pci_add_device(). XEN will then be aware of all the PCI devices on
the system, and all the devices will be assigned to the hardware domain.
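
A hedged sketch of the corresponding Xen-side path (pci_add_device() is the
existing Xen function; do_pci_device_add() and the NULL/NUMA_NO_NODE arguments
are illustrative, standing in for the unpacking of the hypercall's optional
fields):

static int do_pci_device_add(const struct physdev_pci_device_add *add)
{
    /* NULL info / NUMA_NO_NODE: no extended-function or PXM data here. */
    return pci_add_device(add->seg, add->bus, add->devfn, NULL, NUMA_NO_NODE);
}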

Limitations:
* When PCI devices are added to XEN, the MSI capability is not initialized
inside XEN and is not supported as of now.
* The ACS capability is disabled for ARM as of now, as devices are not
accessible after enabling it.
* A Dom0less implementation will require the capacity inside Xen to discover
the PCI devices (without depending on Dom0 to declare them to Xen).

# Enable the existing x86 virtual PCI support for ARM:

The existing VPCI support available for x86 is adapted for Arm. When a device
is added to XEN via the hypercall "PHYSDEVOP_pci_device_add", a VPCI handler
for config space accesses is added to the PCI device to emulate it.

An MMIO trap handler for the PCI ECAM space is registered in XEN so that when
a guest tries to access the PCI config space, XEN will trap the access and
emulate the read/write using VPCI rather than the real PCI hardware.
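
A hedged sketch of that registration, assuming illustrative vpci_mmio_read()/
vpci_mmio_write() handlers and GUEST_VPCI_ECAM_* constants
(register_mmio_handler() and struct mmio_handler_ops are existing Xen/Arm
interfaces):

#define GUEST_VPCI_ECAM_BASE  0x10000000UL  /* illustrative guest address */
#define GUEST_VPCI_ECAM_SIZE  0x10000000UL  /* illustrative window size */

static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
                          register_t *r, void *priv)
{
    /* Decode the ECAM offset into sbdf + register and read via VPCI. */
    *r = ~0UL;  /* placeholder until wired to the VPCI layer */
    return 1;   /* access handled, no fault injected */
}

static int vpci_mmio_write(struct vcpu *v, mmio_info_t *info,
                           register_t r, void *priv)
{
    /* Decode and forward the write to the VPCI layer. */
    return 1;   /* access handled */
}

static const struct mmio_handler_ops vpci_mmio_handler = {
    .read  = vpci_mmio_read,
    .write = vpci_mmio_write,
};

void domain_vpci_init(struct domain *d)
{
    /* Trap guest accesses to the emulated ECAM window. */
    register_mmio_handler(d, &vpci_mmio_handler,
                          GUEST_VPCI_ECAM_BASE, GUEST_VPCI_ECAM_SIZE, NULL);
}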

Limitation:
* No handler is registered for the MSI configuration.
* Only legacy interrupts are supported and tested as of now; MSI is not
implemented and tested.

# Assign the device to the guest:

Assigning a PCI device from the hardware domain to a guest is done using the
below guest config option. When the xl tool creates the domain, the PCI
devices will be assigned to the guest VPCI bus.
        pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ...]
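
For example, a hedged fragment assigning a single device (the BDF is made up):

        pci = [ "0000:01:00.0" ]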

The guest will only be able to access the assigned devices and see the
bridges. The guest will not be able to access or see the devices that are not
assigned to it.

Limitation:
* As of now all the bridges on the PCI bus are seen by the guest on the VPCI
bus.

# Emulated PCI device tree node in libxl:

Libxl creates a virtual PCI device tree node in the device tree to enable the
guest OS to discover the virtual PCI bus during guest boot. We introduced the
new config option [vpci="pci_ecam"] for guests. When this config option is
enabled in a guest configuration, a PCI device tree node will be created in
the guest device tree.

A new area has been reserved in the Arm guest physical map at which the VPCI
bus is declared in the device tree ("reg" and "ranges" parameters of the
node). A trap handler for PCI ECAM accesses from the guest has been registered
at the defined address and redirects requests to the VPCI driver in Xen.

Limitation:
* Only one PCI device tree node is supported as of now.

# BAR value and IOMEM mapping:

The Linux guest will do the PCI enumeration based on the area reserved for
ECAM and the IOMEM ranges in the VPCI device tree node. Once a PCI device is
assigned to the guest, XEN will map the guest PCI IOMEM region to the real
physical IOMEM region, but only for the assigned devices.

As of now we have not modified the existing VPCI code to map the guest PCI
IOMEM region to the real physical IOMEM region. We used the existing guest
"iomem" config option to map the region; its values are hexadecimal 4KB page
frame numbers (start, number of pages, and guest start).
For example:
        Guest reserved IOMEM region:  0x04020000
        Real physical IOMEM region:   0x50000000
        IOMEM size:                   128MB
        iomem config will be:         iomem = ["0x50000,0x8000@0x4020"]

There is no need to map the ECAM space, as XEN already has access to it: XEN
will trap ECAM accesses from the guest and perform the read/write on the VPCI
bus.

IOMEM access will not be trapped and the guest will directly access the IOMEM 
region of the assigned device via stage-2 translation.
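
For reference, the stage-2 mapping itself can rely on the existing
map_mmio_regions() helper; a hedged sketch using the frame numbers from the
example above (4KB frames):

static int map_assigned_device_iomem(struct domain *d)
{
    /* 128MB = 0x8000 pages: guest frame 0x4020 -> machine frame 0x50000. */
    return map_mmio_regions(d, _gfn(0x4020), 0x8000, _mfn(0x50000));
}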

In the same way, we mapped the assigned device's IRQs to the guest using the
below config option.
        irqs= [ NUMBER, NUMBER, ...]
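
Putting the current workarounds together, a hedged guest configuration
fragment could look like the below (the device BDF and IRQ number are made up;
the iomem value is taken from the example above):

        vpci  = "pci_ecam"
        pci   = [ "0000:01:00.0" ]
        iomem = [ "0x50000,0x8000@0x4020" ]
        irqs  = [ 112 ]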

Limitation:
* We need to avoid the "iomem" and "irqs" guest config options and instead map
the IOMEM region and IRQs at the time the device is assigned to the guest via
the "pci" guest config option, when xl creates the domain.
* Emulated BAR values on the VPCI bus should reflect the IOMEM mapped address.
* The x86 mapping code should be ported to Arm so that the stage-2 translation
is adapted when the guest modifies the BAR register values (to map the address
requested by the guest for a specific IOMEM to the address actually contained
in the real BAR register of the corresponding device).

# SMMU configuration for guest:

When assigning PCI devices to a guest, the SMMU configuration should be
updated to remove access to the hardware domain memory and add a configuration
giving access to the guest memory with the proper address translation, so that
the device can do DMA operations from and to the guest memory only.

# MSI/MSI-X support:
Not implemented or tested as of now.

# ITS support:
Not implemented or tested as of now.

Regards,
Rahul



 

