
Re: [Xen-devel] [early RFC] ARM PCI Passthrough design document



Hi Julien,

On 12/29/2016 07:34 PM, Julien Grall wrote:
> Hi all,
> 
> The document below is an early version of a design
> proposal for PCI Passthrough in Xen. It aims to
> describe, from a high-level perspective, the interaction
> with the different subsystems and how guests will be able
> to discover and access PCI.
> 
> I am aware that a similar design has been posted recently
> by Cavium (see [1]), however the approach to expose PCI
> to guests is different. We have requests to run unmodified
> baremetal OSes on Xen; such a guest would directly
> access the devices and no PV drivers would be used.
> 
> That's why this design is based on emulating a root controller.
> This also has the advantage of keeping the VM interface as close
> as possible to baremetal, allowing the guest to use firmware tables
> to discover the devices.
> 
> Currently on ARM, Xen does not have any knowledge about PCI devices.
> This means that the IOMMU and the interrupt controller (such as the
> ITS), which require specific configuration, will not work with PCI
> even for DOM0.
> 
> The PCI Passthrough work can be divided into 2 phases:
>       * Phase 1: Register all PCI devices in Xen => will allow
>                  ITS and SMMU to be used with PCI in Xen
>       * Phase 2: Assign devices to guests
> 
> This document aims to describe the 2 phases, but for now only phase
> 1 is fully described.
> 
> I am sending this design document to start gathering feedback on
> phase 1.
> 
> Cheers,
> 
> [1] https://lists.xen.org/archives/html/xen-devel/2016-12/msg00224.html 
> 
> ========================
> % PCI pass-through support on ARM
> % Julien Grall <julien.grall@xxxxxxxxxx>
> % Draft A
> 
> # Preface
> 
> This document aims to describe the components required to enable PCI
> passthrough on ARM.
> 
> This is an early draft and some questions are still unanswered; when this
> is the case, the text will contain XXX.
> 
> # Introduction
> 
> PCI passthrough allows giving control of physical PCI devices to guests. This
> means that the guest will have full and direct access to the PCI device.
> 
> Xen on ARM supports one kind of guest, which exploits hardware
> virtualization support as much as possible. The guest relies on PV drivers
> only for I/O (e.g. block, network); interrupts come through the virtualized
> interrupt controller. This means that no big changes are required
> within the kernel.
> 
> Consequently, it would be possible to replace PV drivers by assigning real
> devices to the guest for I/O access. Xen on ARM would therefore be able to
> run unmodified operating systems.
> 
> To achieve this goal, it looks more sensible to go towards emulating the
> host bridge (we will go into more detail later). A guest would be able
> to take advantage of the firmware tables, obviating the need for a
> Xen-specific driver.
> 
> Thus in this document we follow the emulated host bridge approach.
> 
> # PCI terminology
> 
> Each PCI device under a host bridge is uniquely identified by its Requester ID
> (AKA RID). A Requester ID is a triplet of Bus number, Device number, and
> Function.
> 
> When the platform has multiple host bridges, the software can add a fourth
> number, called the Segment, to differentiate host bridges. A PCI device is
> then uniquely identified by segment:bus:device:function (AKA SBDF).
> 
> So given a specific SBDF, it is possible to find the host bridge and the
> RID associated with a PCI device.
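>
> As an illustration, PCI Express encodes the RID in 16 bits while the Segment
> stays a purely software-defined value. A minimal sketch (the types below are
> hypothetical, not an existing Xen interface):
>
> #include <stdint.h>
>
> /* Hypothetical types, shown only to illustrate the layout. */
> typedef struct {
>     uint16_t seg;   /* Segment: software concept to differentiate host bridges */
>     uint8_t  bus;   /* Bus number */
>     uint8_t  devfn; /* Device number in bits 7:3, Function in bits 2:0 */
> } sbdf_t;
>
> /* The 16-bit RID as seen by the host bridge, the SMMU and the ITS. */
> static inline uint16_t sbdf_to_rid(sbdf_t sbdf)
> {
>     return ((uint16_t)sbdf.bus << 8) | sbdf.devfn;
> }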
> 
> # Interaction of the PCI subsystem with other subsystems
> 
> In order to have a PCI device fully working, Xen will need to configure
> other subsystems such as the SMMU and the Interrupt Controller.
> 
> The interactions expected between the PCI subsystem and the others are:
>     * Add a device
>     * Remove a device
>     * Assign a device to a guest
>     * Deassign a device from a guest
> 
> XXX: Detail the interaction when assigning/deassigning device
> 
> The following subsections will briefly describe the interactions from a
> higher-level perspective. Implementation details (callbacks, structures, ...)
> are out of scope.
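>
> Purely as an illustration (none of the names below are an existing Xen
> interface), these four operations could be sketched as:
>
> /* Hypothetical prototypes mirroring the operations listed above. */
> struct pci_dev;
> struct domain;
>
> int  pci_add_device_to_xen(struct pci_dev *pdev);
> void pci_remove_device_from_xen(struct pci_dev *pdev);
> int  pci_assign_device(struct domain *d, struct pci_dev *pdev);
> int  pci_deassign_device(struct domain *d, struct pci_dev *pdev);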
> 
> ## SMMU
> 
> The SMMU will be used to isolate the PCI device when it accesses memory
> (for instance DMA and MSI doorbells). Often the SMMU will be configured
> using a StreamID (SID) that can be deduced from the RID with the help of
> the firmware tables (see below).
> 
> Whilst in theory all the memory transactions issued by a PCI device should
> go through the SMMU, on certain platforms some of the memory transactions
> may not reach the SMMU because they are interpreted by the host bridge. For
> instance this could happen if the MSI doorbell is built into the PCI host
> bridge. See [6] for more details.
> 
> XXX: I think this could be solved by using the host memory layout when
> creating a guest with PCI devices => Detail it.
> 
> ## Interrupt controller
> 
> PCI supports three kinds of interrupts: legacy interrupts, MSI and MSI-X.
> On ARM, legacy interrupts will be mapped to SPIs. MSI and MSI-X will be
> mapped to either SPIs or LPIs.
> 
> Whilst SPIs can be programmed using an interrupt number, LPIs are
> identified via a pair (DeviceID, EventID) when configured through the ITS.
> 
> The DeviceID is a unique identifier for each MSI-capable device that can
> be deduced from the RID with the help of the firmware tables (see below).
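>
> As a purely illustrative sketch (hypothetical names, not an existing
> interface), the identifiers involved in routing a PCI MSI through the ITS
> can be grouped as:
>
> #include <stdint.h>
>
> /* Hypothetical grouping of the identifiers used by the ITS for one MSI. */
> struct its_msi_route {
>     uint32_t deviceid; /* deduced from the RID via the firmware tables */
>     uint32_t eventid;  /* typically the MSI/MSI-X vector index in the device */
>     uint32_t lpi;      /* the LPI the ITS translates (deviceid, eventid) to */
> };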
> 
> XXX: Figure out if something is necessary for GICv2m
> 
> # Information available in the firmware tables
> 
> ## ACPI
> 
> ### Host bridges
> 
> The static table MCFG (see 4.2 in [1]) will describe the host bridges that
> are available at boot and support ECAM. Unfortunately there are platforms
> out there (see [2]) that re-use MCFG to describe host bridges that are not
> fully ECAM-compatible.
> 
> This means that Xen needs to account for possible quirks in the host bridge.
> The Linux community is working on a patch series (see [2] and [3])
> where quirks will be detected with:
>     * OEM ID
>     * OEM Table ID
>     * OEM Revision
>     * PCI Segment (from _SEG)
>     * PCI bus number range (from _CRS, wildcard allowed)
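>
> A hedged sketch of what a quirk-match entry keyed on those fields could look
> like (illustrative names only; the exact layout in the Linux series may
> differ):
>
> #include <stdint.h>
>
> struct pci_config_ops;                     /* specific config space accessors */
>
> /* Hypothetical quirk-match entry. */
> struct mcfg_quirk {
>     char     oem_id[7];                    /* from the MCFG/ACPI table header */
>     char     oem_table_id[9];
>     uint32_t oem_revision;
>     uint16_t segment;                      /* from _SEG */
>     uint8_t  bus_start;                    /* from _CRS, wildcard allowed */
>     uint8_t  bus_end;
>     const struct pci_config_ops *ops;      /* driver/accessors to instantiate */
> };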
> 
> Based on what Linux is currently doing, there are two kinds of quirks:
>     * Accesses to the configuration space of certain sizes are not allowed
>     * A specific driver is necessary for driving the host bridge
> 
> The former is straightforward to solve, the latter will require more thought.
> Instantiation of a specific driver for the host controller can be easily done
> if Xen has the information to detect it. However, those drivers may require
> resources described in ASL (see [4] for instance).
> 
> XXX: Need more investigation to know whether the missing information should
> be passed by DOM0 or hardcoded in the driver.
> 
> ### Finding the StreamID and DeviceID
> 
> The static table IORT (see [5]) will provide information that will help to
> deduce the StreamID and DeviceID from a given RID.
> 
> ## Device Tree
> 
> ### Host bridges
> 
> Each Device Tree node associated with a host bridge will have at least the
> following properties (see bindings in [8]):
>     - device_type: will always be "pci".
>     - compatible: a string indicating which driver to instantiate
> 
> The node may also contain optional properties such as:
>     - linux,pci-domain: assigns a fixed segment number
>     - bus-range: indicates the range of bus numbers supported
> 
> When the property linux,pci-domain is not present, the operating system
> would have to allocate the segment number for each host bridge. Because the
> algorithm to allocate the segment is not specified, it is necessary for
> DOM0 and Xen to agree on the number before any PCI device is added.
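>
> A minimal sketch of one possible agreement, assuming a hypothetical DT
> accessor dt_read_u32(): honour linux,pci-domain when present, otherwise hand
> out segment numbers in discovery order (DOM0 would have to do the same):
>
> #include <stdint.h>
>
> /* dt_read_u32() stands in for whatever DT accessor is actually used; it is
>  * assumed to return 0 on success. */
> int dt_read_u32(const void *node, const char *prop, uint32_t *val);
>
> static uint16_t next_segment;
>
> static uint16_t hostbridge_segment(const void *node)
> {
>     uint32_t seg;
>
>     if ( dt_read_u32(node, "linux,pci-domain", &seg) == 0 )
>         return seg;
>
>     /* No property: allocate in discovery order. */
>     return next_segment++;
> }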
> 
> ### Finding the StreamID and DeviceID
> 
> #### StreamID
> 
> The first existing binding for the SMMU (see [9]) didn't have a way to
> describe the relationship between the RID and the StreamID; it was assumed
> that StreamID == RequesterID. This binding has now been deprecated in favor
> of a generic binding (see [10]) which uses the property "iommu-map" to
> describe the relationship between a RID, the associated IOMMU and the
> StreamID.
> 
> #### DeviceID
> 
> The relationship between the RID and the DeviceID can be found using the
> property "msi-map" (see [11]).
> 
> # Discovering PCI devices
> 
> Whilst PCI devices are currently available in DOM0, the hypervisor does not
> have any knowledge of them. The first step of supporting PCI passthrough is
> to make Xen aware of the PCI devices.
> 
> Xen will require access to the PCI configuration space to retrieve
> information about the PCI devices, or to access it on behalf of the guest
> via the emulated host bridge.
> 
> ## Discovering and registering host bridges
> 
> Neither ACPI nor Device Tree provides enough information to fully
> instantiate a host bridge driver. In the case of ACPI, some data may come
> from ASL, whilst for Device Tree the segment number is not available.
> 
> So Xen needs to rely on DOM0 to discover the host bridges and notify Xen
> with all the relevant information. This will be done via a new hypercall
> PHYSDEVOP_pci_host_bridge_add. The layout of the structure will be:
> 
> struct physdev_pci_host_bridge_add
> {
>     /* IN */
>     uint16_t seg;
>     /* Range of buses supported by the host bridge */
>     uint8_t  bus_start;
>     uint8_t  bus_nr;
>     uint32_t res0;  /* Padding */
>     /* Information about the configuration space region */
>     uint64_t cfg_base;
>     uint64_t cfg_size;
> };
> 
> DOM0 will issue the hypercall PHYSDEVOP_pci_host_bridge_add for each host
> bridge available on the platform. When Xen receives the hypercall, the
> driver associated with the host bridge will be instantiated.
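>
> A hedged sketch of how DOM0 (Linux) could issue it, assuming the proposed
> struct and hypercall number above get added to the public physdev interface,
> reusing the existing HYPERVISOR_physdev_op() wrapper, and assuming bus_nr is
> the number of buses:
>
> #include <linux/ioport.h>          /* struct resource, resource_size() */
> #include <linux/types.h>
> #include <asm/xen/hypercall.h>     /* HYPERVISOR_physdev_op() */
> #include <xen/interface/physdev.h> /* assumed home of the proposed struct */
>
> /* Sketch only, DOM0 (Linux) side: register one host bridge with Xen. */
> static int xen_register_host_bridge(u16 seg, u8 bus_start, u8 bus_end,
>                                     struct resource *cfg)
> {
>     struct physdev_pci_host_bridge_add add = {
>         .seg       = seg,
>         .bus_start = bus_start,
>         .bus_nr    = bus_end - bus_start + 1,
>         .cfg_base  = cfg->start,
>         .cfg_size  = resource_size(cfg),
>     };
>
>     return HYPERVISOR_physdev_op(PHYSDEVOP_pci_host_bridge_add, &add);
> }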
> 

I think PCI passthrough and an ACPI DOM0 enumerating devices on PCI are
separate features.
Without Xen mapping the PCI config space region in stage-2 for dom0, an ACPI
dom0 won't boot.
Currently Xen does that for DT.

So can we have 2 design documents:
a) PCI passthrough
b) ACPI dom0/domU support in Xen and Linux
   This may include:
   b.1 Passing the IORT to Dom0 without the SMMU
   b.2 A hypercall to map the PCI config space in dom0
   b.3 <more>

What do you think?


> XXX: Shall we limit DOM0's access to the configuration space from that
> moment?
> 
> ## Discovering and registering PCI devices
> 
> Similarly to x86, PCI devices will be discovered by DOM0 and registered
> using the hypercalls PHYSDEVOP_pci_device_add or PHYSDEVOP_manage_pci_add_ext.
> 
> By default all the PCI devices will be assigned to DOM0. So Xen would have
> to configure the SMMU and Interrupt Controller to allow DOM0 to use the PCI
> devices. As mentioned earlier, those subsystems will require the StreamID
> and DeviceID. Both can be deduced from the RID.
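>
> For illustration, a hedged sketch of what that registration looks like from
> DOM0 (Linux), based on the existing x86 physdev interface; field names
> should be double-checked against the public header:
>
> #include <linux/types.h>
> #include <asm/xen/hypercall.h>     /* HYPERVISOR_physdev_op() */
> #include <xen/interface/physdev.h> /* struct physdev_pci_device_add */
>
> /* Sketch only: register one PCI device with Xen from DOM0. */
> static int xen_register_pci_device(u16 seg, u8 bus, u8 devfn)
> {
>     struct physdev_pci_device_add add = {
>         .seg   = seg,
>         .bus   = bus,
>         .devfn = devfn,
>     };
>
>     return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_add, &add);
> }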
> 
> XXX: How to hide PCI devices from DOM0?
> 
> # Glossary
> 
> ECAM: Enhanced Configuration Access Mechanism
> SBDF: Segment Bus Device Function. The segment is a software concept.
> MSI: Message Signaled Interrupt
> SPI: Shared Peripheral Interrupt
> LPI: Locality-specific Peripheral Interrupt
> ITS: Interrupt Translation Service
> 
> # Bibliography
> 
> [1] PCI firmware specification, rev 3.2
> [2] https://www.spinics.net/lists/linux-pci/msg56715.html
> [3] https://www.spinics.net/lists/linux-pci/msg56723.html
> [4] https://www.spinics.net/lists/linux-pci/msg56728.html
> [5] http://infocenter.arm.com/help/topic/com.arm.doc.den0049b/DEN0049B_IO_Remapping_Table.pdf
> [6] https://www.spinics.net/lists/kvm/msg140116.html
> [7] http://www.firmware.org/1275/bindings/pci/pci2_1.pdf
> [8] Documentation/devicetree/bindings/pci
> [9] Documentation/devicetree/bindings/iommu/arm,smmu.txt
> [10] Documentation/devicetree/bindings/pci/pci-iommu.txt
> [11] Documentation/devicetree/bindings/pci/pci-msi.txt
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

