
[Xen-devel] ARM PCI Pass through Design Draft 5



              -----------------------------
             | PCI Pass-through in Xen ARM |
              -----------------------------
               manish.jaggi@xxxxxxxxxx
              -----------------------------

                      Draft-5


 -----------------------------------------------------------------------------
 Introduction
 -----------------------------------------------------------------------------
 This document describes the design for the PCI passthrough support in Xen
 ARM. The target system is an ARM 64bit SoC with GICv3 and SMMU and PCIe
 devices.

 It is assumed that each PVH guest will have its own MSI controller support,
 and that a virtual ITS in Xen will redirect device interrupts to the guest.

 This document is limited to device tree based PCI; it will evolve to add ACPI.

 -----------------------------------------------------------------------------
 Revision History
 -----------------------------------------------------------------------------
 Changes from Draft-1:
 ---------------------
 a) map_mmio hypercall removed from earlier draft
 b) device bar mapping into guest not 1:1
 c) Reserved Area in guest address space for mapping PCI-EP BARs in Stage2.
 d) Xenstore Update: For each PCI-EP BAR (IPA-PA mapping info).

 Changes from Draft-2:
 ---------------------
 a) DomU boot information updated with boot-time device assignment and
 hotplug.
 b) SMMU description added
 c) Mapping between streamID - bdf - deviceID.
 d) assign_device hypercall to include virtual(guest) sbdf.
 Toolstack to generate guest sbdf rather than pciback.

 Changes from Draft-3:
 ---------------------
 a) Fixed typos and added more description
 b) NUMA and PCI passthrough description removed for now.
 c) Added example from Ian's Mail

 Changes from Draft-4:
 ------------------------
 a) Added Hypercall PHYSDEVOP_pci_dev_map_msi_specifier
 b) The design takes into account Linux PCI msi-map support
 c) Added Xen internal to get streamID from pci_dev
 d) Added few examples and dts/code snippets

 -----------------------------------------------------------------------------
 Index
 -----------------------------------------------------------------------------
   (1) Background

   (2) Basic PCI Support in Xen ARM
   (2.1) pci_hostbridge and pci_hostbridge_ops
   (2.2) PHYSDEVOP_pci_host_bridge_add hypercall
   (2.3) PHYSDEVOP_pci_dev_map_msi_specifier hypercall
   (2.4) XEN Internal API

   (3) SMMU programming
   (3.1) Additions for PCI Passthrough
   (3.2) Mapping between streamID - deviceID - pci sbdf - requesterID

   (4) Assignment of PCI device
   (4.1) Dom0
   (4.1.1) Stage 2 Mapping of GITS_ITRANSLATER space (64k)
   (4.1.1.1) For Dom0
   (4.1.1.2) For DomU
   (4.1.1.2.1) Hypercall Details: XEN_DOMCTL_get_itranslater_space

   (4.2) DomU
   (4.2.1) Mapping BAR regions in guest address space
   (4.2.2) Xenstore Update: For each PCI-EP BAR (IPA-PA mapping info).
   (4.2.3) Hypercall Modification (XEN_DOMCTL_assign_device)

   (5) Change in Linux PCI frontEnd - backend driver for MSI/X programming
   (5.1) pci-frontend bus and gicv3-its node binding for domU

   (6) Glossary

   (7) References
 -----------------------------------------------------------------------------

 1.    Background
 -----------------------------------------------------------------------------
 Passthrough refers to assigning a PCI device to a guest domain (domU) such
 that the guest has full control over the device. The MMIO space and
 interrupts are managed by the guest itself, close to how a bare-metal kernel
 manages a device.

 A device's access to the guest address space needs to be isolated and
 protected. The SMMU (System MMU, the IOMMU on ARM) is programmed by the Xen
 hypervisor to allow the device to access guest memory for data transfer and
 for sending MSI/MSI-X interrupts. The message signalled interrupt writes
 generated by PCI devices target guest addresses, which are likewise
 translated by the SMMU.


 1.1 PCI device Id in Dom0
 ------------------------------------------------------------------------------
 As per the bindings document [6], the msi-specifier is generated from the
 msi-map property such that bits [31:16] of the msi-specifier come from the
 msi-map namespace and bits [15:0] are the same as the RID.

 There can be multiple pci nodes in the device tree, possibly with the same
 msi-map property:
        pci@84a000000000 {
                compatible = "pci-host-ecam-generic";
                device_type = "pci";
                msi-map = <0x0 0x6f 0x20000 0x10000>;
                bus-range = <0x0 0x1f>;
                reg = <0x84a0 0x0 0x0 0x2000000>;
                ...
        };

        pci@87e0c2000000 {
                compatible = "cavium,pci-host-thunder-pem";
                device_type = "pci";
                msi-map = <0x0 0x6f 0x10000 0x10000>;
                bus-range = <0x8f 0xc7>;
                reg = <0x8880 0x8f000000 0x0 0x39000000 0x87e0 0xc2000000
                 0x0 0x1000000>;
                ...
        }

        pci@849000000000 {
                compatible = "pci-host-ecam-generic";
                device_type = "pci";
                msi-map = <0x0 0x6f 0x10000 0x10000>;
                bus-range = <0x0 0x1f>;
                reg = <0x8490 0x0 0x0 0x2000000>;
                ...
        };

 1.1.1 DeviceID used in SMMU and ITS
 -------------------------------------------------------------------------------
 Each of the above pci nodes has its own segment number and configuration
 space. The deviceID that is programmed into the ITS and SMMU is generated
 from the RID and the msi-map.

 Thus a mapping between the sbdf and the msi-specifier is required by Xen.
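
 Below is a minimal, standalone C sketch (not Xen code) of how a deviceID /
 msi-specifier can be derived from a RID and one msi-map entry
 <rid-base, msi-controller-phandle, msi-base, length>, following the binding
 in [6]. The struct and function names are illustrative only.

 #include <stdint.h>
 #include <stdio.h>

 /* One entry of an msi-map property (the phandle is irrelevant here). */
 struct msi_map_entry {
     uint32_t rid_base;   /* first RID covered by this entry */
     uint32_t msi_base;   /* first msi-specifier of the target namespace */
     uint32_t length;     /* number of RIDs covered */
 };

 /* Returns the msi-specifier (deviceID) for rid, or -1 if not covered. */
 static int64_t rid_to_msi_specifier(const struct msi_map_entry *e,
                                     uint32_t rid)
 {
     if (rid < e->rid_base || rid >= e->rid_base + e->length)
         return -1;
     return (int64_t)e->msi_base + (rid - e->rid_base);
 }

 int main(void)
 {
     /* msi-map = <0x0 0x6f 0x20000 0x10000> from the first pci node above */
     struct msi_map_entry map = { .rid_base = 0x0, .msi_base = 0x20000,
                                  .length = 0x10000 };
     uint32_t rid = 0x0008;   /* bus 0, device 1, function 0 */

     printf("deviceID = 0x%llx\n",
            (unsigned long long)rid_to_msi_specifier(&map, rid));
     return 0;
 }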


 1.2 PCI device Id in Guest
 -------------------------------------------------------------------------------
 For each machine sbdf a guest sbdf needs to be generated when a device is
 assigned to a domU. Currently this is done by xen-pciback. As per
 discussions [4] on xen-devel, the guest bdf generation should be done by the
 toolstack rather than by xen-pciback.

 Since there is only one pci-frontend bus in domU, the guest s:b:d.f is
 0:0:d.f. It is proposed in this design document that the bdf generation be
 done by the toolstack and that the xenstore keys be created by the
 toolstack.

 Since the PVH guest will use a virtual ITS device for passed-through PCI
 devices, trapped ITS writes in Xen need a mapping between the guest sbdf and
 the msi-specifier.

 2. Design of PCI Support for ARM
 -----------------------------------------------------------------------------

 The APIs to read/write the PCI configuration space are based on segment:bdf.
 How the sbdf is mapped to a physical address is under the realm of the PCI
 host controller.

 ARM PCI support in Xen introduces PCI host controller drivers similar to
 what exists in Linux. A host controller driver registers callbacks, which
 are invoked on matching the compatible property of a pci device tree node.

 Note: as PCI devices are enumerable, the pci nodes in the device tree
 describe the host controllers rather than individual devices.

 2.1   pci_hostbridge and pci_hostbridge_ops
 -----------------------------------------------------------------------------
 The init function of a PCI host controller driver calls the following to
 register its hostbridge callbacks:

 int pci_hostbridge_register(pci_hostbridge_t *pcihb);

 struct pci_hostbridge_ops {
     u32 (*pci_conf_read)(struct pci_hostbridge*, u32 bus, u32 devfn,
                                 u32 reg, u32 bytes);
     void (*pci_conf_write)(struct pci_hostbridge*, u32 bus, u32 devfn,
                                 u32 reg, u32 bytes, u32 val);
 };

 struct pci_hostbridge{
     u32 segno;
     paddr_t cfg_base;
     paddr_t cfg_size;
     struct dt_device_node *dt_node;
     struct pci_hostbridge_ops ops;
     struct list_head list;
 };

 A PCI conf_read function would internally be as follows:
 u32 pcihb_conf_read(u32 seg, u32 bus, u32 devfn, u32 reg, u32 bytes)
 {
     pci_hostbridge_t *pcihb;

     list_for_each_entry(pcihb, &pci_hostbridge_list, list)
     {
         if ( pcihb->segno == seg )
             return pcihb->ops.pci_conf_read(pcihb, bus, devfn, reg, bytes);
     }
     return -1;
 }

 For the example in section 1.1, two host bridge drivers would be registered
 with Xen:
 - cavium,pci-host-thunder-pem
 - pci-host-ecam-generic
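
 As an illustration, a hedged sketch of what a pci-host-ecam-generic driver
 could register through pci_hostbridge_register() is given below. ECAM places
 the configuration space of (bus, devfn) at cfg_base + (bus << 20 | devfn << 12),
 with the register offset in the low 12 bits. The cfg_va field (an ioremap'd
 mapping of cfg_base) and the ecam_host_init() probe hook are assumptions for
 illustration, not part of the proposed interface; error handling is omitted.

 static u32 ecam_pci_conf_read(struct pci_hostbridge *pcihb, u32 bus,
                               u32 devfn, u32 reg, u32 bytes)
 {
     /* cfg_va: assumed ioremap'd mapping of cfg_base (not in the struct above) */
     void __iomem *addr = pcihb->cfg_va + ((bus << 20) | (devfn << 12) | reg);

     switch ( bytes )
     {
     case 1:  return readb(addr);
     case 2:  return readw(addr);
     default: return readl(addr);
     }
 }

 static void ecam_pci_conf_write(struct pci_hostbridge *pcihb, u32 bus,
                                 u32 devfn, u32 reg, u32 bytes, u32 val)
 {
     void __iomem *addr = pcihb->cfg_va + ((bus << 20) | (devfn << 12) | reg);

     switch ( bytes )
     {
     case 1:  writeb(val, addr); break;
     case 2:  writew(val, addr); break;
     default: writel(val, addr); break;
     }
 }

 static int __init ecam_host_init(struct dt_device_node *node)
 {
     pci_hostbridge_t *pcihb = xzalloc(pci_hostbridge_t);

     if ( !pcihb )
         return -ENOMEM;

     /* segno is bound later via PHYSDEVOP_pci_host_bridge_add (section 2.2). */
     pcihb->dt_node = node;
     dt_device_get_address(node, 0, &pcihb->cfg_base, &pcihb->cfg_size);
     pcihb->cfg_va = ioremap_nocache(pcihb->cfg_base, pcihb->cfg_size);
     pcihb->ops.pci_conf_read  = ecam_pci_conf_read;
     pcihb->ops.pci_conf_write = ecam_pci_conf_write;

     return pci_hostbridge_register(pcihb);
 }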


 2.2    PHYSDEVOP_pci_host_bridge_add hypercall
 -----------------------------------------------------------------------------
 Xen code accesses PCI configuration space based on the sbdf received from
 the guest. The order in which the pci device tree nodes appear may not match
 the order of device enumeration in dom0. Thus there needs to be a mechanism
 to bind the segment number assigned by dom0 to the pci host controller. The
 following hypercall is introduced:

 /* DOM0 only hypercall */
 #define PHYSDEVOP_pci_host_bridge_add    <<>>
 struct physdev_pci_host_bridge_add {
     /* IN */
     uint16_t seg;
     uint64_t cfg_base;
     uint64_t cfg_size;
 };

 This hypercall is invoked before dom0 invokes the PHYSDEVOP_pci_device_add
 hypercall.

 To understand the requirement in detail, Ian's example is quoted below
 (Ref: [1]):
 Imagine we have two PCI host bridges, one with CFG space at 0xA0000000 and
 a second with CFG space at 0xB0000000.

 Xen discovers these and assigns segment 0=0xA0000000 and segment
 1=0xB0000000.

 Dom0 discovers them too but assigns segment 1=0xA0000000 and segment
 0=0xB0000000 (i.e. the other way).

 Now Dom0 makes a hypercall referring to a device as (segment=1,BDF), i.e.
 the device with BDF behind the root bridge at 0xA0000000. (Perhaps this is
 the PHYSDEVOP_manage_pci_add_ext call).

 But Xen thinks it is talking about the device with BDF behind the root
 bridge at 0xB0000000 because Dom0 and Xen do not agree on what the segments
 mean. Now Xen will use the wrong device ID in the IOMMU (since that is
 associated with the host bridge), or poke the wrong configuration space, or
 whatever.

 Or maybe Xen chose 42=0xB0000000 and 43=0xA0000000 so when Dom0 starts
 talking about segment=0 and =1 it has no idea what is going on.

 PHYSDEVOP_pci_host_bridge_add is intended to allow Dom0 to say "Segment 0
 is the host bridge at 0xB0000000" and "Segment 1 is the host bridge at
 0xA0000000". With this there is no confusion between Xen and Dom0 because
 Xen isn't picking a segment ID, it is being told what it is by Dom0 which
 has done the picking.

 The handler invokes the following to update the segment number in the
 pci_hostbridge:

 int pci_hostbridge_setup(uint32_t segno, uint64_t cfg_base,
                          uint64_t cfg_size);

 Subsequent calls to pci_conf_read/write are completed by the
 pci_hostbridge_ops of the respective pci_hostbridge.
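
 A hedged sketch of the dom0 (Linux) side is shown below, assuming the usual
 HYPERVISOR_physdev_op() wrapper and that the struct above is added to the
 physdev interface header; xen_report_host_bridge() is a hypothetical helper
 name.

 static int xen_report_host_bridge(u16 segment, u64 cfg_base, u64 cfg_size)
 {
     struct physdev_pci_host_bridge_add add = {
         .seg      = segment,
         .cfg_base = cfg_base,
         .cfg_size = cfg_size,
     };

     /* Must run before the first PHYSDEVOP_pci_device_add for this segment. */
     return HYPERVISOR_physdev_op(PHYSDEVOP_pci_host_bridge_add, &add);
 }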

 2.3 PHYSDEVOP_pci_dev_map_msi_specifier Hypercall
 -----------------------------------------------------------------------------
 /* Dom0 only hypercall */
 #define PHYSDEVOP_pci_dev_map_msi_specifier    33
 struct physdev_pci_dev_map_msi_specifier {
    /* IN */
    uint16_t seg;
    uint8_t bus;
    uint8_t devfn;
    uint32_t msi_specifier;
 };

 In Xen, an msi_specifier field is added to struct arch_pci_dev, so that the
 msi_specifier can be obtained from a pci_dev pointer.

 This specifier is used in the ITS and the SMMU.
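
 A hedged sketch of the Xen-side handler: look up the pci_dev named by
 (seg, bus, devfn) and cache the msi_specifier reported by dom0 in the new
 arch_pci_dev field. pci_get_pdev() follows the existing Xen PCI code;
 copy-from-guest handling and locking are omitted.

 static int physdev_pci_dev_map_msi_specifier(
     const struct physdev_pci_dev_map_msi_specifier *op)
 {
     struct pci_dev *pdev;

     /* Caller is expected to hold the pcidevs lock. */
     pdev = pci_get_pdev(op->seg, op->bus, op->devfn);
     if ( !pdev )
         return -ENODEV;

     pdev->arch.msi_specifier = op->msi_specifier;

     return 0;
 }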


 2.4    XEN Internal API
 -----------------------------------------------------------------------------
 a) pci_hostbridge_dt_node

 struct dt_device_node* pci_hostbridge_dt_node(uint32_t segno);

 Returns the device tree node pointer of the pci node which is bound to the
 passed segment number. The API can be called subsequent to
 pci_hostbridge_setup.

 b) pci_dev_get_msi_rid(pdev)

 uint32_t pci_dev_get_msi_rid(struct pci_dev *pdev);
 This returns the msi_specifier for pdev. It is called from the SMMU code.
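
 A minimal sketch, assuming the arch.msi_specifier field cached by the
 hypercall in section 2.3:

 uint32_t pci_dev_get_msi_rid(struct pci_dev *pdev)
 {
     return pdev->arch.msi_specifier;
 }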


 3.    SMMU programming
 -----------------------------------------------------------------------------

 3.1.    Additions for PCI Passthrough
 -----------------------------------------------------------------------------

 3.1.1 - add_device in iommu_ops is implemented.
 -----------------------------------------------------------------------------

 This is called when PHYSDEVOP_pci_device_add / PHYSDEVOP_manage_pci_add_ext
 is called from dom0.

 .add_device = arm_smmu_add_dom0_dev,

 static int arm_smmu_add_dom0_dev(u8 devfn, struct device *dev)
 {
         if (dev_is_pci(dev)) {
             struct pci_dev *pdev = to_pci_dev(dev);

             return arm_smmu_assign_dev(pdev->domain, devfn, dev);
         }
         return -1;
 }

 3.1.2 - remove_device in iommu_ops is implemented.
 -----------------------------------------------------------------------------
 This is called when PHYSDEVOP_pci_device_remove is called from dom0/domU.

 .remove_device = arm_smmu_remove_dev.
 TODO: add implementation details of arm_smmu_remove_dev.

 3.1.3 dev_get_dev_node is modified for pci devices.
 -----------------------------------------------------------------------------
 The function is modified to return the dt_node of the pci hostbridge from
 the device tree. This is required as PCI devices are not described in the
 device tree and need a way to find which SMMU they are attached to.

 static struct arm_smmu_device *find_smmu_for_device(struct device *dev)
 {
         struct device_node *dev_node = dev_get_dev_node(dev);
 ....

 static struct device_node *dev_get_dev_node(struct device *dev)
 {
         if (dev_is_pci(dev)) {
                 struct pci_dev *pdev = to_pci_dev(dev);
                 return pci_hostbridge_dt_node(pdev->seg);
         }
 ...

 3.1.4 __arm_smmu_get_pci_sid in smmu
------------------------------------------------------------------------------
 pci_dev_get_msi_rid(struct pci_dev *pdev) is called to return the
 msi_specifier.


 3.2.    Mapping between streamID - deviceID - pci sbdf - requesterID
 -----------------------------------------------------------------------------
 For dom0, the sbdf maps to the msi_specifier, which is programmed into the
 ITS and the SMMU. For domU, the gsbdf maps to the sbdf, from which the
 msi_specifier is obtained.

 The requesterID is the bdf; the streamID is the msi_specifier.
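
 For example, using the first pci node in section 1.1
 (msi-map = <0x0 0x6f 0x20000 0x10000>): a device at 01:00.0 has
 RID (requesterID) 0x0100, so its msi_specifier, and hence its
 deviceID/streamID, is 0x20000 + 0x0100 = 0x20100.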


 4.    Assignment of PCI device
 -----------------------------------------------------------------------------

 4.1    Dom0
 -----------------------------------------------------------------------------
 All PCI devices are assigned to dom0 unless hidden by the pciback.hide boot
 argument in dom0. Dom0 enumerates the PCI devices. For each device the MMIO
 space has to be mapped in the stage 2 translation for dom0. For dom0, Xen
 maps the ranges from the device tree pci nodes in the stage 2 translation
 during boot.

 While processing the PHYSDEVOP_pci_device_add hypercall,
 its_add_device(msi_specifier) should be called. This allocates the ITS
 specific data structures for the device. (Reference [2])

 4.1.1 Stage 2 Mapping of GITS_ITRANSLATER space (64k)
 -----------------------------------------------------------------------------

 The GITS_ITRANSLATER space (64k) must be mapped in the stage 2 translation
 so that the SMMU can translate MSI(-X) writes from the device using the page
 tables of the domain.

 4.1.1.1 For Dom0
 -----------------------------------------------------------------------------
 The GITS_ITRANSLATER address space is mapped 1:1 during dom0 boot. For dom0
 this mapping is done in the vgic driver. For domU the mapping is done by the
 toolstack (see 4.1.1.2).


 4.1.1.2 For DomU
 -----------------------------------------------------------------------------
 For domU, while creating the domain, the toolstack reads the IPA from the
 macro GITS_ITRANSLATER_SPACE in xen/include/public/arch-arm.h. The PA is
 read from a new hypercall which returns the PA of GITS_ITRANSLATER_SPACE.

 Subsequently the toolstack invokes the xc_domain_memory_mapping domctl to
 create the stage 2 mapping (see the sketch below).

 Hypercall Details: XEN_DOMCTL_get_itranslater_space

 /* XEN_DOMCTL_get_itranslater_space */
 struct xen_domctl_get_itranslater_space {
     /* OUT variables. */
     uint64_aligned_t start_addr;
     uint64_aligned_t size;
 };
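
 A hedged toolstack-side sketch (libxc) is given below.
 xc_domain_get_itranslater_space() is a hypothetical wrapper around the new
 domctl; GITS_ITRANSLATER_SPACE is the guest IPA proposed for
 public/arch-arm.h; xc_domain_memory_mapping() is the existing libxc call.

 #include <xenctrl.h>

 static int map_its_translater(xc_interface *xch, uint32_t domid)
 {
     uint64_t pa, size;
     int rc;

     /* Hypothetical wrapper around XEN_DOMCTL_get_itranslater_space. */
     rc = xc_domain_get_itranslater_space(xch, domid, &pa, &size);
     if (rc)
         return rc;

     return xc_domain_memory_mapping(xch, domid,
                                     GITS_ITRANSLATER_SPACE >> XC_PAGE_SHIFT,
                                     pa >> XC_PAGE_SHIFT,
                                     size >> XC_PAGE_SHIFT,
                                     DPCI_ADD_MAPPING /* from xen's domctl.h */);
 }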

 4.2 DomU
 -----------------------------------------------------------------------------

 4.2.1 Mapping BAR regions in guest address space
 -----------------------------------------------------------------------------
 When a PCI-EP device is assigned to a domU, the toolstack reads the BAR
 registers from the PCI configuration space. The toolstack allocates a
 virtual BAR region for each physical BAR region from the area reserved in
 the guest address space for mapping BARs, referred to as the Guest BAR area.
 This area is defined in public/arch-arm.h:

 /* For 32bit BARs*/
 #define GUEST_BAR_BASE_32 <<>>
 #define GUEST_BAR_SIZE_32 <<>>

 /* For 64bit BARs*/
 #define GUEST_BAR_BASE_64 <<>>
 #define GUEST_BAR_SIZE_64 <<>>

 The toolstack then invokes the xc_domain_memory_mapping domctl to map the
 region in the stage 2 translation. If a BAR region address is 32-bit, the
 BASE_32 area is used, otherwise the 64-bit one. Support for a combination of
 both is TODO.

 The toolstack manages these areas and allocates from them. The allocation
 and deallocation are done using APIs similar to malloc and free, as sketched
 below.
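
 A hedged toolstack sketch: hand out guest BAR space with a simple bump
 allocator over the proposed GUEST_BAR_BASE_64/GUEST_BAR_SIZE_64 window, then
 map the physical BAR into the guest with the existing
 xc_domain_memory_mapping() call. A real implementation would also free
 regions and honour BAR alignment; map_guest_bar() is a hypothetical helper.

 #include <stdint.h>
 #include <xenctrl.h>

 static uint64_t guest_bar64_next = GUEST_BAR_BASE_64;

 static int map_guest_bar(xc_interface *xch, uint32_t domid,
                          uint64_t bar_pa, uint64_t bar_size,
                          uint64_t *bar_ipa /* OUT: value for xenstore */)
 {
     uint64_t ipa = guest_bar64_next;

     if (ipa + bar_size > GUEST_BAR_BASE_64 + GUEST_BAR_SIZE_64)
         return -1;                     /* reserved area exhausted */
     guest_bar64_next += bar_size;

     *bar_ipa = ipa;
     return xc_domain_memory_mapping(xch, domid,
                                     ipa >> XC_PAGE_SHIFT,
                                     bar_pa >> XC_PAGE_SHIFT,
                                     bar_size >> XC_PAGE_SHIFT,
                                     DPCI_ADD_MAPPING);
 }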

 4.2.2    Xenstore Update: For each PCI-EP BAR (IPA-PA mapping info).
 ----------------------------------------------------------------------------
 The toolstack also updates the xenstore information for the device
 (virtual BAR : physical BAR). This information is read by xen-pciback and
 returned to the domU pcifront driver when the guest reads the BAR registers
 from configuration space.

 Entries created are as follows:
 /local/domain/0/backend/pci/1/0
 vdev-N
     BDF = ""
     BAR-0-IPA = ""
     BAR-0-PA = ""
     BAR-0-SIZE = ""
     ...
     BAR-M-IPA = ""
     BAR-M-PA = ""
     BAR-M-SIZE = ""

 Note: If BAR-M-SIZE is 0, it is not a valid entry.
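
 A hedged sketch of how the toolstack could publish these keys with plain
 libxenstore is shown below; the path layout follows the example entries
 above, and a real implementation would use a transaction and libxl's own
 xenstore helpers. write_bar_keys() is a hypothetical helper name.

 #include <inttypes.h>
 #include <stdint.h>
 #include <stdio.h>
 #include <string.h>
 #include <xenstore.h>

 static int write_bar_keys(struct xs_handle *xsh, int domid, int vdev, int bar,
                           uint64_t ipa, uint64_t pa, uint64_t size)
 {
     const char *keys[] = { "IPA", "PA", "SIZE" };
     uint64_t vals[]    = { ipa, pa, size };
     char path[128], val[32];

     for (unsigned int i = 0; i < 3; i++) {
         snprintf(path, sizeof(path),
                  "/local/domain/0/backend/pci/%d/0/vdev-%d/BAR-%d-%s",
                  domid, vdev, bar, keys[i]);
         snprintf(val, sizeof(val), "0x%" PRIx64, vals[i]);
         if (!xs_write(xsh, XBT_NULL, path, val, strlen(val)))
             return -1;
     }
     return 0;
 }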

 4.2.3 Hypercall Modification (XEN_DOMCTL_assign_device)
 ----------------------------------------------------------------------------

 Following guest_sbdf generation, the domctl to assign the device is invoked.
 This hypercall is updated to include the *guest_sbdf*. The Xen ITS driver
 stores the mapping domID : guest_sbdf : machine_sbdf for later use.

 struct xen_domctl_assign_device {
    uint32_t dev;   /* XEN_DOMCTL_DEV_* */
    union {
        struct {
            uint32_t machine_sbdf; /* machine PCI ID of assigned device */
            uint32_t guest_sbdf;   /* guest PCI ID of assigned device */
        } pci;
        struct {
            uint32_t size; /* Length of the path */
            XEN_GUEST_HANDLE_64(char) path; /* path to the device tree node */
        } dt;
    } u;
 };

 In the handler of this hypercall an internal API

  its_assign_device(domid, machine_sbdf, guest_sbdf)   (Reference [2])

 is called, which stores the mapping between machine_sbdf and guest_sbdf.
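
 A hedged sketch of the per-device record its_assign_device() could keep, so
 that trapped vITS accesses can translate guest_sbdf -> machine_sbdf ->
 msi_specifier (deviceID); the per-domain list head (d->arch.vits_devices)
 and the lookup helper are illustrative assumptions.

 struct its_assigned_dev {
     struct list_head entry;     /* linked off the owning domain */
     domid_t  domid;
     uint32_t guest_sbdf;        /* what the guest writes to the vITS */
     uint32_t machine_sbdf;      /* physical device, for config access */
     uint32_t msi_specifier;     /* deviceID programmed in the ITS/SMMU */
 };

 static struct its_assigned_dev *
 its_lookup_by_guest_sbdf(struct domain *d, uint32_t guest_sbdf)
 {
     struct its_assigned_dev *adev;

     /* vits_devices: assumed per-domain list of assigned devices */
     list_for_each_entry ( adev, &d->arch.vits_devices, entry )
         if ( adev->guest_sbdf == guest_sbdf )
             return adev;

     return NULL;
 }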

 5. Change in Linux PCI frontEnd - backend driver for MSI/X programming
 -----------------------------------------------------------------------------

 5.1 pci-frontend bus and gicv3-its node binding for domU
 -----------------------------------------------------------------------------
 It is assumed that the toolstack generates a gicv3-its node in the domU
 device tree.

 All the devices assigned to a domU are enumerated on a PCI frontend bus. On
 this bus the interrupt parent is set to the gicv3-its for ARM systems. As
 the gicv3-its is emulated in Xen, all accesses by the domU driver are
 trapped. This allows configuration and direct injection of MSIs (LPIs) into
 the guest, so the frontend-backend communication for MSI is no longer
 required.

 Frontend-backend communication is required only for dom0 reading the PCI
 configuration space on behalf of domU.

 Code snippet:
    pcifront_scan_root()
    {
        ....
        bus_entry->bus = b;
    #ifdef CONFIG_ARM64
        msi_node = of_find_compatible_node(NULL, NULL, "arm,gic-v3-its");
        if (msi_node) {
            b->msi = of_pci_find_msi_chip_by_node(msi_node);
            if (!b->msi) {
                printk(KERN_ERR "Unable to find bus->msi node\n");
                goto err_out;
            }
        } else {
            printk(KERN_ERR "Unable to find arm,gic-v3-its compatible node\n");
            goto err_out;
        }
    #endif
    }

 - register_pci_notifier is called for domU as well. The restriction to dom0
 only can be removed for ARM64.

 As a device is added on the pci-frontend bus, a notification is sent to Xen:
 the PHYSDEVOP_pci_device_add hypercall is invoked.

 There is no msi-map property in the domU device tree, so the gsbdf is the
 deviceID.


 6.    Glossary
 -----------------------------------------------------------------------------
 MSI            : Message Signalled Interrupt
 ITS            : Interrupt Translation Service
 GIC            : Generic Interrupt Controller
 LPI            : Locality-specific Peripheral Interrupt
 RID            : Requester ID (bdf)

 7.    References
 -----------------------------------------------------------------------------
 [1]. http://osdir.com/ml/general/2015-08/msg15346.html
 [2]. http://lists.xen.org/archives/html/xen-devel/2015-07/msg01984.html
 [3]. http://xenbits.xen.org/people/ianc/vits/draftG.html
 [4]. http://lists.xen.org/archives/html/xen-devel/2015-07/msg05513.html
 [5]. http://infocenter.arm.com/help/topic/com.arm.doc.den0049a/DEN0049A_IO_Remapping_Table.pdf
 [6]. http://lxr.free-electrons.com/source/Documentation/devicetree/bindings/pci/pci-msi.txt






