
Re: [Xen-devel] PCI Pass-through in Xen ARM: Draft 4



CC'ing a few other x86 people as this is likely to be the same approach
that will be taken by PVH.
        
    
On Thu, 13 Aug 2015, Manish Jaggi wrote:
>               -----------------------------
>              | PCI Pass-through in Xen ARM |
>               -----------------------------
>              manish.jaggi@xxxxxxxxxxxxxxxxxx
>              -------------------------------
> 
>                       Draft-4
> 
> 
>  -----------------------------------------------------------------------------
>  Introduction
>  -----------------------------------------------------------------------------
>  This document describes the design of PCI passthrough support in Xen on ARM.
>  The target system is a 64-bit ARM SoC with GICv3, SMMUv2 and PCIe devices.
> 
>  -----------------------------------------------------------------------------
>  Revision History
>  -----------------------------------------------------------------------------
>  Changes from Draft-1:
>  ---------------------
>  a) map_mmio hypercall removed from the earlier draft.
>  b) Device BAR mapping into the guest is no longer 1:1.
>  c) Reserved area in the guest address space for mapping PCI-EP BARs in Stage 2.
>  d) Xenstore update: for each PCI-EP BAR (IPA-PA mapping info).
> 
>  Changes from Draft-2:
>  ---------------------
>  a) DomU boot information updated with boot-time device assignment and
>  hotplug.
>  b) SMMU description added.
>  c) Mapping between streamID - bdf - deviceID.
>  d) assign_device hypercall updated to include the virtual (guest) sbdf;
>  the toolstack generates the guest sbdf rather than pciback.
> 
>  Changes from Draft-3:
>  ---------------------
>  a) Fixed typos and added more description.
>  b) NUMA and PCI passthrough description removed for now.
>  c) Added example from Ian's mail.
> 
>  -----------------------------------------------------------------------------
>  Index
>  -----------------------------------------------------------------------------
>    (1) Background
> 
>    (2) Basic PCI Support in Xen ARM
>    (2.1) pci_hostbridge and pci_hostbridge_ops
>    (2.2) PHYSDEVOP_pci_host_bridge_add hypercall
>    (2.3) XEN Internal API
> 
>    (3) SMMU programming
>    (3.1) Additions for PCI Passthrough
>    (3.2) Mapping between streamID - deviceID - pci sbdf - requesterID
> 
>    (4) Assignment of PCI device
>    (4.1) Dom0
>    (4.1.1) Stage 2 Mapping of GITS_ITRANSLATER space (64k)
>    (4.1.1.1) For Dom0
>    (4.1.1.2) For DomU
>    (4.1.1.2.1) Hypercall Details: XEN_DOMCTL_get_itranslater_space
> 
>    (4.2) DomU
>    (4.2.1) Mapping BAR regions in guest address space
>    (4.2.2) Xenstore Update: For each PCI-EP BAR (IPA-PA mapping info).
>    (4.2.3) Hypercall Modification (XEN_DOMCTL_assign_device)
> 
>    (5) Change in Linux PCI frontend - backend driver for MSI/X programming
>    (5.1) pci-frontend bus and gicv3-its node binding for domU
> 
>    (6) Glossary
> 
>    (7) References
>  -----------------------------------------------------------------------------
> 
>  1.    Background of PCI passthrough
>  -----------------------------------------------------------------------------
>  Passthrough refers to assigning a PCI device to a guest domain (domU) such
>  that the guest has full control over the device. The MMIO space and
>  interrupts are managed by the guest itself, much as a bare-metal kernel
>  manages a device.
> 
>  The device's access to the guest address space needs to be isolated and
>  protected. The SMMU (System MMU, the IOMMU on ARM) is programmed by the Xen
>  hypervisor to allow the device to access guest memory for data transfers and
>  to deliver MSI/MSI-X interrupts. The message signalled interrupt writes
>  generated by PCI devices target addresses within the guest address space and
>  are therefore also translated by the SMMU.
> 
>  For this reason the ITS Interrupt Translation Register space
>  (GITS_ITRANSLATER) is mapped into the guest address space.
> 
>  2.    Basic PCI Support for ARM
>  -----------------------------------------------------------------------------
>  The APIs to read/write the PCI configuration space are based on segment:bdf
>  (sbdf). How an sbdf maps to a physical configuration space address is the
>  responsibility of the PCI host controller.
> 
>  ARM PCI support in Xen introduces a PCI host controller framework similar to
>  the one in Linux. Host controller drivers register callbacks, which are
>  invoked when the compatible property of a pci device tree node matches.
> 
>  Note: since PCI devices are discovered by enumeration, the pci node in the
>  device tree describes the host controller rather than individual devices.
> 
>  (TODO: ACPI support is not covered yet.)
> 
>  2.1    pci_hostbridge and pci_hostbridge_ops
>  -----------------------------------------------------------------------------
>  The init function of a PCI host controller driver calls the following API to
>  register the host bridge and its callbacks:
> 
>  int pci_hostbridge_register(pci_hostbridge_t *pcihb);
> 
>  struct pci_hostbridge_ops {
>      u32 (*pci_conf_read)(struct pci_hostbridge*, u32 bus, u32 devfn,
>                                  u32 reg, u32 bytes);
>      void (*pci_conf_write)(struct pci_hostbridge*, u32 bus, u32 devfn,
>                                  u32 reg, u32 bytes, u32 val);
>  };
> 
>  struct pci_hostbridge {
>      u32 segno;
>      paddr_t cfg_base;
>      paddr_t cfg_size;
>      struct dt_device_node *dt_node;
>      struct pci_hostbridge_ops ops;
>      struct list_head list;
>  };
> 
>  A PCI conf_read function would internally look as follows:
> 
>  u32 pcihb_conf_read(u32 seg, u32 bus, u32 devfn, u32 reg, u32 bytes)
>  {
>      pci_hostbridge_t *pcihb;
> 
>      /* Find the host bridge bound to this segment and forward the access. */
>      list_for_each_entry(pcihb, &pci_hostbridge_list, list)
>      {
>          if ( pcihb->segno == seg )
>              return pcihb->ops.pci_conf_read(pcihb, bus, devfn, reg, bytes);
>      }
>      /* No host bridge for this segment: config reads return all ones. */
>      return -1;
>  }
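> 
>  As an illustration, a host controller driver's init function might register
>  itself as in the sketch below. This is only a sketch: the generic ECAM
>  driver, the ecam_conf_read/ecam_conf_write helpers and pci_ecam_init are
>  hypothetical names, not part of this design.
> 
>  static u32  ecam_conf_read(struct pci_hostbridge *pcihb, u32 bus, u32 devfn,
>                             u32 reg, u32 bytes);
>  static void ecam_conf_write(struct pci_hostbridge *pcihb, u32 bus, u32 devfn,
>                              u32 reg, u32 bytes, u32 val);
> 
>  static struct pci_hostbridge_ops ecam_ops = {
>      .pci_conf_read  = ecam_conf_read,
>      .pci_conf_write = ecam_conf_write,
>  };
> 
>  static int pci_ecam_init(struct dt_device_node *node)
>  {
>      u64 addr, size;
>      pci_hostbridge_t *pcihb;
> 
>      /* cfg_base/cfg_size come from the "reg" property of the pci node. */
>      if ( dt_device_get_address(node, 0, &addr, &size) )
>          return -ENODEV;
> 
>      pcihb = xzalloc(pci_hostbridge_t);
>      if ( !pcihb )
>          return -ENOMEM;
> 
>      pcihb->cfg_base = addr;
>      pcihb->cfg_size = size;
>      pcihb->dt_node  = node;
>      pcihb->ops      = ecam_ops;
> 
>      return pci_hostbridge_register(pcihb);
>  }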
> 
>  2.2    PHYSDEVOP_pci_host_bridge_add hypercall
>  -----------------------------------------------------------------------------
>  Xen code accesses PCI configuration space based on the sbdf received from
>  the guest. The order in which the pci device tree nodes appear may not match
>  the order in which dom0 enumerates the host bridges. Thus there needs to be
>  a mechanism to bind the segment number assigned by dom0 to the corresponding
>  pci host controller. The following hypercall is introduced:
> 
>  #define PHYSDEVOP_pci_host_bridge_add    <<>>
>  struct physdev_pci_host_bridge_add {
>      /* IN */
>      uint16_t seg;
>      uint64_t cfg_base;
>      uint64_t cfg_size;
>  };
> 
>  This hypercall is invoked before dom0 invokes the PHYSDEVOP_pci_device_add
>  hypercall.
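> 
>  As an illustration, dom0 Linux could issue this hypercall when it discovers
>  a host bridge, roughly as sketched below. The exact call site and the helper
>  name xen_report_host_bridge are illustrative and not part of this design;
>  HYPERVISOR_physdev_op is the usual Linux wrapper for physdev hypercalls.
> 
>  static int xen_report_host_bridge(u16 seg, u64 cfg_base, u64 cfg_size)
>  {
>      struct physdev_pci_host_bridge_add add = {
>          .seg      = seg,       /* segment number chosen by dom0 */
>          .cfg_base = cfg_base,  /* physical base of the CFG/ECAM window */
>          .cfg_size = cfg_size,
>      };
> 
>      /* Must be issued before PHYSDEVOP_pci_device_add for any device
>         behind this bridge. */
>      return HYPERVISOR_physdev_op(PHYSDEVOP_pci_host_bridge_add, &add);
>  }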
> 
>  To understand the requirement in detail, Ian's example is quoted below:
> -- Ref: [1]
>  Imagine we have two PCI host bridges, one with CFG space at 0xA0000000 and
>  a second with CFG space at 0xB0000000.
> 
>  Xen discovers these and assigns segment 0=0xA0000000 and segment
>  1=0xB0000000.
> 
>  Dom0 discovers them too but assigns segment 1=0xA0000000 and segment
>  0=0xB0000000 (i.e. the other way).
> 
>  Now Dom0 makes a hypercall referring to a device as (segment=1,BDF), i.e.
>  the device with BDF behind the root bridge at 0xA0000000. (Perhaps this is
>  the PHYSDEVOP_manage_pci_add_ext call).
> 
>  But Xen thinks it is talking about the device with BDF behind the root
>  bridge at 0xB0000000 because Dom0 and Xen do not agree on what the segments
>  mean. Now Xen will use the wrong device ID in the IOMMU (since that is
>  associated with the host bridge), or poke the wrong configuration space, or
>  whatever.
> 
>  Or maybe Xen chose 42=0xB0000000 and 43=0xA0000000 so when Dom0 starts
>  talking about segment=0 and =1 it has no idea what is going on.
> 
>  PHYSDEVOP_pci_host_bridge_add is intended to allow Dom0 to say "Segment 0
>  is the host bridge at 0xB0000000" and "Segment 1 is the host bridge at
>  0xA0000000". With this there is no confusion between Xen and Dom0 because
>  Xen isn't picking a segment ID, it is being told what it is by Dom0 which
>  has done the picking.
> --
> 
>  The hypercall handler invokes the following function to update the segment
>  number of the matching pci_hostbridge:
> 
>  int pci_hostbridge_setup(uint32_t segno, uint64_t cfg_base, uint64_t
>  cfg_size);
> 
>  Subsequent calls to pci_conf_read/write are completed by the
>  pci_hostbridge_ops of the respective pci_hostbridge.
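> 
>  A possible implementation sketch of pci_hostbridge_setup(), identifying the
>  host bridge by its configuration space window and recording the segment
>  number chosen by dom0 (the error code choice is illustrative):
> 
>  int pci_hostbridge_setup(uint32_t segno, uint64_t cfg_base, uint64_t cfg_size)
>  {
>      pci_hostbridge_t *pcihb;
> 
>      list_for_each_entry(pcihb, &pci_hostbridge_list, list)
>      {
>          /* Match the CFG window that dom0 is referring to. */
>          if ( pcihb->cfg_base == cfg_base && pcihb->cfg_size == cfg_size )
>          {
>              pcihb->segno = segno;
>              return 0;
>          }
>      }
>      return -ENODEV;
>  }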
> 
>  2.3    XEN Internal API
>  -----------------------------------------------------------------------------
>  a) pci_hostbridge_dt_node
> 
>  struct dt_device_node* pci_hostbridge_dt_node(uint32_t segno);
> 
>  Returns the device tree node pointer of the pci node bound to the given
>  segment number. The API can only be called after pci_hostbridge_setup() has
>  been invoked for that segment.
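> 
>  A minimal sketch of this lookup, using the same host bridge list as the
>  configuration space accessors above:
> 
>  struct dt_device_node* pci_hostbridge_dt_node(uint32_t segno)
>  {
>      pci_hostbridge_t *pcihb;
> 
>      list_for_each_entry(pcihb, &pci_hostbridge_list, list)
>      {
>          if ( pcihb->segno == segno )
>              return pcihb->dt_node;
>      }
>      return NULL;
>  }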
> 
>  3.    SMMU programming
>  -----------------------------------------------------------------------------
> 
>  3.1.    Additions for PCI Passthrough
>  -----------------------------------------------------------------------------
> 
>  3.1.1 - add_device in iommu_ops is implemented.
>  -----------------------------------------------------------------------------
> 
>  This is called when PHYSDEVOP_pci_device_add / PHYSDEVOP_manage_pci_add_ext
>  is invoked by dom0.
> 
>  .add_device = arm_smmu_add_dom0_dev,
> 
>  static int arm_smmu_add_dom0_dev(u8 devfn, struct device *dev)
>  {
>          if (dev_is_pci(dev)) {
>                  struct pci_dev *pdev = to_pci_dev(dev);
>                  return arm_smmu_assign_dev(pdev->domain, devfn, dev);
>          }
>          return -1;
>  }
> 
>  3.1.2 - remove_device in iommu_ops is implemented.
>  -----------------------------------------------------------------------------
>  This is called when PHYSDEVOP_pci_device_remove is called from dom0/domU.
> 
>  .remove_device = arm_smmu_remove_dev.
>  TODO: add implementation details of arm_smmu_remove_dev.
> 
>  3.1.3 dev_get_dev_node is modified for pci devices.
>  -----------------------------------------------------------------------------
>  The function is modified to return the dt_node of the pci host bridge from
>  the device tree. This is required because PCI devices are not described in
>  the device tree and need a way to find out which SMMU they are attached to.
> 
>  static struct arm_smmu_device *find_smmu_for_device(struct device *dev)
>  {
>          struct device_node *dev_node = dev_get_dev_node(dev);
>  ....
> 
>  static struct device_node *dev_get_dev_node(struct device *dev)
>  {
>          if (dev_is_pci(dev)) {
>                  struct pci_dev *pdev = to_pci_dev(dev);
>                  return pci_hostbridge_dt_node(pdev->seg);
>          }
>  ...
> 
> 
>  3.2.    Mapping between streamID - deviceID - pci sbdf - requesterID
>  -----------------------------------------------------------------------------
>  In the simple case all of these are equal to the BDF. However, some devices
>  use the wrong requester ID for DMA transactions; the Linux kernel handles
>  these with PCI quirks. Whether the same quirks should be implemented in Xen,
>  or a different approach taken, is a TODO.
> 
>  Until then, the basic implementation assumes that all of these IDs are equal
>  to the BDF.
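> 
>  Under this assumption the mapping reduces to a trivial helper such as the
>  sketch below (the helper name is illustrative, not part of the design):
> 
>  /* Assumption: streamID == requesterID == deviceID == (bus << 8) | devfn. */
>  static inline u32 pci_sbdf_to_streamid(u32 seg, u32 bus, u32 devfn)
>  {
>      /* The segment selects the host bridge / SMMU; it is not part of the
>         streamID itself. */
>      return (bus << 8) | devfn;
>  }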
> 
>  4.    Assignment of PCI device
>  -----------------------------------------------------------------------------
> 
>  4.1    Dom0
>  -----------------------------------------------------------------------------
>  All PCI devices are assigned to dom0 unless hidden via the pciback.hide boot
>  argument in dom0. Dom0 enumerates the PCI devices. For each device the MMIO
>  space has to be mapped in the Stage 2 translation for dom0; Xen does this at
>  boot by mapping the ranges found in the device tree pci nodes.
> 
>  While processing the PHYSDEVOP_pci_device_add hypercall,
>  its_add_device(machine_sbdf) should be called. This will allocate the ITS
>  specific data structures for the device. (Reference [2])
> 
> 
>  4.1.1 Stage 2 Mapping of GITS_ITRANSLATER space (64k)
>  -----------------------------------------------------------------------------
> 
>  The GITS_ITRANSLATER space (64k) must be mapped in the Stage 2 translation
>  so that the SMMU can translate MSI(-X) writes from the device using the page
>  tables of the domain.
> 
>  4.1.1.1 For Dom0
>  -----------------------------------------------------------------------------
>  The GITS_ITRANSLATER address space is mapped 1:1 during dom0 boot. For dom0
>  this mapping is done in the vgic driver. For domU the mapping is done by the
>  toolstack (see 4.1.1.2).
> 
>  4.1.1.2 For DomU
>  -----------------------------------------------------------------------------
>  For a domU, while creating the domain, the toolstack reads the IPA from the
>  GITS_ITRANSLATER_SPACE macro in xen/include/public/arch-arm.h. The PA is
>  obtained from a new hypercall which returns the PA of the GITS_ITRANSLATER
>  space.
> 
>  The toolstack then issues a domctl to create the stage 2 mapping.
> 
>  Hypercall Details: XEN_DOMCTL_get_itranslater_space
> 
>  /* XEN_DOMCTL_get_itranslater_space */
>  struct xen_domctl_get_itranslater_space {
>      /* OUT variables. */
>      uint64_aligned_t start_addr;
>      uint64_aligned_t size;
>  };
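> 
>  A sketch of the intended toolstack flow is given below. It assumes a
>  hypothetical libxc wrapper xc_domain_get_itranslater_space() for the new
>  domctl; the wrapper name is illustrative, and GITS_ITRANSLATER_SPACE is the
>  guest IPA macro from public/arch-arm.h mentioned above.
> 
>  #include <xenctrl.h>
> 
>  /* Map the GITS_ITRANSLATER space into the guest at the fixed IPA. */
>  static int map_its_translater_space(xc_interface *xch, uint32_t domid)
>  {
>      uint64_t pa, size;
>      int rc;
> 
>      /* New domctl: returns the PA and size of the GITS_ITRANSLATER space. */
>      rc = xc_domain_get_itranslater_space(xch, domid, &pa, &size);
>      if ( rc )
>          return rc;
> 
>      /* Stage 2 mapping: guest IPA (from public/arch-arm.h) -> host PA. */
>      return xc_domain_memory_mapping(xch, domid,
>                                      GITS_ITRANSLATER_SPACE >> XC_PAGE_SHIFT,
>                                      pa >> XC_PAGE_SHIFT,
>                                      size >> XC_PAGE_SHIFT,
>                                      1 /* DPCI_ADD_MAPPING */);
>  }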
> 
>  4.2 DomU
>  -----------------------------------------------------------------------------
> 
>  4.2.1 Mapping BAR regions in guest address space
>  -----------------------------------------------------------------------------
>  When a PCI-EP device is assigned to a domU, the toolstack reads the BAR
>  registers from the pci configuration space. The toolstack then allocates a
>  virtual BAR region for each BAR, from an area reserved in the guest address
>  space for mapping BARs, referred to as the Guest BAR area. This area is
>  defined in public/arch-arm.h:
> 
>  /* For 32bit BARs*/
>  #define GUEST_BAR_BASE_32 <<>>
>  #define GUEST_BAR_SIZE_32 <<>>
> 
>  /* For 64bit BARs*/
>  #define GUEST_BAR_BASE_64 <<>>
>  #define GUEST_BAR_SIZE_64 <<>>
> 
>  The toolstack then invokes the xc_domain_memory_mapping domctl to create the
>  stage 2 mapping. If a BAR is 32-bit the BASE_32 area is used, otherwise the
>  BASE_64 area. Support for a device requiring a combination of both is TODO.
> 
>  The toolstack manages these areas and allocates virtual BARs from them; the
>  allocation and deallocation is done with APIs similar to malloc and free, as
>  in the sketch below.
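> 
>  A per-BAR sketch of this flow follows. guest_bar_alloc() stands for the
>  malloc-like allocator over the Guest BAR area described above; it is a
>  hypothetical name, as is map_one_bar().
> 
>  #include <errno.h>
>  #include <stdbool.h>
>  #include <xenctrl.h>
> 
>  /* Allocate a virtual BAR in the guest and map it onto the physical BAR. */
>  static int map_one_bar(xc_interface *xch, uint32_t domid,
>                         uint64_t bar_pa, uint64_t bar_size, bool is_64bit,
>                         uint64_t *bar_ipa /* out: virtual BAR, for xenstore */)
>  {
>      /* Carve the virtual BAR out of the GUEST_BAR_BASE_32/_64 area. */
>      uint64_t ipa = guest_bar_alloc(domid, bar_size, is_64bit);
> 
>      if ( !ipa )
>          return -ENOMEM;
> 
>      *bar_ipa = ipa;
> 
>      /* Stage 2 mapping of the virtual BAR (IPA) onto the physical BAR (PA). */
>      return xc_domain_memory_mapping(xch, domid,
>                                      ipa >> XC_PAGE_SHIFT,
>                                      bar_pa >> XC_PAGE_SHIFT,
>                                      (bar_size + XC_PAGE_SIZE - 1) >> XC_PAGE_SHIFT,
>                                      1 /* DPCI_ADD_MAPPING */);
>  }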
> 
>  4.2.2    Xenstore Update: For each PCI-EP BAR (IPA-PA mapping info).
>  ----------------------------------------------------------------------------
>  The toolstack also updates the xenstore information for the device
>  (virtual BAR : physical BAR). This information is read by xen-pciback and
>  returned to the domU pcifront driver when it reads the BAR registers from
>  configuration space.
> 
>  Entries created are as follows:
>  /local/domain/0/backend/pci/1/0
>  vdev-N
>      BDF = ""
>      BAR-0-IPA = ""
>      BAR-0-PA = ""
>      BAR-0-SIZE = ""
>      ...
>      BAR-M-IPA = ""
>      BAR-M-PA = ""
>      BAR-M-SIZE = ""
> 
>  Note: if BAR-M-SIZE is 0, the entry is not valid.
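> 
>  A sketch of how the toolstack could create these keys using libxenstore is
>  shown below; whether this lives in libxl or a helper is an implementation
>  detail, and write_bar_keys()/vdev_path are illustrative names.
> 
>  #include <inttypes.h>
>  #include <stdint.h>
>  #include <stdio.h>
>  #include <string.h>
>  #include <xenstore.h>
> 
>  /* Write BAR-<n>-IPA/PA/SIZE under an existing vdev-N xenstore directory. */
>  static int write_bar_keys(struct xs_handle *xs, const char *vdev_path,
>                            int bar, uint64_t ipa, uint64_t pa, uint64_t size)
>  {
>      const struct { const char *suffix; uint64_t val; } keys[] = {
>          { "IPA", ipa }, { "PA", pa }, { "SIZE", size },
>      };
>      char path[256], val[32];
>      unsigned int i;
> 
>      for ( i = 0; i < sizeof(keys) / sizeof(keys[0]); i++ )
>      {
>          snprintf(path, sizeof(path), "%s/BAR-%d-%s", vdev_path, bar,
>                   keys[i].suffix);
>          snprintf(val, sizeof(val), "0x%" PRIx64, keys[i].val);
>          if ( !xs_write(xs, XBT_NULL, path, val, strlen(val)) )
>              return -1;
>      }
>      return 0;
>  }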
> 
>  4.2.3 Hypercall Modification (XEN_DOMCTL_assign_device)
>  ----------------------------------------------------------------------------
>  For each machine sbdf a guest sbdf needs to be generated when a device is
>  assigned to a domU. Currently this is done by xen-pciback. As per the
>  discussion [4] on xen-devel, the d.f generation should be done by the
>  toolstack rather than by xen-pciback.
> 
>  Since there is only one pci-frontend bus in the domU, the guest s:b:d.f is
>  0:0:d.f. It is proposed in this design document that the d.f generation be
>  done by the toolstack and that the xenstore keys also be created by the
>  toolstack.
> 
>  Following guest_sbdf generation the domctl to assign the device is invoked.
>  This hypercall is updated to include *guest_sbdf*. The Xen ITS driver can
>  store the domID : guest_sbdf : machine_sbdf mapping for later use.
> 
>  struct xen_domctl_assign_device {
>     uint32_t dev;   /* XEN_DOMCTL_DEV_* */
>     union {
>         struct {
>             uint32_t machine_sbdf;   /* machine PCI ID of assigned device */
>             uint32_t guest_sbdf;   /* guest PCI ID of assigned device */
>         } pci;
>         struct {
>             uint32_t size; /* Length of the path */
>             XEN_GUEST_HANDLE_64(char) path; /* path to the device tree node */
>         } dt;
>     } u;
>  };
> 
>  In the handler of this hypercall an internal API function
> 
>  its_assign_device(domid, machine_sbdf, guest_sbdf)   (Reference [2])
> 
>  is called, which stores the mapping between machine_sbdf and guest_sbdf.
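> 
>  A sketch of the handler side is given below; the surrounding
>  XEN_DOMCTL_assign_device code is simplified away and assign_vpci_device()
>  is an illustrative name, while its_assign_device() follows the ITS design
>  in [2].
> 
>  /* Called from the XEN_DOMCTL_assign_device handler for PCI devices. */
>  static int assign_vpci_device(struct domain *d,
>                                uint32_t machine_sbdf, uint32_t guest_sbdf)
>  {
>      int ret;
> 
>      /* Record domID : guest_sbdf : machine_sbdf so that the vITS can later
>         translate guest DeviceIDs into physical DeviceIDs. */
>      ret = its_assign_device(d->domain_id, machine_sbdf, guest_sbdf);
>      if ( ret )
>          return ret;
> 
>      /* The existing device assignment path (SMMU attach etc.) follows. */
>      return 0;
>  }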
> 
>  5. Change in Linux PCI frontend - backend driver for MSI/X programming
>  -----------------------------------------------------------------------------
> 
>  5.1 pci-frontend bus and gicv3-its node binding for domU
>  -----------------------------------------------------------------------------
>  It is assumed that the toolstack generates a gicv3-its node in the domU
>  device tree. As of now the ARM PCI passthrough design supports device
>  assignment only to guests with gicv3-its support; PCI passthrough with a
>  gicv2 guest is not supported.
> 
>  All the devices assigned to a domU are enumerated on a PCI frontend bus. On
>  ARM systems the interrupt parent of this bus is set to the gicv3-its node.
>  As the gicv3-its is emulated in Xen, all accesses by the domU driver are
>  trapped. This allows configuration and direct injection of MSIs (LPIs) into
>  the guest, so the frontend-backend communication path for MSI is no longer
>  required.
> 
>  Frontend-backend communication is only required for dom0 to read the PCI
>  configuration space on behalf of the domU.
> 
>  6.    Glossary
>  -----------------------------------------------------------------------------
>  MSI: Message Signalled Interrupt
>  ITS: Interrupt Translation Service
>  GIC: Generic Interrupt Controller
>  LPI: Locality-specific Peripheral Interrupt
> 
> 
>  7.    References
>  -----------------------------------------------------------------------------
>  [1]. http://osdir.com/ml/general/2015-08/msg15346.html
>  [2]. http://lists.xen.org/archives/html/xen-devel/2015-07/msg01984.html
>  [3]. http://xenbits.xen.org/people/ianc/vits/draftG.html
>  [4]. http://lists.xen.org/archives/html/xen-devel/2015-07/msg05513.html
> 
