Re: [Xen-devel] PCI Pass-through in Xen ARM: Draft 4
CC'ing a few other x86 people as this is likely to be the same approach
that will be taken by PVH.

On Thu, 13 Aug 2015, Manish Jaggi wrote:
> -------------------------------
> | PCI Pass-through in Xen ARM |
> -------------------------------
> manish.jaggi@xxxxxxxxxxxxxxxxxx
> -------------------------------
>
> Draft-4
>
> -----------------------------------------------------------------------------
> Introduction
> -----------------------------------------------------------------------------
> This document describes the design of PCI passthrough support in Xen ARM.
> The target system is a 64-bit ARM SoC with GICv3, SMMUv2 and PCIe devices.
>
> -----------------------------------------------------------------------------
> Revision History
> -----------------------------------------------------------------------------
> Changes from Draft-1:
> ---------------------
> a) map_mmio hypercall removed from the earlier draft
> b) Device BAR mapping into the guest is no longer 1:1
> c) Reserved area in the guest address space for mapping PCI-EP BARs in Stage 2
> d) Xenstore update: for each PCI-EP BAR (IPA-PA mapping info)
>
> Changes from Draft-2:
> ---------------------
> a) DomU boot information updated with boot-time device assignment and
>    hotplug
> b) SMMU description added
> c) Mapping between streamID - bdf - deviceID
> d) assign_device hypercall to include the virtual (guest) sbdf.
>    The toolstack generates the guest sbdf rather than pciback.
>
> Changes from Draft-3:
> ---------------------
> a) Fixed typos and added more description
> b) NUMA and PCI passthrough description removed for now
> c) Added example from Ian's mail
>
> -----------------------------------------------------------------------------
> Index
> -----------------------------------------------------------------------------
> (1) Background
>
> (2) Basic PCI Support in Xen ARM
> (2.1) pci_hostbridge and pci_hostbridge_ops
> (2.2) PHYSDEVOP_pci_host_bridge_add hypercall
> (2.3) Xen internal API
>
> (3) SMMU programming
> (3.1) Additions for PCI Passthrough
> (3.2) Mapping between streamID - deviceID - pci sbdf - requesterID
>
> (4) Assignment of PCI device
> (4.1) Dom0
> (4.1.1) Stage 2 mapping of GITS_ITRANSLATER space (64k)
> (4.1.1.1) For Dom0
> (4.1.1.2) For DomU
> (4.1.1.2.1) Hypercall details: XEN_DOMCTL_get_itranslater_space
>
> (4.2) DomU
> (4.2.1) Mapping BAR regions in guest address space
> (4.2.2) Xenstore update: for each PCI-EP BAR (IPA-PA mapping info)
> (4.2.3) Hypercall modification (XEN_DOMCTL_assign_device)
>
> (5) Change in Linux PCI frontend-backend driver for MSI/X programming
> (5.1) pci-frontend bus and gicv3-its node binding for domU
>
> (6) Glossary
>
> (7) References
> -----------------------------------------------------------------------------
>
> 1. Background of PCI passthrough
> -----------------------------------------------------------------------------
> Passthrough refers to assigning a PCI device to a guest domain (domU) such
> that the guest has full control over the device. The MMIO space and
> interrupts are managed by the guest itself, close to how a bare-metal
> kernel manages a device.
>
> The device's access to the guest address space needs to be isolated and
> protected. The SMMU (System MMU, the IOMMU on ARM) is programmed by the Xen
> hypervisor to allow the device to access guest memory for data transfers
> and to send MSI/MSI-X interrupts. The message signalled interrupt writes
> generated by PCI devices target guest addresses, which are also translated
> by the SMMU.
>
> For this reason the GITS Interrupt Translation Register space (part of the
> ITS address space) is mapped into the guest address space.
>
> 2. Basic PCI Support for ARM
> -----------------------------------------------------------------------------
> The APIs to read/write the PCI configuration space are based on segment:bdf.
> How the sbdf is mapped to a physical address is under the realm of the PCI
> host controller.
>
> ARM PCI support in Xen introduces a PCI host controller abstraction,
> similar to what exists in Linux. Host controller drivers register
> callbacks, which are invoked when the compatible property of a pci device
> tree node matches.
>
> Note: since PCI devices are enumerated at runtime, the pci node in the
> device tree refers to the host controller rather than to individual
> devices.
>
> (TODO: ACPI support is not implemented.)
>
> 2.1 pci_hostbridge and pci_hostbridge_ops
> -----------------------------------------------------------------------------
> The init function in the PCI host controller driver calls the following to
> register the host bridge callbacks:
>
> int pci_hostbridge_register(pci_hostbridge_t *pcihb);
>
> struct pci_hostbridge_ops {
>     u32 (*pci_conf_read)(struct pci_hostbridge*, u32 bus, u32 devfn,
>                          u32 reg, u32 bytes);
>     void (*pci_conf_write)(struct pci_hostbridge*, u32 bus, u32 devfn,
>                            u32 reg, u32 bytes, u32 val);
> };
>
> struct pci_hostbridge {
>     u32 segno;
>     paddr_t cfg_base;
>     paddr_t cfg_size;
>     struct dt_device_node *dt_node;
>     struct pci_hostbridge_ops ops;
>     struct list_head list;
> };
>
> A PCI conf_read function would internally look as follows:
>
> u32 pcihb_conf_read(u32 seg, u32 bus, u32 devfn, u32 reg, u32 bytes)
> {
>     pci_hostbridge_t *pcihb;
>
>     list_for_each_entry(pcihb, &pci_hostbridge_list, list)
>     {
>         if ( pcihb->segno == seg )
>             return pcihb->ops.pci_conf_read(pcihb, bus, devfn, reg, bytes);
>     }
>     return -1;
> }
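>
> As an illustration only (not an existing driver), a host controller driver
> could fill in and register its pci_hostbridge roughly as follows. The
> xyz_* names are hypothetical and the config space accessor bodies are
> elided:
>
> static u32 xyz_pci_conf_read(struct pci_hostbridge *pcihb, u32 bus,
>                              u32 devfn, u32 reg, u32 bytes)
> {
>     /* Compute the config space offset from bus/devfn/reg and read
>      * 'bytes' bytes relative to pcihb->cfg_base (controller specific). */
>     ...
> }
>
> static void xyz_pci_conf_write(struct pci_hostbridge *pcihb, u32 bus,
>                                u32 devfn, u32 reg, u32 bytes, u32 val)
> {
>     /* Controller specific config space write. */
>     ...
> }
>
> static int xyz_pci_init(struct dt_device_node *node)
> {
>     static pci_hostbridge_t xyz_hb;
>
>     xyz_hb.dt_node = node;
>     /* cfg_base/cfg_size would be parsed from the pci node "reg" property. */
>     xyz_hb.ops.pci_conf_read  = xyz_pci_conf_read;
>     xyz_hb.ops.pci_conf_write = xyz_pci_conf_write;
>
>     /* segno is filled in later, when dom0 issues
>      * PHYSDEVOP_pci_host_bridge_add (see 2.2). */
>     return pci_hostbridge_register(&xyz_hb);
> }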
>
> 2.2 PHYSDEVOP_pci_host_bridge_add hypercall
> -----------------------------------------------------------------------------
> Xen code accesses the PCI configuration space based on the sbdf received
> from the guest. The order in which the pci device tree nodes appear may not
> be the same as the order of device enumeration in dom0. Thus there needs to
> be a mechanism to bind the segment number assigned by dom0 to the pci host
> controller. The following hypercall is introduced:
>
> #define PHYSDEVOP_pci_host_bridge_add <<>>
> struct physdev_pci_host_bridge_add {
>     /* IN */
>     uint16_t seg;
>     uint64_t cfg_base;
>     uint64_t cfg_size;
> };
>
> This hypercall is invoked before dom0 invokes the PHYSDEVOP_pci_device_add
> hypercall.
>
> To understand the requirement in detail, Ian's example is quoted below:
> -- Ref: [1]
> Imagine we have two PCI host bridges, one with CFG space at 0xA0000000 and
> a second with CFG space at 0xB0000000.
>
> Xen discovers these and assigns segment 0=0xA0000000 and segment
> 1=0xB0000000.
>
> Dom0 discovers them too but assigns segment 1=0xA0000000 and segment
> 0=0xB0000000 (i.e. the other way).
>
> Now Dom0 makes a hypercall referring to a device as (segment=1,BDF), i.e.
> the device with BDF behind the root bridge at 0xA0000000. (Perhaps this is
> the PHYSDEVOP_manage_pci_add_ext call).
>
> But Xen thinks it is talking about the device with BDF behind the root
> bridge at 0xB0000000 because Dom0 and Xen do not agree on what the segments
> mean. Now Xen will use the wrong device ID in the IOMMU (since that is
> associated with the host bridge), or poke the wrong configuration space, or
> whatever.
>
> Or maybe Xen chose 42=0xB0000000 and 43=0xA0000000, so when Dom0 starts
> talking about segment=0 and =1 it has no idea what is going on.
>
> PHYSDEVOP_pci_host_bridge_add is intended to allow Dom0 to say "Segment 0
> is the host bridge at 0xB0000000" and "Segment 1 is the host bridge at
> 0xA0000000". With this there is no confusion between Xen and Dom0 because
> Xen isn't picking a segment ID, it is being told what it is by Dom0, which
> has done the picking.
> --
>
> The hypercall handler invokes the following to record the segment number in
> the matching pci_hostbridge:
>
> int pci_hostbridge_setup(uint32_t segno, uint64_t cfg_base,
>                          uint64_t cfg_size);
>
> Subsequent calls to pci_conf_read/write are completed by the
> pci_hostbridge_ops of the respective pci_hostbridge.
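>
> A minimal sketch of the hypervisor-side handling, assuming the usual
> copy-from-guest pattern of do_physdev_op() (the surrounding switch is
> existing code; the case below is the proposed addition):
>
> case PHYSDEVOP_pci_host_bridge_add: {
>     struct physdev_pci_host_bridge_add add;
>
>     ret = -EFAULT;
>     if ( copy_from_guest(&add, arg, 1) )
>         break;
>
>     /* Bind the dom0-assigned segment number to the host bridge whose
>      * config space lives at [cfg_base, cfg_base + cfg_size). */
>     ret = pci_hostbridge_setup(add.seg, add.cfg_base, add.cfg_size);
>     break;
> }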
>
> 2.3 Xen internal API
> -----------------------------------------------------------------------------
> a) pci_hostbridge_dt_node
>
> struct dt_device_node* pci_hostbridge_dt_node(uint32_t segno);
>
> Returns the device tree node pointer of the pci node which is bound to the
> passed segment number. The API can only be called after
> pci_hostbridge_setup has been invoked.
>
> 3. SMMU programming
> -----------------------------------------------------------------------------
>
> 3.1 Additions for PCI Passthrough
> -----------------------------------------------------------------------------
>
> 3.1.1 add_device in iommu_ops is implemented
> -----------------------------------------------------------------------------
> This is called when PHYSDEVOP_pci_device_add / PHYSDEVOP_manage_pci_add_ext
> is called from dom0.
>
> .add_device = arm_smmu_add_dom0_dev,
>
> static int arm_smmu_add_dom0_dev(u8 devfn, struct device *dev)
> {
>     if ( dev_is_pci(dev) )
>     {
>         struct pci_dev *pdev = to_pci_dev(dev);
>         return arm_smmu_assign_dev(pdev->domain, devfn, dev);
>     }
>     return -1;
> }
>
> 3.1.2 remove_device in iommu_ops is implemented
> -----------------------------------------------------------------------------
> This is called when PHYSDEVOP_pci_device_remove is called from dom0/domU.
>
> .remove_device = arm_smmu_remove_dev,
>
> TODO: add implementation details of arm_smmu_remove_dev.
>
> 3.1.3 dev_get_dev_node is modified for PCI devices
> -----------------------------------------------------------------------------
> The function is modified to return the dt_node of the pci host bridge from
> the device tree. This is required because non-DT devices (such as PCI
> devices) need a way to find which SMMU they are attached to.
>
> static struct arm_smmu_device *find_smmu_for_device(struct device *dev)
> {
>     struct device_node *dev_node = dev_get_dev_node(dev);
>     ....
>
> static struct device_node *dev_get_dev_node(struct device *dev)
> {
>     if ( dev_is_pci(dev) )
>     {
>         struct pci_dev *pdev = to_pci_dev(dev);
>         return pci_hostbridge_dt_node(pdev->seg);
>     }
>     ...
>
> 3.2 Mapping between streamID - deviceID - pci sbdf - requesterID
> -----------------------------------------------------------------------------
> In the simple case all of these are equal to the BDF. But there are some
> devices that use the wrong requester ID for DMA transactions; the Linux
> kernel has PCI quirks for these. Whether the same quirks can be implemented
> in Xen, or a different approach has to be taken, is a TODO here.
>
> Until then, for the basic implementation, it is assumed that all of them
> are equal to the BDF.
>
> 4. Assignment of PCI device
> -----------------------------------------------------------------------------
>
> 4.1 Dom0
> -----------------------------------------------------------------------------
> All PCI devices are assigned to dom0 unless hidden by the pciback.hide boot
> argument in dom0. Dom0 enumerates the PCI devices. For each device the MMIO
> space has to be mapped in the Stage 2 translation for dom0. For dom0, Xen
> maps the ranges from the device tree pci nodes in the Stage 2 translation
> during boot.
>
> While processing the PHYSDEVOP_pci_device_add hypercall,
> its_add_device(machine_sbdf) should be called. This will allocate the ITS
> specific data structures for the device. (Reference [2])
>
> 4.1.1 Stage 2 mapping of GITS_ITRANSLATER space (64k)
> -----------------------------------------------------------------------------
> The GITS_ITRANSLATER space (64k) must be mapped in the Stage 2 translation
> so that the SMMU can translate MSI(-X) writes from the device using the
> page tables of the domain.
>
> 4.1.1.1 For Dom0
> -----------------------------------------------------------------------------
> The GITS_ITRANSLATER address space is mapped 1:1 during dom0 boot. For dom0
> this mapping is done in the vgic driver. For domU the mapping is done by
> the toolstack.
>
> 4.1.1.2 For DomU
> -----------------------------------------------------------------------------
> For domU, while creating the domain, the toolstack reads the IPA from the
> GITS_ITRANSLATER_SPACE macro in xen/include/public/arch-arm.h. The PA is
> read from a new hypercall which returns the PA of the GITS_ITRANSLATER
> space. Subsequently the toolstack issues a hypercall to create the Stage 2
> mapping.
>
> 4.1.1.2.1 Hypercall details: XEN_DOMCTL_get_itranslater_space
>
> /* XEN_DOMCTL_get_itranslater_space */
> struct xen_domctl_get_itranslater_space {
>     /* OUT variables. */
>     uint64_aligned_t start_addr;
>     uint64_aligned_t size;
> };
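>
> A rough toolstack-side sketch of the flow described above.
> xc_domain_get_itranslater_space() is a hypothetical libxc wrapper around
> the proposed domctl; xc_domain_memory_mapping() is the existing libxc call,
> and GITS_ITRANSLATER_SPACE is assumed here to be the base IPA of the
> region:
>
> #include <xenctrl.h>
>
> static int map_itranslater_space(xc_interface *xch, uint32_t domid)
> {
>     uint64_t pa, size;
>     int rc;
>
>     /* Hypothetical wrapper for XEN_DOMCTL_get_itranslater_space. */
>     rc = xc_domain_get_itranslater_space(xch, domid, &pa, &size);
>     if ( rc )
>         return rc;
>
>     /* Map IPA (fixed value from public/arch-arm.h) -> PA in Stage 2. */
>     return xc_domain_memory_mapping(xch, domid,
>                                     GITS_ITRANSLATER_SPACE >> XC_PAGE_SHIFT,
>                                     pa >> XC_PAGE_SHIFT,
>                                     size >> XC_PAGE_SHIFT,
>                                     DPCI_ADD_MAPPING);
> }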
>
> 4.2 DomU
> -----------------------------------------------------------------------------
>
> 4.2.1 Mapping BAR regions in guest address space
> -----------------------------------------------------------------------------
> When a PCI-EP device is assigned to a domU, the toolstack reads the PCI
> configuration space BAR registers. The toolstack allocates a virtual BAR
> region for each BAR region from the area reserved in the guest address
> space for mapping BARs, referred to as the Guest BAR area. This area is
> defined in public/arch-arm.h:
>
> /* For 32bit BARs */
> #define GUEST_BAR_BASE_32 <<>>
> #define GUEST_BAR_SIZE_32 <<>>
>
> /* For 64bit BARs */
> #define GUEST_BAR_BASE_64 <<>>
> #define GUEST_BAR_SIZE_64 <<>>
>
> The toolstack then invokes the xc_domain_memory_mapping domctl to create
> the Stage 2 mapping. If a BAR region address is 32-bit, the BASE_32 area is
> used, otherwise the 64-bit area. Supporting a combination of both is a
> TODO.
>
> The toolstack manages these areas and allocates from them. Allocation and
> deallocation are done using APIs similar to malloc and free.
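>
> An illustrative sketch of the allocate-and-map step for one BAR.
> guest_bar_alloc() is a hypothetical allocator over the Guest BAR area
> managed by the toolstack; xc_domain_memory_mapping() is the existing libxc
> call. The returned virtual BAR (IPA) is what ends up in the xenstore keys
> described in 4.2.2 below:
>
> #include <errno.h>
> #include <stdbool.h>
> #include <xenctrl.h>
>
> static int map_guest_bar(xc_interface *xch, uint32_t domid,
>                          uint64_t bar_pa, uint64_t bar_size, bool is_64bit,
>                          uint64_t *bar_ipa /* OUT: virtual BAR */)
> {
>     /* Hypothetical: pick a free IPA range in the GUEST_BAR_BASE_32/_64
>      * area; assume 0 means allocation failure. */
>     uint64_t ipa = guest_bar_alloc(is_64bit, bar_size);
>
>     if ( !ipa )
>         return -ENOMEM;
>
>     *bar_ipa = ipa;
>     return xc_domain_memory_mapping(xch, domid,
>                                     ipa >> XC_PAGE_SHIFT,
>                                     bar_pa >> XC_PAGE_SHIFT,
>                                     bar_size >> XC_PAGE_SHIFT,
>                                     DPCI_ADD_MAPPING);
> }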
>
> 4.2.2 Xenstore update: for each PCI-EP BAR (IPA-PA mapping info)
> -----------------------------------------------------------------------------
> The toolstack also updates the xenstore information for the device
> (virtual BAR : physical BAR). This information is read by xen-pciback and
> returned to the domU pcifront driver when it reads the BAR registers from
> the configuration space.
>
> Entries created are as follows:
>
> /local/domain/0/backend/pci/1/0
>     vdev-N
>         BDF = ""
>         BAR-0-IPA = ""
>         BAR-0-PA = ""
>         BAR-0-SIZE = ""
>         ...
>         BAR-M-IPA = ""
>         BAR-M-PA = ""
>         BAR-M-SIZE = ""
>
> Note: if BAR-M-SIZE is 0, the entry is not valid.
>
> 4.2.3 Hypercall modification (XEN_DOMCTL_assign_device)
> -----------------------------------------------------------------------------
> For each machine sbdf, a guest sbdf needs to be generated when a device is
> assigned to a domU. Currently this is done by xen-pciback. As per
> discussions [4] on xen-devel, the d.f generation should be done by the
> toolstack rather than by xen-pciback.
>
> Since there is only one pci-frontend bus in domU, s:b:d.f is 0:0:d.f.
> It is proposed in this design document that the d.f generation and the
> creation of the xenstore keys be done by the toolstack.
>
> Following guest_sbdf generation, the domctl to assign the device is
> invoked. This hypercall is updated to include the *guest_sbdf*. The Xen ITS
> driver can store the mapping domID : guest_sbdf : machine_sbdf and use it
> later.
>
> struct xen_domctl_assign_device {
>     uint32_t dev;   /* XEN_DOMCTL_DEV_* */
>     union {
>         struct {
>             uint32_t machine_sbdf;  /* machine PCI ID of assigned device */
>             uint32_t guest_sbdf;    /* guest PCI ID of assigned device */
>         } pci;
>         struct {
>             uint32_t size;          /* Length of the path */
>             XEN_GUEST_HANDLE_64(char) path; /* path to the device tree node */
>         } dt;
>     } u;
> };
>
> In the handler of this hypercall an internal API function
>
>     its_assign_device(domid, machine_sbdf, guest_sbdf)   (Reference [2])
>
> is called, which stores the mapping between machine_sbdf and guest_sbdf.
>
> 5. Change in Linux PCI frontend-backend driver for MSI/X programming
> -----------------------------------------------------------------------------
>
> 5.1 pci-frontend bus and gicv3-its node binding for domU
> -----------------------------------------------------------------------------
> It is assumed that the toolstack generates a gicv3-its node in the domU
> device tree. As of now the ARM PCI passthrough design supports device
> assignment to guests which have gicv3-its support; PCI passthrough with a
> gicv2 guest is not supported.
>
> All the devices assigned to a domU are enumerated on a PCI frontend bus. On
> ARM systems the interrupt parent of this bus is set to the gicv3-its node.
> As the gicv3-its is emulated in Xen, all accesses by the domU driver are
> trapped. This allows configuration and direct injection of MSIs (LPIs) into
> the guest, so the frontend-backend communication for MSI is no longer
> required.
>
> Frontend-backend communication is required only for dom0 to perform PCI
> configuration space reads on behalf of domU.
>
> 6. Glossary
> -----------------------------------------------------------------------------
> MSI: Message Signalled Interrupt
> ITS: Interrupt Translation Service
> GIC: Generic Interrupt Controller
> LPI: Locality-specific Peripheral Interrupt
>
> 7. References
> -----------------------------------------------------------------------------
> [1] http://osdir.com/ml/general/2015-08/msg15346.html
> [2] http://lists.xen.org/archives/html/xen-devel/2015-07/msg01984.html
> [3] http://xenbits.xen.org/people/ianc/vits/draftG.html
> [4] http://lists.xen.org/archives/html/xen-devel/2015-07/msg05513.html

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel