[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v5 09/10] xen/arm: Do not map PCI ECAM and MMIO space to Domain-0's p2m
Hi, Julien! On 13.10.21 19:17, Julien Grall wrote: > Hi Oleksandr, > > On 08/10/2021 06:55, Oleksandr Andrushchenko wrote: >> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx> >> >> PCI host bridges are special devices in terms of implementing PCI >> passthrough. According to [1] the current implementation depends on >> Domain-0 to perform the initialization of the relevant PCI host >> bridge hardware and perform PCI device enumeration. In order to >> achieve that one of the required changes is to not map all the memory >> ranges in map_range_to_domain as we traverse the device tree on startup >> and perform some additional checks if the range needs to be mapped to >> Domain-0. >> >> The generic PCI host controller device tree binding says [2]: >> - ranges: As described in IEEE Std 1275-1994, but must provide >> at least a definition of non-prefetchable memory. One >> or both of prefetchable Memory and IO Space may also >> be provided. >> >> - reg : The Configuration Space base address and size, as accessed >> from the parent bus. The base address corresponds to >> the first bus in the "bus-range" property. If no >> "bus-range" is specified, this will be bus 0 (the default). >> >> From the above none of the memory ranges from the "ranges" property > > NIT: The space before "From" looks odd. Will fix > >> needs to be mapped to Domain-0 at startup as MMIO mapping is going to >> be handled dynamically by vPCI as we assign PCI devices, e.g. each >> device assigned to Domain-0/guest will have its MMIOs mapped/unmapped >> as needed by Xen. >> >> The "reg" property covers not only ECAM space, but may also have other >> then the configuration memory ranges described, for example [3]: >> - reg: Should contain rc_dbi, config registers location and length. >> - reg-names: Must include the following entries: >> "rc_dbi": controller configuration registers; >> "config": PCIe configuration space registers. >> >> This patch makes it possible to not map all the ranges from the >> "ranges" property and also ECAM from the "reg". All the rest from the >> "reg" property still needs to be mapped to Domain-0, so the PCI >> host bridge remains functional in Domain-0. >> >> [1] >> https://urldefense.com/v3/__https://lists.xenproject.org/archives/html/xen-devel/2020-07/msg00777.html__;!!GF_29dbcQIUBPA!hDUAXuiHAx7fsX6gyMvlen2PB4kTm3vWbVFaTy19HwpFJkSbFXkOSjjp_78xX6VmHCVkcFK39w$ >> [lists[.]xenproject[.]org] >> [2] >> https://urldefense.com/v3/__https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/host-generic-pci.txt__;!!GF_29dbcQIUBPA!hDUAXuiHAx7fsX6gyMvlen2PB4kTm3vWbVFaTy19HwpFJkSbFXkOSjjp_78xX6VmHCWryzEUIg$ >> [kernel[.]org] >> [3] >> https://urldefense.com/v3/__https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/hisilicon-pcie.txt__;!!GF_29dbcQIUBPA!hDUAXuiHAx7fsX6gyMvlen2PB4kTm3vWbVFaTy19HwpFJkSbFXkOSjjp_78xX6VmHCWuT3i91Q$ >> [kernel[.]org] >> >> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx> >> --- >> Since v4: >> - update skip_mapping comment >> - add comment why we need to map interrupts to Dom0 >> Since v3: >> - pass struct map_range_data to map_dt_irq_to_domain >> - remove redundant check from map_range_to_domain >> - fix handle_device's .skip_mapping >> Since v2: >> - removed check in map_range_to_domain for PCI_DEV >> and moved it to handle_device, so the code is >> simpler >> - s/map_pci_bridge/skip_mapping >> - extended comment in pci_host_bridge_mappings >> - minor code restructure in construct_dom0 >> - s/.need_p2m_mapping/.need_p2m_hwdom_mapping and related >> callbacks >> - unsigned int i; in pci_host_bridge_mappings >> Since v1: >> - Added better description of why and what needs to be mapped into >> Domain-0's p2m and what doesn't >> - Do not do any mappings for PCI devices while traversing the DT >> - Walk all the bridges and make required mappings in one go >> --- >> xen/arch/arm/domain_build.c | 57 ++++++++++++++++++------------ >> xen/arch/arm/pci/ecam.c | 14 ++++++++ >> xen/arch/arm/pci/pci-host-common.c | 49 +++++++++++++++++++++++++ >> xen/arch/arm/pci/pci-host-zynqmp.c | 1 + >> xen/include/asm-arm/pci.h | 10 ++++++ >> xen/include/asm-arm/setup.h | 13 +++++++ >> 6 files changed, 121 insertions(+), 23 deletions(-) >> >> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c >> index b51176b31bef..0d673b06a2f3 100644 >> --- a/xen/arch/arm/domain_build.c >> +++ b/xen/arch/arm/domain_build.c >> @@ -10,7 +10,6 @@ >> #include <asm/regs.h> >> #include <xen/errno.h> >> #include <xen/err.h> >> -#include <xen/device_tree.h> >> #include <xen/libfdt/libfdt.h> >> #include <xen/guest_access.h> >> #include <xen/iocap.h> >> @@ -51,12 +50,6 @@ static int __init parse_dom0_mem(const char *s) >> } >> custom_param("dom0_mem", parse_dom0_mem); >> -struct map_range_data >> -{ >> - struct domain *d; >> - p2m_type_t p2mt; >> -}; >> - >> /* Override macros from asm/page.h to make them work with mfn_t */ >> #undef virt_to_mfn >> #define virt_to_mfn(va) _mfn(__virt_to_mfn(va)) >> @@ -1663,10 +1656,11 @@ static int __init map_dt_irq_to_domain(const struct >> dt_device_node *dev, >> const struct dt_irq *dt_irq, >> void *data) >> { >> - struct domain *d = data; >> + struct map_range_data *mr_data = data; >> + struct domain *d = mr_data->d; >> unsigned int irq = dt_irq->irq; >> int res; >> - bool need_mapping = !dt_device_for_passthrough(dev); >> + bool need_mapping = !mr_data->skip_mapping; > > Before, the variable 'need_mapping' was helpful to understand what > !dt_device_for_passthrough(). Now it feels odd to read. There is only one > user, so can the local variable be dropped? Ok, I can drop the local variable and use the value directly > >> if ( irq < NR_LOCAL_IRQS ) >> { >> @@ -1690,13 +1684,12 @@ static int __init map_dt_irq_to_domain(const struct >> dt_device_node *dev, >> return 0; >> } >> -static int __init map_range_to_domain(const struct dt_device_node *dev, >> - u64 addr, u64 len, >> - void *data) >> +int __init map_range_to_domain(const struct dt_device_node *dev, >> + u64 addr, u64 len, void *data) >> { >> struct map_range_data *mr_data = data; >> struct domain *d = mr_data->d; >> - bool need_mapping = !dt_device_for_passthrough(dev); >> + bool need_mapping = !mr_data->skip_mapping; > > Same here. Ok, I can drop the local variable and use the value directly > >> int res; >> /* >> @@ -1748,23 +1741,21 @@ static int __init map_range_to_domain(const struct >> dt_device_node *dev, >> * then we may need to perform additional mappings in order to make >> * the child resources available to domain 0. >> */ >> -static int __init map_device_children(struct domain *d, >> - const struct dt_device_node *dev, >> - p2m_type_t p2mt) >> +static int __init map_device_children(const struct dt_device_node *dev, >> + struct map_range_data *mr_data) >> { >> - struct map_range_data mr_data = { .d = d, .p2mt = p2mt }; >> - int ret; >> - >> if ( dt_device_type_is_equal(dev, "pci") ) >> { >> + int ret; >> + >> dt_dprintk("Mapping children of %s to guest\n", >> dt_node_full_name(dev)); >> - ret = dt_for_each_irq_map(dev, &map_dt_irq_to_domain, d); >> + ret = dt_for_each_irq_map(dev, &map_dt_irq_to_domain, mr_data); >> if ( ret < 0 ) >> return ret; >> - ret = dt_for_each_range(dev, &map_range_to_domain, &mr_data); >> + ret = dt_for_each_range(dev, &map_range_to_domain, mr_data); >> if ( ret < 0 ) >> return ret; >> } >> @@ -1845,6 +1836,20 @@ static int __init handle_device(struct domain *d, >> struct dt_device_node *dev, >> int res; >> u64 addr, size; >> bool need_mapping = !dt_device_for_passthrough(dev); > > I find the difference between .skip_mapping and the local variable > need_mapping quite puzzling. We are likely going to misuse them in the future. > > I think it would be clearer if the local variable is renamed to 'own_device' > or similar. Ok, so let it be own_device > >> + /* >> + * For PCI passthrough we only need to remap to Dom0 the interrupts >> + * and memory ranges from "reg" property which cover controller's >> + * configuration registers and such. PCIe configuration space registers >> + * of the PCIe Root Complex and PCIe aperture should not be mapped >> + * automatically to Dom0. >> + */ >> + struct map_range_data mr_data = { >> + .d = d, >> + .p2mt = p2mt, >> + .skip_mapping = !need_mapping || >> + (is_pci_passthrough_enabled() && >> + (device_get_class(dev) == DEVICE_PCI)) > > The device class is confusing. When I see DEVICE_PCI, I think this is a PCI > device whereas here you are referring to the hostbridge. > > Unfortunate, I wasn't able to comment on the original patch before it was > committed. But I would like this to be renamed to DEVICE_PCI_HOSTBRIDGE for > Xen 4.16. Can you send a patch? Ok, I'll add a patch with s/DEVICE_PCI/DEVICE_PCI_HOSTBRIDGE > >> + }; >> naddr = dt_number_of_address(dev); >> @@ -1884,7 +1889,6 @@ static int __init handle_device(struct domain *d, >> struct dt_device_node *dev, >> /* Give permission and map MMIOs */ >> for ( i = 0; i < naddr; i++ ) >> { >> - struct map_range_data mr_data = { .d = d, .p2mt = p2mt }; >> res = dt_device_get_address(dev, i, &addr, &size); >> if ( res ) >> { >> @@ -1898,7 +1902,7 @@ static int __init handle_device(struct domain *d, >> struct dt_device_node *dev, >> return res; >> } >> - res = map_device_children(d, dev, p2mt); >> + res = map_device_children(dev, &mr_data); >> if ( res ) >> return res; >> @@ -3056,7 +3060,14 @@ static int __init construct_dom0(struct domain *d) >> return rc; >> if ( acpi_disabled ) >> + { >> rc = prepare_dtb_hwdom(d, &kinfo); >> + if ( rc < 0 ) >> + return rc; >> +#ifdef CONFIG_HAS_PCI >> + rc = pci_host_bridge_mappings(d, p2m_mmio_direct_c); > > It is not clear to me why you are passing p2m_mmio_direct_c and not p2mt here? There is no p2mt in the caller function, e.g. construct_dom0 > If you really want to force a type, then I think it should be p2m_mmio_direct. I just followed the defaults found in: static int __init prepare_dtb_hwdom(struct domain *d, struct kernel_info *kinfo) { const p2m_type_t default_p2mt = p2m_mmio_direct_c; which is also called from construct_dom0 > > But then why is it a parameter of pci_host_bridge_mappings? Do you expect > someone else to modify it? No, I don't expect to modify that, I just don't want PCI code to make decisions on that. So, I feel more comfortable if that decision is taken in construct_dom0. So, what do we want here? Pass as parameter (then p2m_mmio_direct I guess, not p2m_mmio_direct_c)? Or let PCI code use p2m_mmio_direct inside pci_host_bridge_mappings? > >> +#endif >> + } >> else >> rc = prepare_acpi(d, &kinfo); >> diff --git a/xen/arch/arm/pci/ecam.c b/xen/arch/arm/pci/ecam.c >> index 602d00799c8d..b81620074a91 100644 >> --- a/xen/arch/arm/pci/ecam.c >> +++ b/xen/arch/arm/pci/ecam.c >> @@ -40,6 +40,19 @@ void __iomem *pci_ecam_map_bus(struct pci_host_bridge >> *bridge, >> return base + (PCI_DEVFN2(sbdf.bdf) << devfn_shift) + where; >> } >> +bool pci_ecam_need_p2m_hwdom_mapping(struct domain *d, > > AFAICT, this is only used by boot code. So you want to do add __init. Makes sense > > This is also exported but not used. I would prefer if we only exposed when > the first external user will be introduced. ZynqMP is the first user yet in this patch. More to come probably later on when we add other host bridges. > >> + struct pci_host_bridge *bridge, >> + uint64_t addr) >> +{ >> + struct pci_config_window *cfg = bridge->cfg; >> + >> + /* >> + * We do not want ECAM address space to be mapped in Domain-0's p2m, >> + * so we can trap access to it. >> + */ >> + return cfg->phys_addr != addr; >> +} >> + >> /* ECAM ops */ >> const struct pci_ecam_ops pci_generic_ecam_ops = { >> .bus_shift = 20, >> @@ -47,6 +60,7 @@ const struct pci_ecam_ops pci_generic_ecam_ops = { >> .map_bus = pci_ecam_map_bus, >> .read = pci_generic_config_read, >> .write = pci_generic_config_write, >> + .need_p2m_hwdom_mapping = pci_ecam_need_p2m_hwdom_mapping, >> } >> }; >> diff --git a/xen/arch/arm/pci/pci-host-common.c >> b/xen/arch/arm/pci/pci-host-common.c >> index 1eb4daa87365..085f08e23e0c 100644 >> --- a/xen/arch/arm/pci/pci-host-common.c >> +++ b/xen/arch/arm/pci/pci-host-common.c >> @@ -18,6 +18,7 @@ >> #include <xen/init.h> >> #include <xen/pci.h> >> +#include <asm/setup.h> > > For new code, we usually include xen/*.h first and then asm/*.h. They are > then order alphabetically within themselves. Ok > >> #include <xen/rwlock.h> >> #include <xen/sched.h> >> #include <xen/vmap.h> >> @@ -320,6 +321,54 @@ int pci_host_get_num_bridges(void) >> return count; >> } >> +int __init pci_host_bridge_mappings(struct domain *d, p2m_type_t p2mt) >> +{ >> + struct pci_host_bridge *bridge; >> + struct map_range_data mr_data = { >> + .d = d, >> + .p2mt = p2mt, >> + .skip_mapping = false >> + }; >> + >> + /* >> + * For each PCI host bridge we need to only map those ranges >> + * which are used by Domain-0 to properly initialize the bridge, >> + * e.g. we do not want to map ECAM configuration space which lives in >> + * "reg" or "assigned-addresses" device tree property, but we want to > > AFAIU, "assigned-addresses" is only relevant for the child of nodes with > type="pci". This is not the case for the hostbridges. So I think you want to > drop the mention of "assigned-addresses". Will drop > > >> + * map other regions of the host bridge. The PCI aperture defined by >> + * the "ranges" device tree property should also be skipped. >> + */ >> + list_for_each_entry( bridge, &pci_host_bridges, node ) >> + { >> + const struct dt_device_node *dev = bridge->dt_node; >> + unsigned int i; >> + >> + for ( i = 0; i < dt_number_of_address(dev); i++ ) >> + { >> + uint64_t addr, size; >> + int err; >> + >> + err = dt_device_get_address(dev, i, &addr, &size); >> + if ( err ) >> + { >> + printk(XENLOG_ERR >> + "Unable to retrieve address range index=%u for %s\n", >> + i, dt_node_full_name(dev)); >> + return err; >> + } >> + >> + if ( bridge->ops->need_p2m_hwdom_mapping(d, bridge, addr) ) >> + { >> + err = map_range_to_domain(dev, addr, size, &mr_data); >> + if ( err ) >> + return err; >> + } >> + } >> + } >> + >> + return 0; >> +} >> + >> /* >> * Local variables: >> * mode: C >> diff --git a/xen/arch/arm/pci/pci-host-zynqmp.c >> b/xen/arch/arm/pci/pci-host-zynqmp.c >> index 61a9807d3d58..6ad2b31e810d 100644 >> --- a/xen/arch/arm/pci/pci-host-zynqmp.c >> +++ b/xen/arch/arm/pci/pci-host-zynqmp.c >> @@ -34,6 +34,7 @@ const struct pci_ecam_ops nwl_pcie_ops = { >> .map_bus = pci_ecam_map_bus, >> .read = pci_generic_config_read, >> .write = pci_generic_config_write, >> + .need_p2m_hwdom_mapping = pci_ecam_need_p2m_hwdom_mapping, >> } >> }; >> diff --git a/xen/include/asm-arm/pci.h b/xen/include/asm-arm/pci.h >> index a71b0eea8cb9..b5b85ccd0937 100644 >> --- a/xen/include/asm-arm/pci.h >> +++ b/xen/include/asm-arm/pci.h >> @@ -17,6 +17,8 @@ >> #ifdef CONFIG_HAS_PCI >> +#include <asm/p2m.h> >> + >> #define pci_to_dev(pcidev) (&(pcidev)->arch.dev) >> extern bool pci_passthrough_enabled; >> @@ -73,6 +75,9 @@ struct pci_ops { >> uint32_t reg, uint32_t len, uint32_t *value); >> int (*write)(struct pci_host_bridge *bridge, pci_sbdf_t sbdf, >> uint32_t reg, uint32_t len, uint32_t value); >> + bool (*need_p2m_hwdom_mapping)(struct domain *d, >> + struct pci_host_bridge *bridge, >> + uint64_t addr); >> }; >> /* >> @@ -96,6 +101,9 @@ int pci_generic_config_write(struct pci_host_bridge >> *bridge, pci_sbdf_t sbdf, >> uint32_t reg, uint32_t len, uint32_t value); >> void __iomem *pci_ecam_map_bus(struct pci_host_bridge *bridge, >> pci_sbdf_t sbdf, uint32_t where); >> +bool pci_ecam_need_p2m_hwdom_mapping(struct domain *d, >> + struct pci_host_bridge *bridge, >> + uint64_t addr); >> struct pci_host_bridge *pci_find_host_bridge(uint16_t segment, uint8_t >> bus); >> int pci_get_host_bridge_segment(const struct dt_device_node *node, >> uint16_t *segment); >> @@ -113,6 +121,8 @@ int pci_host_iterate_bridges(struct domain *d, >> struct pci_host_bridge *bridge)); >> int pci_host_get_num_bridges(void); >> +int pci_host_bridge_mappings(struct domain *d, p2m_type_t p2mt); >> + >> #else /*!CONFIG_HAS_PCI*/ >> struct arch_pci_dev { }; >> diff --git a/xen/include/asm-arm/setup.h b/xen/include/asm-arm/setup.h >> index 95da0b7ab9cd..88d9673db817 100644 >> --- a/xen/include/asm-arm/setup.h >> +++ b/xen/include/asm-arm/setup.h >> @@ -2,6 +2,8 @@ >> #define __ARM_SETUP_H_ >> #include <public/version.h> >> +#include <asm/p2m.h> >> +#include <xen/device_tree.h> >> #define MIN_FDT_ALIGN 8 >> #define MAX_FDT_SIZE SZ_2M >> @@ -77,6 +79,14 @@ struct bootinfo { >> #endif >> }; >> +struct map_range_data >> +{ >> + struct domain *d; >> + p2m_type_t p2mt; >> + /* Set if mapping of the memory ranges must be skipped. */ >> + bool skip_mapping; >> +}; >> + >> extern struct bootinfo bootinfo; >> extern domid_t max_init_domid; >> @@ -124,6 +134,9 @@ void device_tree_get_reg(const __be32 **cell, u32 >> address_cells, >> u32 device_tree_get_u32(const void *fdt, int node, >> const char *prop_name, u32 dflt); >> +int map_range_to_domain(const struct dt_device_node *dev, >> + u64 addr, u64 len, void *data); >> + >> #endif >> /* >> * Local variables: >> >
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |