Re: [Xen-devel] [early RFC] ARM PCI Passthrough design document
Hi Roger,

On 25/01/17 11:42, Roger Pau Monné wrote:
> On Tue, Jan 24, 2017 at 05:17:06PM +0000, Julien Grall wrote:
>> On 06/01/17 15:12, Roger Pau Monné wrote:
>>> On Thu, Dec 29, 2016 at 02:04:15PM +0000, Julien Grall wrote:
>>>> * Add a device
>>>> * Remove a device
>>>> * Assign a device to a guest
>>>> * Deassign a device from a guest
>>>>
>>>> XXX: Detail the interaction when assigning/deassigning device
>>>
>>> Assigning a device will probably entangle setting up some direct
>>> MMIO mappings (BARs and ROMs) plus a bunch of traps in order to
>>> perform emulation of accesses to the PCI config space (or those can
>>> be set up when a new bridge is registered with Xen).
>>
>> I am planning to detail the root complex emulation in a separate
>> section; I sent the design document before writing it. In brief, I
>> would expect the registration of a new bridge to set up the traps to
>> emulate accesses to the PCI configuration space.
>>
>> On ARM, the first approach will rely on the OS to set up the BARs
>> and ROMs, so they will be mapped by the PCI configuration space
>> emulation. The reason for relying on the OS to set up the BARs/ROMs
>> is to reduce the work needed for a first version. Otherwise we would
>> have to add code in the toolstack to decide where to place the
>> BARs/ROMs. I don't think it is a lot of work, but it is not that
>> important because it does not require a stable ABI (this is an
>> interaction between the hypervisor and the toolstack).
>>
>> Furthermore, Linux (at least on ARM) assigns the BARs at setup time.
>> From my understanding, this is the expected behavior with both DT
>> (the DT has a property to skip the scan) and ACPI.
>
> This approach might work for Dom0, but for DomU you certainly need to
> know where the MMIO regions of a device are, and either the toolstack
> or Xen needs to set this up in advance (or at least mark which MMIO
> regions are available to the DomU). Allowing a DomU to map random
> MMIO regions is certainly a security issue.

I agree here. I provided more feedback in an answer to Stefano; I
would like your input there too, if possible. See
<8ca91073-09e7-57ca-9063-b47e0aced39d@xxxxxxxxxx>

[...]

>>>> Based on what Linux is currently doing, there are two kinds of
>>>> quirks:
>>>>     * Accesses to the configuration space of certain sizes are not
>>>>       allowed
>>>>     * A specific driver is necessary for driving the host bridge
>>>
>>> Hm, so what are the issues that make these bridges need specific
>>> drivers? This might be quite problematic if you also have to
>>> emulate this broken behavior inside of Xen (because Dom0 is using a
>>> specific driver).
>>
>> I am not expecting to emulate the configuration space accesses for
>> DOM0. I know you mentioned that it would be necessary to hide PCI
>> devices used by Xen (such as the UART) from DOM0, or to configure
>> MSI. But on ARM, the UART is integrated in the SoC and MSI will be
>> configured through the interrupt controller.
>
> Right, we certainly need to do it for x86, but I don't know that much
> of the ARM architecture in order to know if that's needed or not. I'm
> also wondering if having both Xen and the Dom0 directly accessing the
> ECAM area is fine, even if they use different cache mapping
> attributes?

I don't know much about x86, but on ARM we can specify caching
attributes in the stage-2 page tables (aka EPT on x86). The MMU will
use the stricter memory attributes between the stage-2 and the guest
page tables. In the case of ECAM, we could disable caching in the
stage-2 page tables, so the ECAM will always be accessed uncached.
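For background, an ECAM access is plain offset arithmetic, which is
why the attributes of the mapping are the only real concern here. A
minimal sketch, assuming the standard ECAM layout from the PCIe spec;
the helper name is mine:

    #include <stdint.h>

    /*
     * Sketch of an ECAM configuration space access: 1MB per bus,
     * 32KB per device, 4KB per function. ecam_base would be the
     * cfg_base advertised by DOM0; on ARM, Xen would map it in
     * stage-2 with Device (uncached) memory attributes.
     */
    static inline volatile uint32_t *ecam_reg(uintptr_t ecam_base,
                                              uint8_t bus, uint8_t dev,
                                              uint8_t fn, uint16_t reg)
    {
        return (volatile uint32_t *)(ecam_base +
                                     ((uintptr_t)bus << 20) +
                                     ((uintptr_t)dev << 15) +
                                     ((uintptr_t)fn  << 12) +
                                     (reg & 0xffcU));
    }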
>>>> So Xen needs to rely on DOM0 to discover the host bridges and
>>>> notify Xen with all the relevant information. This will be done
>>>> via a new hypercall PHYSDEVOP_pci_host_bridge_add. The layout of
>>>> the structure will be:
>>>>
>>>> struct physdev_pci_host_bridge_add
>>>> {
>>>>     /* IN */
>>>>     uint16_t seg;
>>>>     /* Range of bus supported by the host bridge */
>>>>     uint8_t  bus_start;
>>>>     uint8_t  bus_nr;
>>>>     uint32_t res0;  /* Padding */
>>>>     /* Information about the configuration space region */
>>>>     uint64_t cfg_base;
>>>>     uint64_t cfg_size;
>>>> };
>>>
>>> Why do you need the cfg_size attribute? Isn't it always going to be
>>> 4096 bytes in size?
>>
>> The cfg_size is here to help us match the corresponding node in the
>> device tree. The cfg_size may differ depending on how the hardware
>> has implemented access to the configuration space.
>
> But certainly cfg_base needs to be aligned to PAGE_SIZE? And
> according to the spec cfg_size cannot be bigger than 4KB (PAGE_SIZE),
> so in any case you will end up mapping a whole 4KB page, because
> that's the minimum granularity of the p2m?

cfg_size would be a multiple of 4KB, as each configuration space would
have a unique region. But as you mentioned later, we could re-use
MMCFG_reserved.

To be fair, I think we can do without this property. For ACPI, the
size will vary with the number of buses handled and can be deduced.
For DT, the base address and bus range should be enough to find the
associated node.
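To illustrate the deduction: with a standard ECAM layout the size
follows from the bus range alone, so for ACPI the MCFG start/end bus
numbers are sufficient. A trivial sketch, function name mine:

    #include <stdint.h>

    /*
     * Each bus needs 1MB of ECAM space: 32 devices * 8 functions
     * * 4KB of configuration space per function.
     */
    static inline uint64_t ecam_size(uint8_t bus_start, uint8_t bus_end)
    {
        return ((uint64_t)(bus_end - bus_start) + 1) << 20;
    }

    /* e.g. a host bridge handling buses 0-255 needs a 256MB window. */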
> If that field is removed you could use the
> PHYSDEVOP_pci_mmcfg_reserved hypercalls.

>>>> DOM0 will issue the hypercall PHYSDEVOP_pci_host_bridge_add for
>>>> each host bridge available on the platform. When Xen receives the
>>>> hypercall, the driver associated with the host bridge will be
>>>> instantiated.
>>>>
>>>> XXX: Shall we limit DOM0's access to the configuration space from
>>>> that moment?
>>>
>>> Most definitely yes, you should instantiate an emulated bridge over
>>> the real one, in order to proxy Dom0 accesses to the PCI
>>> configuration space. You for example don't want Dom0 moving the
>>> position of the BARs of PCI devices without Xen being aware (and
>>> properly changing the second stage translation).

The problem is that on ARM we don't have a single way to access the
configuration space, so we would need different emulators in Xen,
which I don't like unless there is a strong reason to do it. We could
prevent DOM0 from modifying the position of the BARs after setup. I
also remember you mentioned MSI configuration; on ARM this is done via
the interrupt controller.

>>>> ## Discovering and registering PCI
>>>>
>>>> Similarly to x86, PCI devices will be discovered by DOM0 and
>>>> registered using the hypercalls PHYSDEVOP_pci_add_device or
>>>> PHYSDEVOP_manage_pci_add_ext.
>>>
>>> Why do you need this? If you have access to the bridges you can
>>> scan them from Xen and discover the devices AFAICT.
>>
>> I am a bit confused. Are you saying that you plan to ditch them for
>> PVH? If so, why are they called by Linux today?
>
> I think we can get away with PHYSDEVOP_pci_mmcfg_reserved only, but
> maybe I'm missing something. AFAICT Xen should be able to gather all
> the other data by itself from the PCI config space once it knows the
> details about the host bridge.

From my understanding, some host bridges need to be configured before
they can be used (TBC). Bringing this initialization into Xen may be
complex. For instance, the xgene host bridge (see
linux/drivers/pci/host/pci-xgene.c) requires its clock to be enabled.

I would leave the initialization of the host bridge to Linux; if we
are doing the scanning in Xen, we will need a hypercall to let Xen
know that the host bridge has been initialized.

I gave a bit more background in my answer to Stefano, so I would
recommend continuing the conversation there.

>>>> By default all the PCI devices will be assigned to DOM0. So Xen
>>>> would have to configure the SMMU and Interrupt Controller to allow
>>>> DOM0 to use the PCI devices. As mentioned earlier, those subsystems
>>>> will require the StreamID and DeviceID. Both can be deduced from
>>>> the RID.
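As a reminder of what that deduction looks like on DT platforms: the
iommu-map/msi-map properties describe linear remappings from RID
ranges to StreamIDs/DeviceIDs. A rough sketch; struct and function
names are mine:

    #include <stdint.h>

    /* One entry of an "iommu-map"/"msi-map" style translation table. */
    struct rid_map_entry {
        uint16_t rid_base;   /* first RID covered by this entry */
        uint16_t out_base;   /* StreamID/DeviceID mapped to rid_base */
        uint16_t length;     /* number of RIDs covered */
    };

    static int rid_to_streamid(const struct rid_map_entry *map,
                               unsigned int nr, uint16_t rid,
                               uint16_t *streamid)
    {
        for ( unsigned int i = 0; i < nr; i++ )
        {
            if ( rid >= map[i].rid_base &&
                 (uint16_t)(rid - map[i].rid_base) < map[i].length )
            {
                *streamid = map[i].out_base + (rid - map[i].rid_base);
                return 0;
            }
        }
        return -1; /* RID not covered by any entry */
    }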
>>>> XXX: How to hide PCI devices from DOM0?
>>>
>>> By adding the ACPI namespace of the device to the STAO and blocking
>>> Dom0 access to this device in the emulated bridge that Dom0 will
>>> have access to (returning 0xFFFF when Dom0 tries to read the vendor
>>> ID from the PCI header).
>>
>> Sorry, I was not clear here. By hiding, I meant DOM0 not
>> instantiating a driver (similarly to xen-pciback.hide). We still
>> want DOM0 to access the PCI config space in order to reset the
>> device. Unless you plan to import all the reset quirks into Xen?
>
> I don't have a clear opinion here, and I don't know all the details
> of these reset hacks.

Actually, I looked at the Linux code (see __pci_dev_reset in
drivers/pci/pci.c) and there are fewer quirks than I expected. The
list of quirks can be found in pci_dev_reset_methods in
drivers/pci/quirks.c.

There are a few ways to reset a device (see __pci_dev_reset); they all
appear to be based on accesses to the configuration space. So I guess
it should be fine to import that into Xen. Any opinions?
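For reference, the common case is a plain Function Level Reset through
the PCIe capability, which is indeed just config space accesses. A
minimal sketch mirroring what Linux's pcie_flr() does; the register
names match include/uapi/linux/pci_regs.h, while the cfg_* accessors
and delay_ms are placeholders:

    #include <stdint.h>

    #define PCI_EXP_DEVCAP          0x04        /* offset in PCIe capability */
    #define PCI_EXP_DEVCAP_FLR      0x10000000  /* FLR supported */
    #define PCI_EXP_DEVCTL          0x08
    #define PCI_EXP_DEVCTL_BCR_FLR  0x8000      /* initiate FLR */

    /* Placeholder config space accessors and delay. */
    extern uint32_t cfg_read32(uint16_t seg, uint8_t bus, uint8_t devfn,
                               uint16_t reg);
    extern uint16_t cfg_read16(uint16_t seg, uint8_t bus, uint8_t devfn,
                               uint16_t reg);
    extern void cfg_write16(uint16_t seg, uint8_t bus, uint8_t devfn,
                            uint16_t reg, uint16_t val);
    extern void delay_ms(unsigned int ms);

    /* pos is the offset of the PCI Express capability in config space. */
    static int flr_reset(uint16_t seg, uint8_t bus, uint8_t devfn,
                         uint16_t pos)
    {
        uint32_t cap = cfg_read32(seg, bus, devfn, pos + PCI_EXP_DEVCAP);
        uint16_t ctl;

        if ( !(cap & PCI_EXP_DEVCAP_FLR) )
            return -1;  /* device has no FLR support */

        ctl = cfg_read16(seg, bus, devfn, pos + PCI_EXP_DEVCTL);
        cfg_write16(seg, bus, devfn, pos + PCI_EXP_DEVCTL,
                    ctl | PCI_EXP_DEVCTL_BCR_FLR);
        delay_ms(100);  /* wait for the FLR to complete */
        return 0;
    }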
Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel