ARM/PCI passthrough: libxl_pci, sysfs and pciback questions
Hello, all!

While working on PCI passthrough on ARM (a partial RFC was published by ARM earlier this year) I tried to implement some related changes in the toolstack. One of the obstacles for ARM is the presence of PCI backend related code: ARM is going to fully emulate an ECAM host bridge in Xen, so no PCI backend/frontend pair is going to be used.

If my understanding is correct, the functionality which is implemented by pciback and the toolstack and which is relevant/needed for ARM is:

1. pciback is used as a database of assignable PCI devices, e.g. xl pci-assignable-{add|remove|list} manipulates that list. So, whenever the toolstack needs to know which PCI devices can be passed through, it reads that from the relevant sysfs entries of pciback.

2. pciback is used to hold the unbound PCI devices, e.g. when passing through a PCI device it needs to be unbound from the relevant device driver and bound to pciback (strictly speaking it is not required that the device is bound to pciback, but pciback is again used as a database of the passed through PCI devices, so we can re-bind the devices back to their original drivers when the guest domain shuts down).

3. The toolstack depends on Domain-0 for discovering the PCI device resources which are then permitted for the guest domain, e.g. MMIO ranges and IRQs are read from sysfs.

4. The toolstack is responsible for resetting PCI devices being passed through, via sysfs/reset of the Domain-0’s PCI bus subsystem.

5. The toolstack is responsible for making sure that devices are passed with all relevant functions, e.g. for multifunction devices all the functions are passed to a domain and no partial passthrough is done.

6. The toolstack cares about SR-IOV devices (am I correct here?).

I have implemented a really dirty POC for that which I would need to clean up before showing, but before that I would like to get some feedback and advice on how to proceed with the above. I suggest we:

1. Move all pciback related code (which seems to become x86-only code) into a dedicated file, something like tools/libxl/libxl_pci_x86.c.

2. Make the functionality now provided by pciback architecture dependent, so tools/libxl/libxl_pci.c delegates the actual assignable device list handling to that arch code and uses some sort of “ops”, e.g. arch->ops.get_all_assignable, arch->ops.add_assignable etc. (This can also be done with “#ifdef CONFIG_PCIBACK”, but that does not seem very elegant.) Introduce tools/libxl/libxl_pci_arm.c to provide the ARM implementation.

3. ARM only: As we do not have pciback on ARM we need to have some storage for the assignable device list: move that into Xen by extending struct pci_dev with “bool assigned” and providing sysctls for manipulating that, e.g. XEN_SYSCTL_pci_device_{set|get}_assigned, XEN_SYSCTL_pci_device_enum_assigned (to enumerate/get the list of assigned/not-assigned PCI devices). Can this also be interesting for x86? At the moment it seems that x86 does rely on pciback being present, so this change might not be interesting for the x86 world, but it may allow stripping pciback functionality a bit and making the code common to both ARM and x86.

4. ARM only: It is not clear how to handle re-binding of the PCI driver on guest shutdown: we need to store the sysfs path of the original driver the device was bound to. Do we also want to store that in struct pci_dev?

5. An alternative route for 3-4 could be to store that data in XenStore, e.g. MMIOs, IRQ, bind sysfs path etc. This would require more code on the Xen side to access XenStore and won’t work if MMIOs/IRQs are passed via device tree/ACPI tables by the bootloaders.

Another big question is with respect to Domain-0 and PCI bus sysfs use. The existing code for querying PCI device resources/IRQs and resetting those via sysfs of Domain-0 is more than OK if Domain-0 is present and owns PCI HW.
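
To make the sysfs dependency concrete, here is a minimal sketch of parsing one line of a device's sysfs "resource" file (each line is "start end flags" in hex). This is standalone C and not the actual libxl code: struct pci_res and parse_resource_line() are hypothetical names used only for illustration; the two flag values are the ones from Linux's include/linux/ioport.h.

```c
#include <stdio.h>

/* Flag values as defined in Linux include/linux/ioport.h. */
#define IORESOURCE_IO  0x00000100ULL
#define IORESOURCE_MEM 0x00000200ULL

/* Hypothetical container for one BAR; not a libxl or Xen type. */
struct pci_res {
    unsigned long long start;
    unsigned long long size;  /* 0 means the slot is unused */
    int is_io;                /* I/O port range */
    int is_mem;               /* MMIO range */
};

/*
 * Parse one "start end flags" line as found in
 * /sys/bus/pci/devices/<BDF>/resource.
 * Returns 0 on success, -1 on malformed input.
 */
static int parse_resource_line(const char *line, struct pci_res *out)
{
    unsigned long long start, end, flags;

    /* %llx accepts an optional "0x" prefix. */
    if (sscanf(line, "%llx %llx %llx", &start, &end, &flags) != 3)
        return -1;
    if (end < start)
        return -1;

    out->start = start;
    /* An all-zero line marks an unimplemented BAR. */
    out->size = (start == 0 && end == 0) ? 0 : end - start + 1;
    out->is_io = (flags & IORESOURCE_IO) != 0;
    out->is_mem = (flags & IORESOURCE_MEM) != 0;
    return 0;
}
```

A caller would run this over every line of the file, which is exactly the step that only works in the domain that owns the PCI bus and can see those sysfs entries.
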
But there are at least two cases when this is not going to work on ARM: Dom0less setups and when there is a hardware domain owning the PCI devices. In our case we have a dedicated guest which is a sort of hardware domain (driver domain DomD) which owns all the hardware of the platform, so we are interested in implementing something that fits our design as well: with a DomD/hardware domain it is not possible to access the relevant PCI bus sysfs entries from Domain-0, as those live in DomD/hwdom. This is also true for Dom0less setups, as there is no entity that can provide the same.

For that reason, in my POC I have extended struct pci_dev to hold an array of the PCI device’s MMIO ranges and its IRQ, together with the following:

1. Provide an internal API for accessing the array of MMIO ranges and the IRQ. This can be used in both Dom0less and Domain-0 setups to manipulate the relevant data. The actual data can be read from a device tree/ACPI tables if enumeration is done by the bootloaders.

2. For Domain-0/DomD setups add PHYSDEVOP_pci_device_set_resources, so Domain-0 can set the relevant resources in Xen while enumerating PCI devices. This requires a change to the Linux kernel driver to work (I can provide more details if needed).

3. For resetting devices, we may want to implement that functionality on the Xen side as well by introducing PHYSDEVOP_pci_device_reset.

I can probably implement an RFC series with all the above if we agree on the approach. Comments are more than welcome.

Thank you,
Oleksandr
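
For illustration only, the arch “ops” split suggested above could look roughly like the sketch below. It is standalone C, not libxl code: struct pci_bdf, struct pci_arch_ops, the arm_* functions and the pci_assignable_* wrappers are all hypothetical names, and the in-memory array merely stands in for whatever storage the ARM side ends up using (e.g. the proposed sysctls).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical device address; libxl has its own device type. */
struct pci_bdf { unsigned domain, bus, dev, func; };

/* Per-architecture hooks for the assignable device list. */
struct pci_arch_ops {
    int (*add_assignable)(const struct pci_bdf *d);
    int (*remove_assignable)(const struct pci_bdf *d);
    /* Copies up to max entries into out, returns the number copied. */
    int (*get_all_assignable)(struct pci_bdf *out, size_t max);
};

/* --- what an ARM backend could look like (toy storage) --- */
#define MAX_ASSIGNABLE 16
static struct pci_bdf arm_list[MAX_ASSIGNABLE];
static size_t arm_count;

static int bdf_equal(const struct pci_bdf *a, const struct pci_bdf *b)
{
    return a->domain == b->domain && a->bus == b->bus &&
           a->dev == b->dev && a->func == b->func;
}

static int arm_add_assignable(const struct pci_bdf *d)
{
    for (size_t i = 0; i < arm_count; i++)
        if (bdf_equal(&arm_list[i], d))
            return 0;                 /* already listed */
    if (arm_count == MAX_ASSIGNABLE)
        return -1;
    arm_list[arm_count++] = *d;
    return 0;
}

static int arm_remove_assignable(const struct pci_bdf *d)
{
    for (size_t i = 0; i < arm_count; i++)
        if (bdf_equal(&arm_list[i], d)) {
            arm_list[i] = arm_list[--arm_count];  /* swap-remove */
            return 0;
        }
    return -1;                        /* not in the list */
}

static int arm_get_all_assignable(struct pci_bdf *out, size_t max)
{
    size_t n = arm_count < max ? arm_count : max;
    memcpy(out, arm_list, n * sizeof(*out));
    return (int)n;
}

static const struct pci_arch_ops arm_pci_ops = {
    .add_assignable = arm_add_assignable,
    .remove_assignable = arm_remove_assignable,
    .get_all_assignable = arm_get_all_assignable,
};

/* --- common code only ever goes through the ops table --- */
static const struct pci_arch_ops *arch_ops = &arm_pci_ops;

static int pci_assignable_add(const struct pci_bdf *d)
{
    return arch_ops->add_assignable(d);
}

static int pci_assignable_remove(const struct pci_bdf *d)
{
    return arch_ops->remove_assignable(d);
}

static int pci_assignable_list(struct pci_bdf *out, size_t max)
{
    return arch_ops->get_all_assignable(out, max);
}
```

An x86 backend would fill the same table with functions that talk to pciback via sysfs, so the common code stays free of #ifdefs.
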