Re: ARM/PCI passthrough: libxl_pci, sysfs and pciback questions
Hello, Roger!

On 10/27/20 2:55 PM, Roger Pau Monné wrote:
> On Tue, Oct 27, 2020 at 09:59:05AM +0000, Oleksandr Andrushchenko wrote:
>> Hello, all!
>>
>> While working on PCI passthrough on ARM (partial RFC was published by ARM
>> earlier this year) I tried to implement some related changes in the
>> toolstack. One of the obstacles for ARM is the presence of PCI backend
>> related code: ARM is going to fully emulate an ECAM host bridge in Xen, so
>> no PCI backend/frontend pair is going to be used.
>>
>> If my understanding is correct, the functionality which is implemented by
>> pciback and the toolstack and which is relevant/needed for ARM is:
>>
>> 1. pciback is used as a database for assignable PCI devices, e.g. xl
>>    pci-assignable-{add|remove|list} manipulates that list. So, whenever the
>>    toolstack needs to know which PCI devices can be passed through, it reads
>>    that from the relevant sysfs entries of pciback.
>>
>> 2. pciback is used to hold the unbound PCI devices, e.g. when passing
>>    through a PCI device it needs to be unbound from the relevant device
>>    driver and bound to pciback (strictly speaking it is not required that
>>    the device is bound to pciback, but pciback is again used as a database
>>    of the passed through PCI devices, so we can re-bind the devices back to
>>    their original drivers when the guest domain shuts down)
>>
>> 3. toolstack depends on Domain-0 for discovering PCI device resources which
>>    are then permitted for the guest domain, e.g. MMIO ranges and IRQs are
>>    read from sysfs
>>
>> 4. toolstack is responsible for resetting PCI devices being passed through
>>    via sysfs/reset of Domain-0’s PCI bus subsystem
>>
>> 5. toolstack is responsible for making sure the devices are passed with all
>>    relevant functions, e.g. for multifunction devices all the functions are
>>    passed to a domain and no partial passthrough is done
>>
>> 6. toolstack cares about SR-IOV devices (am I correct here?)
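For context, items 1 and 4 above boil down to plain Linux sysfs accesses from Domain-0, which is why they do not carry over to ARM setups without pciback or with devices owned by DomD. The sketch below is illustrative only and is not libxl's code. It assumes that the pciback driver directory is /sys/bus/pci/drivers/pciback, that bound devices appear there as entries named by their SSSS:BB:DD.F address next to attribute files such as bind or new_slot, and that writing 1 to /sys/bus/pci/devices/<BDF>/reset resets a device. The names pci_bdf, parse_bdf, list_bound_bdfs, reset_path and reset_device are hypothetical.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Assumed Linux sysfs locations; libxl's actual handling may differ. */
#define PCIBACK_DIR     "/sys/bus/pci/drivers/pciback"
#define PCI_DEVICES_DIR "/sys/bus/pci/devices"

/* Hypothetical PCI address type: segment, bus, device, function. */
typedef struct {
    unsigned int seg, bus, dev, func;
} pci_bdf;

/* Parse "SSSS:BB:DD.F". Returns 1 on success, 0 if the name is not a PCI
 * address (e.g. driver attribute files such as "bind" or "new_slot"). */
int parse_bdf(const char *name, pci_bdf *out)
{
    int consumed = 0;
    if (sscanf(name, "%x:%x:%x.%x%n", &out->seg, &out->bus, &out->dev,
               &out->func, &consumed) != 4)
        return 0;
    /* Reject trailing characters and out-of-range device/function numbers. */
    return name[consumed] == '\0' && out->dev < 32 && out->func < 8;
}

/* Item 1: count devices bound to a driver directory such as PCIBACK_DIR,
 * storing up to `max` of them in `out`. Returns -1 if `dir` can't be read. */
int list_bound_bdfs(const char *dir, pci_bdf *out, int max)
{
    DIR *d = opendir(dir);
    if (!d)
        return -1;
    int n = 0;
    struct dirent *de;
    pci_bdf bdf;
    while ((de = readdir(d)) != NULL) {
        if (parse_bdf(de->d_name, &bdf)) {
            if (n < max)
                out[n] = bdf;
            n++;
        }
    }
    closedir(d);
    return n;
}

/* Item 4: build the per-device reset attribute path into `buf`. */
int reset_path(const pci_bdf *bdf, char *buf, size_t len)
{
    int r = snprintf(buf, len, PCI_DEVICES_DIR "/%04x:%02x:%02x.%x/reset",
                     bdf->seg, bdf->bus, bdf->dev, bdf->func);
    return (r >= 0 && (size_t)r < len) ? 0 : -1;
}

/* Item 4: request a reset by writing "1" to the reset attribute. */
int reset_device(const pci_bdf *bdf)
{
    char path[128];
    if (reset_path(bdf, path, sizeof(path)) < 0)
        return -1;
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs("1", f) >= 0;
    /* sysfs write errors may only surface when the stream is flushed. */
    return (fclose(f) == 0 && ok) ? 0 : -1;
}
```

Every function here reads or writes Domain-0's own sysfs, so none of it works when the devices live in DomD or when there is no Domain-0 at all (Dom0less).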
>
> I'm not sure I fully understand what this means. Toolstack cares about
> SR-IOV as it cares about other PCI devices, but the SR-IOV
> functionality is managed by the (dom0) kernel.

Yes, you are right. Please ignore #6

>
>>
>> I have implemented a really dirty POC for that which I would need to clean
>> up before showing, but before that I would like to get some feedback and
>> advice on how to proceed with the above. I suggest we:
>>
>> 1. Move all pciback related code (which seems to become x86 code only) into
>>    a dedicated file, something like tools/libxl/libxl_pci_x86.c
>>
>> 2. Make the functionality now provided by pciback architecture dependent,
>>    so tools/libxl/libxl_pci.c delegates actual assignable device list
>>    handling to that arch code and uses some sort of “ops”, e.g.
>>    arch->ops.get_all_assignable, arch->ops.add_assignable etc. (This can
>>    also be done with “#ifdef CONFIG_PCIBACK”, but that does not seem
>>    neat.) Introduce tools/libxl/libxl_pci_arm.c to provide the ARM
>>    implementation.
> To be fair this is arch and OS dependent, since it's currently based
> on sysfs which is Linux specific. So it should really be
> libxl_pci_linux_x86.c or similar.

This is true, but do we really have any other implementation yet?

>
>> 3. ARM only: As we do not have pciback on ARM we need to have some storage
>>    for the assignable device list: move that into Xen by extending struct
>>    pci_dev with “bool assigned” and providing sysctls for manipulating it,
>>    e.g. XEN_SYSCTL_pci_device_{set|get}_assigned,
>>    XEN_SYSCTL_pci_device_enum_assigned (to enumerate/get the list of
>>    assigned/not-assigned PCI devices). Can this also be interesting for
>>    x86? At the moment it seems that x86 does rely on pciback presence, so
>>    probably this change might not be interesting for the x86 world, but it
>>    may allow stripping pciback functionality a bit and making the code
>>    common to both ARM and x86.
> How are you going to perform the device reset then? Will you assign
> the device to dom0 after removing it from the guest so that dom0 can
> perform the reset? You will need to use logic currently present in
> pciback to do so IIRC.
>
> It doesn't seem like a bad approach, but there are more consequences
> than just how assignable devices are listed.
>
> Also Xen doesn't currently know about IOMMU groups, so Xen would have
> to gain this knowledge in order to know the minimal set of PCI devices
> that can be assigned to a guest.

Good point, I'll check the relevant reset code. Thanks

>
>> 4. ARM only: It is not clear how to handle re-binding of the PCI driver on
>>    guest shutdown: we need to store the sysfs path of the original driver
>>    the device was bound to. Do we also want to store that in struct
>>    pci_dev?
> I'm not sure I follow you here. On shutdown the device would be
> handed back to Xen?

Currently it is bound back to the driver which we seized the device from (if any). So, probably the same logic should remain?

>
> Most certainly we don't want to store a sysfs (Linux private
> information) inside of a Xen specific struct (pci_dev).

Yep, this is something I don't like as well

>
>> 5. An alternative route for 3-4 could be to store that data in XenStore,
>>    e.g. MMIOs, IRQ, bind sysfs path etc. This would require more code on
>>    the Xen side to access XenStore and won’t work if MMIOs/IRQs are passed
>>    via device tree/ACPI tables by the bootloaders.
> As above, I think I need more context to understand what and why you
> need to save such information.

Well, in the absence of pciback we lose a "database" which holds all the knowledge about which devices are assigned, bound etc. So, XenStore *could* be used as such a database for us. But this does not look elegant.

>
>> Another big question is with respect to Domain-0 and PCI bus sysfs use. The
>> existing code for querying PCI device resources/IRQs and resetting those
>> via sysfs of Domain-0 is more than OK if Domain-0 is present and owns PCI
>> HW. But there are at least two cases when this is not going to work on
>> ARM: Dom0less setups and when there is a hardware domain owning PCI
>> devices.
>>
>> In our case we have a dedicated guest which is a sort of hardware domain
>> (driver domain DomD) which owns all the hardware of the platform, so we are
>> interested in implementing something that fits our design as well: with
>> DomD/hardware domain it is not possible to access the relevant PCI bus
>> sysfs entries from Domain-0, as those live in DomD/hwdom. This is also true
>> for Dom0less setups, as there is no entity that can provide the same.
> You need some kind of channel to transfer this information from the
> hardware domain to the toolstack domain. Some kind of protocol over
> libvchan might be an option.

Yes, this way it will all be handled without workarounds

>
>> For that reason in my POC I have introduced the following: extended struct
>> pci_dev to hold an array of the PCI device’s MMIO ranges and IRQ:
>>
>> 1. Provide internal API for accessing the array of MMIO ranges and IRQ.
>>    This can be used in both Dom0less and Domain-0 setups to manipulate the
>>    relevant data. The actual data can be read from a device tree/ACPI
>>    tables if enumeration is done by bootloaders.
> I would be against storing this data inside of Xen if Xen doesn't have
> to make any use of it. Does Xen need to know the MMIO ranges and IRQs
> to perform its task?
>
> If not, then there's no reason to store those in Xen. The hypervisor
> is not the right place to implement a database like mechanism for PCI
> devices.

We have discussed all the above with Roger on IRC (thank you Roger), so I'll prepare an RFC for ARM PCI passthrough configuration and send it ASAP.

>
> Roger.

Thank you,
Oleksandr
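Proposal 2 in the quoted mail (tools/libxl/libxl_pci.c delegating the assignable device list to per-architecture “ops”) can be sketched in plain C as follows. Everything here is hypothetical: pci_bdf, pci_arch_ops and the pci_assignable_* wrappers are illustrative names, only get_all_assignable/add_assignable come from the mail, and the in-memory backend merely stands in for whatever an x86 (pciback sysfs) or ARM (for example the proposed Xen sysctls) implementation would actually do.

```c
/* Hypothetical PCI address type: segment, bus, device, function. */
typedef struct {
    unsigned int seg, bus, dev, func;
} pci_bdf;

/* Hypothetical per-architecture backend; not an existing libxl API. */
typedef struct {
    const char *name;
    int (*add_assignable)(const pci_bdf *bdf);
    int (*remove_assignable)(const pci_bdf *bdf);
    int (*get_all_assignable)(pci_bdf *out, int max);
} pci_arch_ops;

/* Stand-in backend that keeps the assignable list in memory. */
#define MAX_ASSIGNABLE 64
static pci_bdf assignable[MAX_ASSIGNABLE];
static int n_assignable;

static int bdf_equal(const pci_bdf *a, const pci_bdf *b)
{
    return a->seg == b->seg && a->bus == b->bus &&
           a->dev == b->dev && a->func == b->func;
}

static int mem_add(const pci_bdf *bdf)
{
    for (int i = 0; i < n_assignable; i++)
        if (bdf_equal(&assignable[i], bdf))
            return -1;                      /* already assignable */
    if (n_assignable == MAX_ASSIGNABLE)
        return -1;                          /* table full */
    assignable[n_assignable++] = *bdf;
    return 0;
}

static int mem_remove(const pci_bdf *bdf)
{
    for (int i = 0; i < n_assignable; i++) {
        if (bdf_equal(&assignable[i], bdf)) {
            /* Move the last entry into the hole; order is not preserved. */
            assignable[i] = assignable[--n_assignable];
            return 0;
        }
    }
    return -1;                              /* not in the list */
}

static int mem_get_all(pci_bdf *out, int max)
{
    int n = n_assignable < max ? n_assignable : max;
    for (int i = 0; i < n; i++)
        out[i] = assignable[i];
    return n;
}

static const pci_arch_ops mem_ops = {
    .name = "in-memory",
    .add_assignable = mem_add,
    .remove_assignable = mem_remove,
    .get_all_assignable = mem_get_all,
};

/* Generic code only sees this table, never sysfs or sysctls. */
static const pci_arch_ops *arch_ops = &mem_ops;

int pci_assignable_add(const pci_bdf *bdf)
{
    return arch_ops->add_assignable(bdf);
}

int pci_assignable_remove(const pci_bdf *bdf)
{
    return arch_ops->remove_assignable(bdf);
}

int pci_assignable_list(pci_bdf *out, int max)
{
    return arch_ops->get_all_assignable(out, max);
}
```

Selecting arch_ops per architecture at build time is one way to avoid scattering “#ifdef CONFIG_PCIBACK” through the generic code, which is the alternative the mail mentions.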