ARM/PCI passthrough: libxl_pci, sysfs and pciback questions
Hello, all!
While working on PCI passthrough on ARM (partial RFC was published by ARM
earlier this year) I tried to implement some related changes in the toolstack.
One of the obstacles for ARM is the presence of PCI backend related code: ARM
is going to fully emulate an ECAM host bridge in Xen, so no PCI
backend/frontend pair is going to be used.
If my understanding is correct, the functionality which is implemented by
pciback and the toolstack and which is relevant/needed for ARM is:
1. pciback is used as a database for assignable PCI devices, e.g. xl
pci-assignable-{add|remove|list} manipulates that list. So, whenever the
toolstack needs to know which PCI devices can be passed through it reads
that from the relevant sysfs entries of the pciback.
2. pciback is used to hold the unbound PCI devices, e.g. when passing through
a PCI device it needs to be unbound from the relevant device driver and bound
to pciback (strictly speaking it is not required that the device is bound to
pciback, but pciback is again used as a database of the passed through PCI
devices, so we can re-bind the devices back to their original drivers when
guest domain shuts down)
3. toolstack depends on Domain-0 for discovering PCI device resources which
are then permitted for the guest domain, e.g. MMIO ranges and IRQs are read
from sysfs
4. toolstack is responsible for resetting PCI devices being passed through via
sysfs/reset of the Domain-0’s PCI bus subsystem
5. toolstack is responsible for ensuring that devices are passed with all
relevant functions, e.g. for multifunction devices all the functions are
passed to a domain and no partial passthrough is done
6. toolstack cares about SR-IOV devices (am I correct here?)
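To illustrate item 3 above: in Domain-0 the per-device sysfs "resource" file
holds one line per resource, each with start, end and flags as hex values.
Below is a minimal sketch of parsing one such line. The struct and helper
names (pci_resource, parse_resource_line, resource_in_use) are hypothetical
and are not taken from libxl.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for one line of the sysfs "resource" file. */
struct pci_resource {
    unsigned long long start;
    unsigned long long end;
    unsigned long long flags;
};

/*
 * Parse one line of the form "0x<start> 0x<end> 0x<flags>".
 * Returns 0 on success, -1 if the line is malformed.
 */
static int parse_resource_line(const char *line, struct pci_resource *res)
{
    assert(line != NULL && res != NULL);

    if (sscanf(line, "0x%llx 0x%llx 0x%llx",
               &res->start, &res->end, &res->flags) != 3)
        return -1;
    return 0;
}

/* Assumption: an all-zero start/end is treated here as an unused slot. */
static int resource_in_use(const struct pci_resource *res)
{
    return res->start != 0 || res->end != 0;
}
```

On ARM without Domain-0 access to this file the same data would have to come
from somewhere else, which is what the rest of this mail is about.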
I have implemented a really dirty POC for that which I would need to clean up
before showing, but before that I would like to get some feedback and advice on
how to proceed with the above. I suggest we:
1. Move all pciback related code (which seems to become x86 code only) into a
dedicated file, something like tools/libxl/libxl_pci_x86.c
2. Make the functionality now provided by pciback architecture dependent, so
tools/libxl/libxl_pci.c delegates actual assignable device list handling to
that arch code and uses some sort of “ops”, e.g.
arch->ops.get_all_assignable, arch->ops.add_assignable etc. (This can also
be done with “#ifdef CONFIG_PCIBACK”, but that does not seem elegant).
Introduce tools/libxl/libxl_pci_arm.c to provide ARM implementation.
3. ARM only: As we do not have pciback on ARM we need to have some storage for
assignable device list: move that into Xen by extending struct pci_dev with
“bool assigned” and providing sysctls for manipulating that, e.g.
XEN_SYSCTL_pci_device_{set|get}_assigned,
XEN_SYSCTL_pci_device_enum_assigned (to enumerate/get the list of
assigned/not-assigned PCI devices). Can this also be interesting for x86?
At the moment it seems that x86 does rely on pciback presence, so probably
this change might not be interesting for x86 world, but may allow stripping
pciback functionality a bit and making the code common to both ARM and x86.
4. ARM only: It is not clear how to handle re-binding of the PCI driver on
guest shutdown: we need to store the sysfs path of the original driver the
device was bound to. Do we also want to store that in struct pci_dev?
5. An alternative route for 3-4 could be to store that data in XenStore, e.g.
MMIOs, IRQ, bind sysfs path etc. This would require more code on Xen side
to access XenStore and won’t work if MMIOs/IRQs are passed via device
tree/ACPI tables by the bootloaders.
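The “ops” idea from item 2 could look roughly like the sketch below. This is
only an illustration: struct pci_arch_ops, the toy_* in-memory backend and the
pci_assignable_* wrappers are all hypothetical names, and the array-backed
list merely stands in for whatever the x86 (pciback) or ARM (Xen sysctl)
implementation would do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-architecture hooks; names follow the examples above. */
struct pci_arch_ops {
    int (*get_all_assignable)(unsigned int *sbdfs, size_t max, size_t *count);
    int (*add_assignable)(unsigned int sbdf);
    int (*remove_assignable)(unsigned int sbdf);
};

/* Toy in-memory backend standing in for an arch implementation. */
#define TOY_MAX_ASSIGNABLE 8
static unsigned int toy_list[TOY_MAX_ASSIGNABLE];
static size_t toy_count;

static int toy_add(unsigned int sbdf)
{
    size_t i;

    for (i = 0; i < toy_count; i++)
        if (toy_list[i] == sbdf)
            return 0;                   /* already assignable */
    if (toy_count == TOY_MAX_ASSIGNABLE)
        return -1;                      /* no room left */
    toy_list[toy_count++] = sbdf;
    return 0;
}

static int toy_remove(unsigned int sbdf)
{
    size_t i;

    for (i = 0; i < toy_count; i++) {
        if (toy_list[i] == sbdf) {
            toy_list[i] = toy_list[--toy_count];  /* fill the hole */
            return 0;
        }
    }
    return -1;                          /* not in the list */
}

static int toy_get_all(unsigned int *sbdfs, size_t max, size_t *count)
{
    size_t i;

    if (toy_count > max)
        return -1;                      /* caller buffer too small */
    for (i = 0; i < toy_count; i++)
        sbdfs[i] = toy_list[i];
    *count = toy_count;
    return 0;
}

static const struct pci_arch_ops toy_ops = {
    .get_all_assignable = toy_get_all,
    .add_assignable     = toy_add,
    .remove_assignable  = toy_remove,
};

/* Common code only ever sees the ops table, never the backend. */
static const struct pci_arch_ops *arch_ops = &toy_ops;

static int pci_assignable_add(unsigned int sbdf)
{
    assert(arch_ops != NULL && arch_ops->add_assignable != NULL);
    return arch_ops->add_assignable(sbdf);
}

static int pci_assignable_remove(unsigned int sbdf)
{
    assert(arch_ops != NULL && arch_ops->remove_assignable != NULL);
    return arch_ops->remove_assignable(sbdf);
}

static int pci_assignable_list(unsigned int *sbdfs, size_t max, size_t *count)
{
    assert(arch_ops != NULL && arch_ops->get_all_assignable != NULL);
    return arch_ops->get_all_assignable(sbdfs, max, count);
}
```

With this split the common code would not need to know whether the list lives
in pciback sysfs entries or inside Xen.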
Another big question is with respect to Domain-0 and PCI bus sysfs use. The
existing code for querying PCI device resources/IRQs and resetting those via
sysfs of Domain-0 is more than OK if Domain-0 is present and owns PCI HW. But,
there are at least two cases when this is not going to work on ARM: Dom0less
setups and when there is a hardware domain owning PCI devices.
In our case we have a dedicated guest which is a sort of hardware domain (driver
domain DomD) which owns all the hardware of the platform, so we are interested
in implementing something that fits our design as well: with a DomD/hardware
domain it is not possible to access the relevant PCI bus sysfs entries from
Domain-0 as those live in DomD/hwdom. This is also true for Dom0less setups as
there is no entity that can provide the same.
For that reason in my POC I have extended struct pci_dev to hold an array of
the PCI device’s MMIO ranges and its IRQ, and introduced the following:
1. Provide internal API for accessing the array of MMIO ranges and IRQ. This
can be used in both Dom0less and Domain-0 setups to manipulate the relevant
data. The actual data can be read from a device tree/ACPI tables if
enumeration is done by bootloaders.
2. For Domain-0/DomD setup add PHYSDEVOP_pci_device_set_resources so Domain-0
can set the relevant resources in Xen while enumerating PCI devices. This
requires a change to the Linux kernel driver to work (I can provide more
details if needed).
3. For resetting devices we may want to implement that functionality on Xen
side as well by introducing PHYSDEVOP_pci_device_reset.
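A rough sketch of what the internal API from item 1 might look like is below.
It is not the POC code: struct pci_dev_resources stands in for the fields that
would be added to struct pci_dev, and the helper names, the range layout and
the limit of 6 ranges (one per standard BAR) are assumptions made for the
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: one slot per standard BAR of a type 0 header. */
#define PCI_DEV_MAX_MMIO 6

struct pci_mmio_range {
    uint64_t start;
    uint64_t size;
};

/* Hypothetical fields that would be added to struct pci_dev. */
struct pci_dev_resources {
    struct pci_mmio_range mmio[PCI_DEV_MAX_MMIO];
    unsigned int nr_mmio;
    unsigned int irq;
};

/* Record one MMIO range; returns 0 on success, -1 if invalid or full. */
static int pci_dev_add_mmio(struct pci_dev_resources *res,
                            uint64_t start, uint64_t size)
{
    assert(res != NULL);

    if (size == 0 || res->nr_mmio == PCI_DEV_MAX_MMIO)
        return -1;
    res->mmio[res->nr_mmio].start = start;
    res->mmio[res->nr_mmio].size = size;
    res->nr_mmio++;
    return 0;
}

/* Return the range containing addr, or NULL if no range covers it. */
static const struct pci_mmio_range *
pci_dev_find_mmio(const struct pci_dev_resources *res, uint64_t addr)
{
    unsigned int i;

    assert(res != NULL);

    for (i = 0; i < res->nr_mmio; i++) {
        const struct pci_mmio_range *r = &res->mmio[i];

        if (addr >= r->start && addr - r->start < r->size)
            return r;
    }
    return NULL;
}
```

The same helpers could be filled either from device tree/ACPI parsing or from
the proposed PHYSDEVOP_pci_device_set_resources handler.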
I can probably implement an RFC series with all the above if we agree on the
approach. Comments are more than welcome.
Thank you,
Oleksandr