|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: ARM/PCI passthrough: libxl_pci, sysfs and pciback questions
Hello, Roger!
On 10/27/20 2:55 PM, Roger Pau Monné wrote:
> On Tue, Oct 27, 2020 at 09:59:05AM +0000, Oleksandr Andrushchenko wrote:
>> Hello, all!
>>
>> While working on PCI passthrough on ARM (partial RFC was published by ARM
>> earlier this year) I tried to implement some related changes in the
>> toolstack.
>> One of the obstacles for ARM is PCI backend’s related code presence: ARM is
>> going to fully emulate an ECAM host bridge in Xen, so no PCI backend/frontend
>> pair is going to be used.
>>
>> If my understanding correct the functionality which is implemented by the
>> pciback and toolstack and which is relevant/needed for ARM:
>>
>> 1. pciback is used as a database for assignable PCI devices, e.g. xl
>> pci-assignable-{add|remove|list} manipulates that list. So, whenever
>> the
>> toolstack needs to know which PCI devices can be passed through it
>> reads
>> that from the relevant sysfs entries of the pciback.
>>
>> 2. pciback is used to hold the unbound PCI devices, e.g. when passing
>> through a
>> PCI device it needs to be unbound from the relevant device driver and
>> bound
>> to pciback (strictly speaking it is not required that the device is
>> bound to
>> pciback, but pciback is again used as a database of the passed through
>> PCI
>> devices, so we can re-bind the devices back to their original drivers
>> when
>> guest domain shuts down)
>>
>> 3. toolstack depends on Domain-0 for discovering PCI device resources
>> which are
>> then permitted for the guest domain, e.g MMIO ranges, IRQs. are read
>> from
>> the sysfs
>>
>> 4. toolstack is responsible for resetting PCI devices being passed
>> through via
>> sysfs/reset of the Domain-0’s PCI bus subsystem
>>
>> 5. toolstack is responsible for the devices are passed with all relevant
>> functions, e.g. so for multifunction devices all the functions are
>> passed to
>> a domain and no partial passthrough is done
>>
>> 6. toolstack cares about SR-IOV devices (am I correct here?)
> I'm not sure I fully understand what this means. Toolstack cares about
> SR-IOV as it cares about other PCI devices, but the SR-IOV
> functionality is managed by the (dom0) kernel.
Yes, you are right. Please ignore #6
>
>>
>> I have implemented a really dirty POC for that which I would need to clean up
>> before showing, but before that I would like to get some feedback and advice
>> on
>> how to proceed with the above. I suggest we:
>>
>> 1. Move all pciback related code (which seems to become x86 code only)
>> into a
>> dedicated file, something like tools/libxl/libxl_pci_x86.c
>>
>> 2. Make the functionality now provided by pciback architecture dependent,
>> so
>> tools/libxl/libxl_pci.c delegates actual assignable device list
>> handling to
>> that arch code and uses some sort of “ops”, e.g.
>> arch->ops.get_all_assignable, arch->ops.add_assignable etc. (This can
>> also
>> be done with “#ifdef CONFIG_PCIBACK”, but seems to be not cute).
>> Introduce
>> tools/libxl/libxl_pci_arm.c to provide ARM implementation.
> To be fair this is arch and OS dependent, since it's currently based
> on sysfs which is Linux specific. So it should really be
> libxl_pci_linux_x86.c or similar.
This is true, but do we really have any other implementation yet?
>
>> 3. ARM only: As we do not have pciback on ARM we need to have some
>> storage for
>> assignable device list: move that into Xen by extending struct pci_dev
>> with
>> “bool assigned” and providing sysctls for manipulating that, e.g.
>> XEN_SYSCTL_pci_device_{set|get}_assigned,
>> XEN_SYSCTL_pci_device_enum_assigned (to enumerate/get the list of
>> assigned/not-assigned PCI devices). Can this also be interesting for
>> x86? At
>> the moment it seems that x86 does rely on pciback presence, so
>> probably this
>> change might not be interesting for x86 world, but may allow stripping
>> pciback functionality a bit and making the code common to both ARM and
>> x86.
> How are you going to perform the device reset then? Will you assign
> the device to dom0 after removing it from the guest so that dom0 can
> perform the reset? You will need to use logic currently present in
> pciback to do so IIRC.
>
> It doesn't seem like a bad approach, but there are more consequences
> than just how assignable devices are listed.
>
> Also Xen doesn't currently know about IOMMU groups, so Xen would have
> to gain this knowledge in order to know the minimal set of PCI devices
> that can be assigned to a guest.
Good point, I'll check the relevant reset code. Thanks
>
>> 4. ARM only: It is not clear how to handle re-binding of the PCI driver on
>> guest shutdown: we need to store the sysfs path of the original driver
>> the
>> device was bound to. Do we also want to store that in struct pci_dev?
> I'm not sure I follow you here. On shutdown the device would be
> handled back to Xen?
Currently it is bound back to the driver which we seized the device from (if
any).
So, probably the same logic should remain?
>
> Most certainly we don't want to store a sysfs (Linux private
> information) inside of a Xen specific struct (pci_dev).
Yeap, this is something I don't like as well
>
>> 5. An alternative route for 3-4 could be to store that data in XenStore,
>> e.g.
>> MMIOs, IRQ, bind sysfs path etc. This would require more code on Xen
>> side to
>> access XenStore and won’t work if MMIOs/IRQs are passed via device
>> tree/ACPI
>> tables by the bootloaders.
> As above, I think I need more context to understand what and why you
> need to save such information.
Well, with pciback absence we loose a "database" which holds all the knowledge
about which devices are assigned, bound etc. So, XenStore *could* be used a such
a database for us. But this looks not elegant.
>
>> Another big question is with respect to Domain-0 and PCI bus sysfs use. The
>> existing code for querying PCI device resources/IRQs and resetting those via
>> sysfs of Domain-0 is more than OK if Domain-0 is present and owns PCI HW.
>> But,
>> there are at least two cases when this is not going to work on ARM: Dom0less
>> setups and when there is a hardware domain owning PCI devices.
>>
>> In our case we have a dedicated guest which is a sort of hardware domain
>> (driver
>> domain DomD) which owns all the hardware of the platform, so we are
>> interested
>> in implementing something that fits our design as well: DomD/hardware domain
>> makes it not possible to access the relevant PCI bus sysfs entries from
>> Domain-0
>> as those live in DomD/hwdom. This is also true for Dom0less setups as there
>> is
>> no entity that can provide the same.
> You need some kind of channel to transfer this information from the
> hardware domain to the toolstack domain. Some kind of protocol over
> libvchan might be an option.
Yes, this way it will all be handled without workarounds
>
>> For that reason in my POC I have introduced the following: extended struct
>> pci_dev to hold an array of PCI device’s MMIO ranges and IRQ:
>>
>> 1. Provide internal API for accessing the array of MMIO ranges and IRQ.
>> This
>> can be used in both Dom0less and Domain-0 setups to manipulate the
>> relevant
>> data. The actual data can be read from a device tree/ACPI tables if
>> enumeration is done by bootloaders.
> I would be against storing this data inside of Xen if Xen doesn't have
> to make any use of it. Does Xen need to know the MMIO ranges and IRQs
> to perform it's task?
>
> If not, then there's no reason to store those in Xen. The hypervisor
> is not the right place to implement a database like mechanism for PCI
> devices.
We have discussed all the above with Roger on IRC (thank you Roger),
so I'll prepare an RFC for ARM PCI passthrough configuration and send it ASAP.
>
> Roger.
Thank you,
Oleksandr
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |