
Re: ARM/PCI passthrough: libxl_pci, sysfs and pciback questions

  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Date: Tue, 27 Oct 2020 15:52:42 +0000
  • Cc: Bertrand Marquis <Bertrand.Marquis@xxxxxxx>, "andrew.cooper3@xxxxxxxxxx" <andrew.cooper3@xxxxxxxxxx>, "george.dunlap@xxxxxxxxxx" <george.dunlap@xxxxxxxxxx>, Ian Jackson <iwj@xxxxxxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, "wl@xxxxxxx" <wl@xxxxxxx>, "paul@xxxxxxx" <paul@xxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Rahul Singh <Rahul.Singh@xxxxxxx>
  • Delivery-date: Tue, 27 Oct 2020 15:53:08 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: ARM/PCI passthrough: libxl_pci, sysfs and pciback questions

Hello, Roger!

On 10/27/20 2:55 PM, Roger Pau Monné wrote:
> On Tue, Oct 27, 2020 at 09:59:05AM +0000, Oleksandr Andrushchenko wrote:
>> Hello, all!
>> While working on PCI passthrough on ARM (partial RFC was published by ARM
>> earlier this year) I tried to implement some related changes in the toolstack.
>> One of the obstacles for ARM is the presence of PCI backend related code: ARM is
>> going to fully emulate an ECAM host bridge in Xen, so no PCI backend/frontend
>> pair is going to be used.
>> If my understanding is correct, the functionality which is implemented by
>> pciback and the toolstack and which is relevant/needed for ARM is:
>>    1. pciback is used as a database for assignable PCI devices, e.g. xl
>>       pci-assignable-{add|remove|list} manipulates that list. So, whenever the
>>       toolstack needs to know which PCI devices can be passed through it reads
>>       that from the relevant sysfs entries of the pciback.
>>    2. pciback is used to hold the unbound PCI devices, e.g. when passing through a
>>       PCI device it needs to be unbound from the relevant device driver and bound
>>       to pciback (strictly speaking it is not required that the device is bound to
>>       pciback, but pciback is again used as a database of the passed through PCI
>>       devices, so we can re-bind the devices back to their original drivers when
>>       guest domain shuts down)
>>    3. toolstack depends on Domain-0 for discovering PCI device resources which are
>>       then permitted for the guest domain, e.g. MMIO ranges and IRQs are read from
>>       the sysfs
>>    4. toolstack is responsible for resetting PCI devices being passed through via
>>       sysfs/reset of the Domain-0’s PCI bus subsystem
>>    5. toolstack is responsible for ensuring that devices are passed with all
>>       relevant functions, e.g. for multifunction devices all the functions are
>>       passed to a domain and no partial passthrough is done
>>    6. toolstack cares about SR-IOV devices (am I correct here?)
> I'm not sure I fully understand what this means. Toolstack cares about
> SR-IOV as it cares about other PCI devices, but the SR-IOV
> functionality is managed by the (dom0) kernel.
Yes, you are right. Please ignore #6
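
To make #3 above more concrete, here is a minimal sketch of reading one entry of a device's resources from the Domain-0 sysfs. It assumes the Linux /sys/bus/pci/devices/<BDF>/resource format of three hex numbers per line (start, end, flags). struct pci_resource and the function names are made up for illustration and are not libxl API:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Resource type bits as defined in Linux include/linux/ioport.h. */
#define IORESOURCE_IO  0x00000100ULL
#define IORESOURCE_MEM 0x00000200ULL

/* Hypothetical container for one line of the sysfs "resource" file. */
struct pci_resource {
    uint64_t start;
    uint64_t end;
    uint64_t flags;
};

/* Parse one "start end flags" line; returns false if it is malformed. */
static bool parse_resource_line(const char *line, struct pci_resource *res)
{
    assert(line != NULL && res != NULL);
    return sscanf(line, "%" SCNx64 " %" SCNx64 " %" SCNx64,
                  &res->start, &res->end, &res->flags) == 3;
}

/* Assumption for this sketch: an unused slot reads back as all zeroes. */
static bool resource_is_populated(const struct pci_resource *res)
{
    return res->start != 0 || res->end != 0;
}

static bool resource_is_mmio(const struct pci_resource *res)
{
    return (res->flags & IORESOURCE_MEM) != 0;
}

/* The end address is inclusive, hence the +1. */
static uint64_t resource_size(const struct pci_resource *res)
{
    return res->end - res->start + 1;
}
```

A caller would read the file line by line, skip unpopulated entries and then permit the resulting ranges for the guest domain.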
>> I have implemented a really dirty POC for that which I would need to clean up
>> before showing, but before that I would like to get some feedback and advice on
>> how to proceed with the above. I suggest we:
>>    1. Move all pciback related code (which seems to become x86 code only) into a
>>       dedicated file, something like tools/libxl/libxl_pci_x86.c
>>    2. Make the functionality now provided by pciback architecture dependent, so
>>       tools/libxl/libxl_pci.c delegates actual assignable device list handling to
>>       that arch code and uses some sort of “ops”, e.g.
>>       arch->ops.get_all_assignable, arch->ops.add_assignable etc. (This can also
>>       be done with “#ifdef CONFIG_PCIBACK”, but seems to be not cute). Introduce
>>       tools/libxl/libxl_pci_arm.c to provide ARM implementation.
> To be fair this is arch and OS dependent, since it's currently based
> on sysfs which is Linux specific. So it should really be
> libxl_pci_linux_x86.c or similar.
This is true, but do we really have any other implementation yet?
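
Roughly what I mean by the "ops" in #2, as a minimal sketch only. Apart from the get_all_assignable and add_assignable names mentioned above, everything here (struct pci_bdf, struct pci_arch_ops, the toy_* in-memory backend and the pci_assignable_* wrappers) is hypothetical and not existing libxl code. A real x86/Linux backend would talk to pciback via sysfs instead of keeping an array:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical PCI address: segment, bus, device, function. */
struct pci_bdf {
    unsigned int domain, bus, dev, func;
};

/* Per-arch/OS hooks that the common libxl_pci.c code would call. */
struct pci_arch_ops {
    int (*add_assignable)(const struct pci_bdf *bdf);
    int (*get_all_assignable)(struct pci_bdf *out, size_t max);
};

/* Toy backend that keeps the list in memory, for illustration only. */
#define TOY_MAX 8
static struct pci_bdf toy_list[TOY_MAX];
static size_t toy_count;

static int toy_add(const struct pci_bdf *bdf)
{
    if (toy_count == TOY_MAX)
        return -1;
    toy_list[toy_count++] = *bdf;
    return 0;
}

static int toy_get_all(struct pci_bdf *out, size_t max)
{
    size_t n = toy_count < max ? toy_count : max;

    memcpy(out, toy_list, n * sizeof(*out));
    return (int)n;
}

static const struct pci_arch_ops toy_ops = {
    .add_assignable = toy_add,
    .get_all_assignable = toy_get_all,
};

/* Common code: no knowledge of pciback or sysfs, it only delegates. */
static int pci_assignable_add(const struct pci_arch_ops *ops,
                              const struct pci_bdf *bdf)
{
    assert(ops != NULL && bdf != NULL);
    return ops->add_assignable ? ops->add_assignable(bdf) : -1;
}

static int pci_assignable_list(const struct pci_arch_ops *ops,
                               struct pci_bdf *out, size_t max)
{
    assert(ops != NULL && out != NULL);
    return ops->get_all_assignable ? ops->get_all_assignable(out, max) : -1;
}
```

With this shape the choice between a Linux/pciback backend and an ARM one becomes a matter of which ops table is selected, without #ifdefs in the common code.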
>>    3. ARM only: As we do not have pciback on ARM we need to have some storage for
>>       assignable device list: move that into Xen by extending struct pci_dev with
>>       “bool assigned” and providing sysctls for manipulating that, e.g.
>>       XEN_SYSCTL_pci_device_{set|get}_assigned,
>>       XEN_SYSCTL_pci_device_enum_assigned (to enumerate/get the list of
>>       assigned/not-assigned PCI devices). Can this also be interesting for x86? At
>>       the moment it seems that x86 does rely on pciback presence, so probably this
>>       change might not be interesting for x86 world, but may allow stripping
>>       pciback functionality a bit and making the code common to both ARM and x86.
> How are you going to perform the device reset then? Will you assign
> the device to dom0 after removing it from the guest so that dom0 can
> perform the reset? You will need to use logic currently present in
> pciback to do so IIRC.
> It doesn't seem like a bad approach, but there are more consequences
> than just how assignable devices are listed.
> Also Xen doesn't currently know about IOMMU groups, so Xen would have
> to gain this knowledge in order to know the minimal set of PCI devices
> that can be assigned to a guest.
Good point, I'll check the relevant reset code. Thanks
>>    4. ARM only: It is not clear how to handle re-binding of the PCI driver on
>>       guest shutdown: we need to store the sysfs path of the original driver the
>>       device was bound to. Do we also want to store that in struct pci_dev?
> I'm not sure I follow you here. On shutdown the device would be
> handed back to Xen?

Currently it is bound back to the driver which we seized the device from (if 

So, probably the same logic should remain?

> Most certainly we don't want to store a sysfs (Linux private
> information) inside of a Xen specific struct (pci_dev).
Yeap, this is something I don't like as well
>>    5. An alternative route for 3-4 could be to store that data in XenStore, e.g.
>>       MMIOs, IRQ, bind sysfs path etc. This would require more code on Xen side to
>>       access XenStore and won’t work if MMIOs/IRQs are passed via device tree/ACPI
>>       tables by the bootloaders.
> As above, I think I need more context to understand what and why you
> need to save such information.

Well, without pciback we lose the "database" which holds all the knowledge
about which devices are assigned, bound etc. So, XenStore *could* be used as
such a database for us. But this does not look elegant.

>> Another big question is with respect to Domain-0 and PCI bus sysfs use. The
>> existing code for querying PCI device resources/IRQs and resetting those via
>> sysfs of Domain-0 is more than OK if Domain-0 is present and owns PCI HW. But,
>> there are at least two cases when this is not going to work on ARM: Dom0less
>> setups and when there is a hardware domain owning PCI devices.
>> In our case we have a dedicated guest which is a sort of hardware domain (driver
>> domain DomD) which owns all the hardware of the platform, so we are interested
>> in implementing something that fits our design as well: DomD/hardware domain
>> makes it not possible to access the relevant PCI bus sysfs entries from Domain-0
>> as those live in DomD/hwdom. This is also true for Dom0less setups as there is
>> no entity that can provide the same.
> You need some kind of channel to transfer this information from the
> hardware domain to the toolstack domain. Some kind of protocol over
> libvchan might be an option.
Yes, this way it will all be handled without workarounds
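
Just to illustrate what such a protocol could carry, a minimal sketch of a request message that the toolstack domain might send to the hardware domain. The message types, struct pcimsg_req and the 5-byte little-endian layout are all assumptions made up for this example. No such protocol exists today, and the transport (e.g. libvchan) is deliberately left out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request types for a toolstack <-> hardware domain channel. */
enum pcimsg_type {
    PCIMSG_GET_RESOURCES = 1, /* ask for the MMIO ranges/IRQs of a device */
    PCIMSG_RESET         = 2, /* ask the hardware domain to reset a device */
};

/* Hypothetical request: which operation on which device. */
struct pcimsg_req {
    uint8_t  type;    /* enum pcimsg_type */
    uint16_t segment; /* PCI segment (domain) */
    uint8_t  bus;
    uint8_t  devfn;   /* (device << 3) | function */
};

#define PCIMSG_REQ_SIZE 5 /* bytes on the wire */

/* Serialise with an explicit layout so both ends agree regardless of ABI. */
static size_t pcimsg_req_encode(const struct pcimsg_req *req,
                                uint8_t *buf, size_t len)
{
    assert(req != NULL && buf != NULL);
    if (len < PCIMSG_REQ_SIZE)
        return 0;
    buf[0] = req->type;
    buf[1] = (uint8_t)(req->segment & 0xff);
    buf[2] = (uint8_t)(req->segment >> 8);
    buf[3] = req->bus;
    buf[4] = req->devfn;
    return PCIMSG_REQ_SIZE;
}

/* Returns 0 on success, -1 if the buffer is too short. */
static int pcimsg_req_decode(const uint8_t *buf, size_t len,
                             struct pcimsg_req *req)
{
    assert(buf != NULL && req != NULL);
    if (len < PCIMSG_REQ_SIZE)
        return -1;
    req->type = buf[0];
    req->segment = (uint16_t)(buf[1] | (buf[2] << 8));
    req->bus = buf[3];
    req->devfn = buf[4];
    return 0;
}
```

The encoded bytes would then be written to whatever channel is chosen, and the hardware domain would answer with the data it reads from its own sysfs.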
>> For that reason in my POC I have introduced the following: extended struct
>> pci_dev to hold an array of PCI device’s MMIO ranges and IRQ:
>>    1. Provide internal API for accessing the array of MMIO ranges and IRQ. This
>>       can be used in both Dom0less and Domain-0 setups to manipulate the relevant
>>       data. The actual data can be read from a device tree/ACPI tables if
>>       enumeration is done by bootloaders.
> I would be against storing this data inside of Xen if Xen doesn't have
> to make any use of it. Does Xen need to know the MMIO ranges and IRQs
> to perform its task?
> If not, then there's no reason to store those in Xen. The hypervisor
> is not the right place to implement a database like mechanism for PCI
> devices.

We have discussed all the above with Roger on IRC (thank you Roger),
so I'll prepare an RFC for ARM PCI passthrough configuration and send it ASAP.

> Roger.

Thank you,



