
ARM/PCI passthrough: libxl_pci, sysfs and pciback questions


  • To: Bertrand Marquis <Bertrand.Marquis@xxxxxxx>, "andrew.cooper3@xxxxxxxxxx" <andrew.cooper3@xxxxxxxxxx>, "george.dunlap@xxxxxxxxxx" <george.dunlap@xxxxxxxxxx>, Ian Jackson <iwj@xxxxxxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, "wl@xxxxxxx" <wl@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, "paul@xxxxxxx" <paul@xxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Rahul Singh <Rahul.Singh@xxxxxxx>
  • From: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Date: Tue, 27 Oct 2020 09:59:05 +0000
  • Delivery-date: Tue, 27 Oct 2020 09:59:23 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: ARM/PCI passthrough: libxl_pci, sysfs and pciback questions

Hello, all!

While working on PCI passthrough on ARM (a partial RFC was published by ARM
earlier this year) I tried to implement some related changes in the toolstack.
One of the obstacles for ARM is the presence of PCI backend related code: ARM is
going to fully emulate an ECAM host bridge in Xen, so no PCI backend/frontend
pair is going to be used.

If my understanding is correct, the functionality implemented by pciback and
the toolstack that is relevant/needed for ARM is:

  1. pciback is used as a database for assignable PCI devices, e.g. xl
     pci-assignable-{add|remove|list} manipulates that list. So, whenever the
     toolstack needs to know which PCI devices can be passed through, it reads
     that from the relevant sysfs entries of pciback.

  2. pciback is used to hold the unbound PCI devices, i.e. when passing through a
     PCI device it needs to be unbound from the relevant device driver and bound
     to pciback (strictly speaking it is not required that the device is bound to
     pciback, but pciback is again used as a database of the passed through PCI
     devices, so we can re-bind the devices back to their original drivers when
     the guest domain shuts down)

  3. toolstack depends on Domain-0 for discovering PCI device resources which are
     then permitted for the guest domain, e.g. MMIO ranges and IRQs are read from
     sysfs

  4. toolstack is responsible for resetting PCI devices being passed through via
     the sysfs “reset” entry of Domain-0’s PCI bus subsystem

  5. toolstack is responsible for making sure devices are passed through with all
     relevant functions, e.g. for multifunction devices all the functions are
     passed to a domain and no partial passthrough is done

  6. toolstack takes care of SR-IOV devices (am I correct here?)
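To make items 1 and 5 a bit more concrete: pciback's sysfs entries name devices by SBDF ("SSSS:BB:DD.F", all hexadecimal), and the "no partial passthrough" rule boils down to a check over all functions sharing segment, bus and device number. Below is a minimal C sketch, assuming the toolstack already has the list of present devices and the assignable list in memory; struct pci_sbdf, parse_sbdf() and all_functions_assignable() are hypothetical helpers for illustration, not existing libxl functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical SBDF representation; sysfs names PCI devices as
 * "SSSS:BB:DD.F" (segment:bus:device.function, all hexadecimal). */
struct pci_sbdf {
    unsigned int seg, bus, dev, func;
};

/* Parse a sysfs-style device name such as "0000:03:00.1".
 * Returns true on success; trailing characters are rejected. */
static bool parse_sbdf(const char *name, struct pci_sbdf *out)
{
    unsigned int seg, bus, dev, func;
    char extra;

    if (sscanf(name, "%x:%x:%x.%x%c", &seg, &bus, &dev, &func, &extra) != 4)
        return false;
    if (seg > 0xffff || bus > 0xff || dev > 0x1f || func > 0x7)
        return false;
    out->seg = seg;
    out->bus = bus;
    out->dev = dev;
    out->func = func;
    return true;
}

static bool sbdf_equal(const struct pci_sbdf *a, const struct pci_sbdf *b)
{
    return a->seg == b->seg && a->bus == b->bus &&
           a->dev == b->dev && a->func == b->func;
}

/* Item 5 (hypothetical helper): before passing 'target' through, check that
 * every present function of the same physical device (same seg/bus/dev) is
 * also in the assignable list, so no partial passthrough happens. */
static bool all_functions_assignable(const struct pci_sbdf *present, size_t n_present,
                                     const struct pci_sbdf *assignable, size_t n_assignable,
                                     const struct pci_sbdf *target)
{
    for (size_t i = 0; i < n_present; i++) {
        const struct pci_sbdf *p = &present[i];
        bool found = false;

        if (p->seg != target->seg || p->bus != target->bus || p->dev != target->dev)
            continue; /* different physical device */
        for (size_t j = 0; j < n_assignable && !found; j++)
            found = sbdf_equal(p, &assignable[j]);
        if (!found)
            return false;
    }
    return true;
}
```

On ARM the same check would still be needed; only the source of the "assignable" list changes.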


I have implemented a really dirty POC for that which I would need to clean up
before showing, but before that I would like to get some feedback and advice on
how to proceed with the above. I suggest we:

  1. Move all pciback related code (which seems to become x86-only code) into a
     dedicated file, something like tools/libxl/libxl_pci_x86.c

  2. Make the functionality now provided by pciback architecture-dependent, so
     tools/libxl/libxl_pci.c delegates the actual assignable device list handling
     to arch code through some sort of “ops”, e.g.
     arch->ops.get_all_assignable, arch->ops.add_assignable etc. (This could also
     be done with “#ifdef CONFIG_PCIBACK”, but that does not seem clean.)
     Introduce tools/libxl/libxl_pci_arm.c to provide the ARM implementation.

  3. ARM only: as we do not have pciback on ARM, we need some storage for the
     assignable device list: move that into Xen by extending struct pci_dev with
     “bool assigned” and providing sysctls for manipulating it, e.g.
     XEN_SYSCTL_pci_device_{set|get}_assigned and
     XEN_SYSCTL_pci_device_enum_assigned (to enumerate/get the list of
     assigned/not-assigned PCI devices). Can this also be interesting for x86? At
     the moment it seems that x86 does rely on pciback presence, so this change
     might not be interesting for the x86 world, but it may allow stripping some
     pciback functionality and making the code common to both ARM and x86.

  4. ARM only: It is not clear how to handle re-binding of the PCI driver on
     guest shutdown: we need to store the sysfs path of the original driver the
     device was bound to. Do we also want to store that in struct pci_dev?

  5. An alternative route for 3-4 could be to store that data in XenStore, e.g.
     MMIOs, IRQ, bind sysfs path etc. This would require more code on the Xen side
     to access XenStore and won’t work if MMIOs/IRQs are passed via device
     tree/ACPI tables by the bootloaders.
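Items 2-4 could look roughly like the following C sketch. struct pci_arch_ops, the arm_* functions and the PCI_SBDF packing are hypothetical names chosen for illustration; the in-memory table merely stands in for the proposed Xen-side “bool assigned” and the XEN_SYSCTL_pci_device_* sysctls, which do not exist yet. An x86 implementation would keep using pciback's sysfs entries behind the same hooks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PCI_DEVS    16
#define DRIVER_PATH_LEN 128

/* Pack seg/bus/dev/func into one integer (illustrative packing only). */
#define PCI_SBDF(s, b, d, f) \
    (((unsigned int)(s) << 16) | ((b) << 8) | ((d) << 3) | (f))

/* Hypothetical arch hooks that tools/libxl/libxl_pci.c could call instead of
 * reading and writing pciback's sysfs entries directly. */
struct pci_arch_ops {
    int (*add_assignable)(unsigned int sbdf, const char *driver_path);
    int (*remove_assignable)(unsigned int sbdf);
    size_t (*get_all_assignable)(unsigned int *out, size_t max);
};

/* In-memory stand-in for the proposed Xen-side storage: the "assigned" flag
 * plus the original driver's sysfs path (item 4). A real ARM backend would
 * issue the proposed sysctls instead. */
struct arm_pci_entry {
    unsigned int sbdf;
    bool assignable;
    char driver_path[DRIVER_PATH_LEN];
};

static struct arm_pci_entry arm_table[MAX_PCI_DEVS];
static size_t arm_count;

static struct arm_pci_entry *arm_find(unsigned int sbdf)
{
    for (size_t i = 0; i < arm_count; i++)
        if (arm_table[i].sbdf == sbdf)
            return &arm_table[i];
    return NULL;
}

static int arm_add_assignable(unsigned int sbdf, const char *driver_path)
{
    struct arm_pci_entry *e = arm_find(sbdf);

    if (!e) {
        if (arm_count == MAX_PCI_DEVS)
            return -1;
        e = &arm_table[arm_count++];
        e->sbdf = sbdf;
    }
    e->assignable = true;
    /* Remember the original driver so it can be re-bound on guest shutdown. */
    strncpy(e->driver_path, driver_path ? driver_path : "", DRIVER_PATH_LEN - 1);
    e->driver_path[DRIVER_PATH_LEN - 1] = '\0';
    return 0;
}

static int arm_remove_assignable(unsigned int sbdf)
{
    struct arm_pci_entry *e = arm_find(sbdf);

    if (!e || !e->assignable)
        return -1;
    e->assignable = false;
    return 0;
}

static size_t arm_get_all_assignable(unsigned int *out, size_t max)
{
    size_t n = 0;

    for (size_t i = 0; i < arm_count && n < max; i++)
        if (arm_table[i].assignable)
            out[n++] = arm_table[i].sbdf;
    return n;
}

/* Original driver path recorded for 'sbdf', or NULL if the device is unknown. */
static const char *arm_get_driver_path(unsigned int sbdf)
{
    const struct arm_pci_entry *e = arm_find(sbdf);

    return e ? e->driver_path : NULL;
}

static const struct pci_arch_ops arm_pci_ops = {
    .add_assignable     = arm_add_assignable,
    .remove_assignable  = arm_remove_assignable,
    .get_all_assignable = arm_get_all_assignable,
};
```

With this split, libxl_pci.c would only ever go through the ops table, and whether the data lives in pciback or in Xen becomes an arch detail rather than an #ifdef.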


Another big question is with respect to Domain-0 and PCI bus sysfs use. The
existing code for querying PCI device resources/IRQs and resetting devices via
the sysfs of Domain-0 is more than OK if Domain-0 is present and owns the PCI
HW. But there are at least two cases when this is not going to work on ARM:
Dom0less setups, and setups where a separate hardware domain owns the PCI
devices.

In our case we have a dedicated guest which is a sort of hardware domain (driver
domain DomD) which owns all the hardware of the platform, so we are interested
in implementing something that fits our design as well: with a DomD/hardware
domain it is not possible to access the relevant PCI bus sysfs entries from
Domain-0, as those live in DomD/hwdom. This is also true for Dom0less setups, as
there is no entity that can provide the same.

For that reason, in my POC I have extended struct pci_dev to hold an array of
the PCI device’s MMIO ranges and its IRQ, and introduced the following:

  1. Provide an internal API for accessing the array of MMIO ranges and IRQ. This
     can be used in both Dom0less and Domain-0 setups to manipulate the relevant
     data. The actual data can be read from the device tree/ACPI tables if
     enumeration is done by the bootloaders.

  2. For Domain-0/DomD setups, add PHYSDEVOP_pci_device_set_resources so
     Domain-0 can set the relevant resources in Xen while enumerating PCI devices.
     This requires a change to the Linux kernel driver to work (I can provide
     more details if needed).

  3. For resetting devices, we may want to implement that functionality on the
     Xen side as well by introducing PHYSDEVOP_pci_device_reset.
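A minimal C sketch of items 1 and 2, assuming up to six MMIO ranges per device (one per BAR of a type 0 configuration header) and a single IRQ. struct pci_dev_resources, the physdev_pci_device_set_resources layout and the helper functions are hypothetical names for illustration only; the real struct pci_dev extension and the proposed PHYSDEVOP would need their own design, and item 3 (reset) is not covered here.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_MAX_BARS 6 /* a type 0 configuration header has six BARs */

struct pci_mmio_range {
    uint64_t start; /* physical base address */
    uint64_t size;  /* length in bytes, must be non-zero */
};

/* Hypothetical new fields for Xen's struct pci_dev (item 1). */
struct pci_dev_resources {
    struct pci_mmio_range mmio[PCI_MAX_BARS];
    unsigned int nr_mmio;
    uint32_t irq;
    bool valid;
};

/* Hypothetical argument for the proposed PHYSDEVOP_pci_device_set_resources
 * (item 2); the layout is illustrative only. */
struct physdev_pci_device_set_resources {
    uint16_t seg;
    uint8_t bus;
    uint8_t devfn;
    uint32_t irq;
    uint32_t nr_mmio;
    struct pci_mmio_range mmio[PCI_MAX_BARS];
};

/* Reject empty ranges and ranges that wrap around the address space. */
static bool range_ok(const struct pci_mmio_range *r)
{
    return r->size != 0 && r->start + r->size > r->start;
}

/* Internal API: validate everything first, then store, so a rejected request
 * leaves the previously stored resources untouched. Usable both from the
 * hypercall path and when parsing device tree/ACPI in Dom0less setups. */
static int pci_dev_set_resources(struct pci_dev_resources *res,
                                 const struct pci_mmio_range *mmio,
                                 unsigned int nr_mmio, uint32_t irq)
{
    if (nr_mmio > PCI_MAX_BARS)
        return -EINVAL;
    for (unsigned int i = 0; i < nr_mmio; i++)
        if (!range_ok(&mmio[i]))
            return -EINVAL;
    for (unsigned int i = 0; i < nr_mmio; i++)
        res->mmio[i] = mmio[i];
    res->nr_mmio = nr_mmio;
    res->irq = irq;
    res->valid = true;
    return 0;
}

/* Hypercall-side wrapper: in Xen the device would first be looked up by
 * seg/bus/devfn; here the caller passes the target resources directly. */
static int physdev_set_resources(struct pci_dev_resources *res,
                                 const struct physdev_pci_device_set_resources *op)
{
    return pci_dev_set_resources(res, op->mmio, op->nr_mmio, op->irq);
}

/* True if [addr, addr + size) lies entirely inside one of the device's MMIO
 * ranges, i.e. the range could be permitted to the guest. */
static bool pci_dev_mmio_covers(const struct pci_dev_resources *res,
                                uint64_t addr, uint64_t size)
{
    if (!res->valid || size == 0 || addr + size < addr)
        return false;
    for (unsigned int i = 0; i < res->nr_mmio; i++) {
        const struct pci_mmio_range *r = &res->mmio[i];

        if (addr >= r->start && addr + size <= r->start + r->size)
            return true;
    }
    return false;
}
```

Keeping the validation in the internal API rather than in the hypercall handler means the Dom0less path (device tree/ACPI) gets the same checks as the Domain-0/DomD path.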


I can probably implement an RFC series with all the above if we agree on the
approach. Comments are more than welcome.

Thank you,
Oleksandr

 

