
Re: ARM/PCI passthrough: libxl_pci, sysfs and pciback questions


  • To: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Tue, 27 Oct 2020 13:55:14 +0100
  • Cc: Bertrand Marquis <Bertrand.Marquis@xxxxxxx>, "andrew.cooper3@xxxxxxxxxx" <andrew.cooper3@xxxxxxxxxx>, "george.dunlap@xxxxxxxxxx" <george.dunlap@xxxxxxxxxx>, Ian Jackson <iwj@xxxxxxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, "wl@xxxxxxx" <wl@xxxxxxx>, "paul@xxxxxxx" <paul@xxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Rahul Singh <Rahul.Singh@xxxxxxx>
  • Delivery-date: Tue, 27 Oct 2020 12:55:38 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, Oct 27, 2020 at 09:59:05AM +0000, Oleksandr Andrushchenko wrote:
> Hello, all!
> 
> While working on PCI passthrough on ARM (partial RFC was published by ARM
> earlier this year) I tried to implement some related changes in the toolstack.
> One of the obstacles for ARM is the presence of PCI backend related code: ARM
> is going to fully emulate an ECAM host bridge in Xen, so no PCI
> backend/frontend pair is going to be used.
> 
> If my understanding is correct, the functionality which is implemented by
> pciback and the toolstack and which is relevant/needed for ARM is:
> 
>   1. pciback is used as a database for assignable PCI devices, e.g. xl
>      pci-assignable-{add|remove|list} manipulates that list. So, whenever the
>      toolstack needs to know which PCI devices can be passed through it reads
>      that from the relevant sysfs entries of the pciback.
> 
>   2. pciback is used to hold the unbound PCI devices, e.g. when passing
>      through a PCI device it needs to be unbound from the relevant device
>      driver and bound to pciback (strictly speaking it is not required that
>      the device is bound to pciback, but pciback is again used as a database
>      of the passed through PCI devices, so we can re-bind the devices back to
>      their original drivers when guest domain shuts down)
> 
>   3. toolstack depends on Domain-0 for discovering PCI device resources which
>      are then permitted for the guest domain, e.g. MMIO ranges and IRQs are
>      read from sysfs
> 
>   4. toolstack is responsible for resetting PCI devices being passed through
>      via sysfs/reset of the Domain-0’s PCI bus subsystem
> 
>   5. toolstack is responsible for ensuring that devices are passed with all
>      relevant functions, e.g. for multifunction devices all the functions are
>      passed to a domain and no partial passthrough is done
> 
>   6. toolstack cares about SR-IOV devices (am I correct here?)

I'm not sure I fully understand what this means. Toolstack cares about
SR-IOV as it cares about other PCI devices, but the SR-IOV
functionality is managed by the (dom0) kernel.
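
For readers less familiar with the sysfs entries referred to in the list
above, here is a minimal sketch of how the per-device paths are formed. The
struct pci_bdf type and both helper names are hypothetical and are not libxl
code; the only parts taken from Linux are the DDDD:BB:SS.F device naming and
the /sys/bus/pci/devices/<BDF>/ layout (with nodes such as reset, resource,
irq and the driver symlink).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type for illustration; not a libxl structure. */
struct pci_bdf {
    unsigned int domain, bus, dev, func;
};

/* Format the device name the way Linux sysfs does: DDDD:BB:SS.F.
 * Returns what snprintf returns (length that was or would be written). */
static int bdf_to_str(const struct pci_bdf *b, char *buf, size_t len)
{
    assert(b && buf);
    return snprintf(buf, len, "%04x:%02x:%02x.%01x",
                    b->domain, b->bus, b->dev, b->func);
}

/* Build /sys/bus/pci/devices/<BDF>/<node>, where node is a single entry
 * such as "reset", "resource", "irq" or "driver".
 * Returns -1 if node contains a '/', otherwise what snprintf returns. */
static int pci_sysfs_node(const struct pci_bdf *b, const char *node,
                          char *buf, size_t len)
{
    char bdf[32];

    assert(b && node && buf);
    if (strchr(node, '/'))
        return -1; /* keep the helper to direct children of the device */

    bdf_to_str(b, bdf, sizeof(bdf));
    return snprintf(buf, len, "/sys/bus/pci/devices/%s/%s", bdf, node);
}
```

Resetting a device as in point 4 then amounts to writing "1" to the path
returned for the "reset" node, from whichever domain owns the PCI bus.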

> 
> 
> I have implemented a really dirty POC for that which I would need to clean up
> before showing, but before that I would like to get some feedback and advice
> on how to proceed with the above. I suggest we:
> 
>   1. Move all pciback related code (which seems to become x86 code only) into
>      a dedicated file, something like tools/libxl/libxl_pci_x86.c
> 
>   2. Make the functionality now provided by pciback architecture dependent, so
>      tools/libxl/libxl_pci.c delegates actual assignable device list handling
>      to that arch code and uses some sort of “ops”, e.g.
>      arch->ops.get_all_assignable, arch->ops.add_assignable etc. (This can
>      also be done with “#ifdef CONFIG_PCIBACK”, but seems to be not cute).
>      Introduce tools/libxl/libxl_pci_arm.c to provide ARM implementation.

To be fair this is arch and OS dependent, since it's currently based
on sysfs which is Linux specific. So it should really be
libxl_pci_linux_x86.c or similar.
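
A minimal sketch of what such an "ops" table could look like, with the backend
chosen per OS and architecture. Only the names get_all_assignable and
add_assignable come from the proposal above; the struct layout, the
remove_assignable hook and the in-memory backend are assumptions made for
illustration (a real Linux/x86 backend would talk to pciback through sysfs
instead of a static table).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device identifier; not the libxl type. */
struct pci_bdf { unsigned int domain, bus, dev, func; };

/* Hypothetical per-OS/arch operations for the assignable device list. */
struct pci_assignable_ops {
    int (*add_assignable)(const struct pci_bdf *dev);
    int (*remove_assignable)(const struct pci_bdf *dev);
    size_t (*get_all_assignable)(struct pci_bdf *out, size_t max);
};

/* Stand-in backend that keeps the list in memory. */
#define MAX_ASSIGNABLE 16
static struct pci_bdf table[MAX_ASSIGNABLE];
static size_t table_len;

static bool bdf_equal(const struct pci_bdf *a, const struct pci_bdf *b)
{
    return a->domain == b->domain && a->bus == b->bus &&
           a->dev == b->dev && a->func == b->func;
}

static int mem_add(const struct pci_bdf *dev)
{
    assert(dev);
    for (size_t i = 0; i < table_len; i++)
        if (bdf_equal(&table[i], dev))
            return 0;                 /* already listed, nothing to do */
    if (table_len == MAX_ASSIGNABLE)
        return -1;                    /* table full */
    table[table_len++] = *dev;
    return 0;
}

static int mem_remove(const struct pci_bdf *dev)
{
    assert(dev);
    for (size_t i = 0; i < table_len; i++) {
        if (bdf_equal(&table[i], dev)) {
            table[i] = table[--table_len]; /* fill the hole with the last entry */
            return 0;
        }
    }
    return -1;                        /* not in the list */
}

static size_t mem_get_all(struct pci_bdf *out, size_t max)
{
    size_t n = table_len < max ? table_len : max;

    assert(out);
    for (size_t i = 0; i < n; i++)
        out[i] = table[i];
    return n;
}

/* Common code would only ever see this pointer; which table it points to
 * would be decided at build time for the given OS and architecture. */
static const struct pci_assignable_ops mem_ops = {
    .add_assignable = mem_add,
    .remove_assignable = mem_remove,
    .get_all_assignable = mem_get_all,
};
```

With this shape, libxl_pci.c would call through the ops pointer only, and the
sysfs specific logic would stay in the Linux backend file.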

> 
>   3. ARM only: As we do not have pciback on ARM we need to have some storage
>      for assignable device list: move that into Xen by extending struct
>      pci_dev with “bool assigned” and providing sysctls for manipulating that,
>      e.g. XEN_SYSCTL_pci_device_{set|get}_assigned,
>      XEN_SYSCTL_pci_device_enum_assigned (to enumerate/get the list of
>      assigned/not-assigned PCI devices). Can this also be interesting for x86?
>      At the moment it seems that x86 does rely on pciback presence, so
>      probably this change might not be interesting for x86 world, but may
>      allow stripping pciback functionality a bit and making the code common to
>      both ARM and x86.

How are you going to perform the device reset then? Will you assign
the device to dom0 after removing it from the guest so that dom0 can
perform the reset? You will need to use logic currently present in
pciback to do so IIRC.

It doesn't seem like a bad approach, but there are more consequences
than just how assignable devices are listed.

Also Xen doesn't currently know about IOMMU groups, so Xen would have
to gain this knowledge in order to know the minimal set of PCI devices
that can be assigned to a guest.
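
To illustrate the point about IOMMU groups: devices that share a group cannot
be isolated from one another, so they would have to be assigned together. A
minimal sketch, assuming the group id of each device is already known (Linux
exposes it through the iommu_group symlink of the device in sysfs; how Xen
would learn it is exactly the open question). All type and function names here
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration only. */
struct pci_bdf { unsigned int domain, bus, dev, func; };

struct grouped_dev {
    struct pci_bdf bdf;
    unsigned int group;   /* IOMMU group id, assumed to be known already */
};

/* Collect every device that shares an IOMMU group with devs[target].
 * At most max entries are stored in out; the return value is the total
 * number of group members, so a result larger than max means out was
 * too small to hold the whole set. */
static size_t assignment_set(const struct grouped_dev *devs, size_t n,
                             size_t target, struct pci_bdf *out, size_t max)
{
    size_t found = 0;

    assert(devs && out && target < n);
    for (size_t i = 0; i < n; i++) {
        if (devs[i].group != devs[target].group)
            continue;
        if (found < max)
            out[found] = devs[i].bdf;
        found++;
    }
    return found;
}
```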

> 
>   4. ARM only: It is not clear how to handle re-binding of the PCI driver on
>      guest shutdown: we need to store the sysfs path of the original driver
>      the device was bound to. Do we also want to store that in struct pci_dev?

I'm not sure I follow you here. On shutdown the device would be
handed back to Xen?

Most certainly we don't want to store a sysfs path (Linux private
information) inside of a Xen specific struct (pci_dev).

>   5. An alternative route for 3-4 could be to store that data in XenStore,
>      e.g. MMIOs, IRQ, bind sysfs path etc. This would require more code on Xen
>      side to access XenStore and won’t work if MMIOs/IRQs are passed via
>      device tree/ACPI tables by the bootloaders.

As above, I think I need more context to understand what and why you
need to save such information.

> 
> Another big question is with respect to Domain-0 and PCI bus sysfs use. The
> existing code for querying PCI device resources/IRQs and resetting those via
> sysfs of Domain-0 is more than OK if Domain-0 is present and owns PCI HW. But,
> there are at least two cases when this is not going to work on ARM: Dom0less
> setups and when there is a hardware domain owning PCI devices.
> 
> In our case we have a dedicated guest which is a sort of hardware domain
> (driver domain DomD) which owns all the hardware of the platform, so we are
> interested in implementing something that fits our design as well:
> DomD/hardware domain makes it not possible to access the relevant PCI bus
> sysfs entries from Domain-0 as those live in DomD/hwdom. This is also true for
> Dom0less setups as there is no entity that can provide the same.

You need some kind of channel to transfer this information from the
hardware domain to the toolstack domain. Some kind of protocol over
libvchan might be an option.
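
As a rough idea of what such a protocol could carry, here is a minimal sketch
of a fixed little-endian encoding for the per-device data (BDF, IRQ, MMIO
ranges). The wire format, struct names and function names are all invented for
illustration; nothing here is an existing Xen or libvchan interface, and the
transport (for example writing the resulting buffer to a vchan) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RANGES 6   /* illustrative cap; a type 0 header has six BARs */
#define HDR_BYTES  9   /* domain(2) + bus(1) + devfn(1) + irq(4) + count(1) */

/* Hypothetical message describing one PCI device. */
struct mmio_range { uint64_t start, size; };

struct pci_dev_info {
    uint16_t domain;
    uint8_t bus, devfn;
    uint32_t irq;
    uint8_t nr_ranges;
    struct mmio_range ranges[MAX_RANGES];
};

/* Store the low n bytes of v in little-endian order; returns n. */
static size_t put_le(uint8_t *p, uint64_t v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        p[i] = (uint8_t)(v >> (8 * i));
    return n;
}

static uint64_t get_le(const uint8_t *p, size_t n)
{
    uint64_t v = 0;

    for (size_t i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Returns the number of bytes written, or 0 if the message is invalid
 * or the buffer is too small. */
static size_t info_encode(const struct pci_dev_info *in, uint8_t *buf, size_t len)
{
    size_t off = 0;

    assert(in && buf);
    if (in->nr_ranges > MAX_RANGES ||
        len < HDR_BYTES + (size_t)in->nr_ranges * 16)
        return 0;

    off += put_le(buf + off, in->domain, 2);
    buf[off++] = in->bus;
    buf[off++] = in->devfn;
    off += put_le(buf + off, in->irq, 4);
    buf[off++] = in->nr_ranges;
    for (size_t i = 0; i < in->nr_ranges; i++) {
        off += put_le(buf + off, in->ranges[i].start, 8);
        off += put_le(buf + off, in->ranges[i].size, 8);
    }
    return off;
}

/* Returns the number of bytes consumed, or 0 if the input is truncated
 * or claims more ranges than the cap allows. */
static size_t info_decode(const uint8_t *buf, size_t len, struct pci_dev_info *out)
{
    size_t off = 0;

    assert(buf && out);
    if (len < HDR_BYTES)
        return 0;

    out->domain = (uint16_t)get_le(buf + off, 2); off += 2;
    out->bus = buf[off++];
    out->devfn = buf[off++];
    out->irq = (uint32_t)get_le(buf + off, 4); off += 4;
    out->nr_ranges = buf[off++];
    if (out->nr_ranges > MAX_RANGES ||
        len < HDR_BYTES + (size_t)out->nr_ranges * 16)
        return 0;

    for (size_t i = 0; i < out->nr_ranges; i++) {
        out->ranges[i].start = get_le(buf + off, 8); off += 8;
        out->ranges[i].size = get_le(buf + off, 8); off += 8;
    }
    return off;
}
```

The hardware domain would fill in the structure from its own view of the PCI
bus and the toolstack domain would decode it, so neither side needs access to
the other's sysfs.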

> For that reason in my POC I have introduced the following: extended struct
> pci_dev to hold an array of PCI device’s MMIO ranges and IRQ:
> 
>   1. Provide internal API for accessing the array of MMIO ranges and IRQ. This
>      can be used in both Dom0less and Domain-0 setups to manipulate the
>      relevant data. The actual data can be read from a device tree/ACPI tables
>      if enumeration is done by bootloaders.

I would be against storing this data inside of Xen if Xen doesn't have
to make any use of it. Does Xen need to know the MMIO ranges and IRQs
to perform its task?

If not, then there's no reason to store those in Xen. The hypervisor
is not the right place to implement a database-like mechanism for PCI
devices.

Roger.



 

