Re: [PATCH v10 13/17] vpci: add initial support for virtual PCI bus topology
Hi Stefano,

Stefano Stabellini <sstabellini@xxxxxxxxxx> writes:

> On Fri, 17 Nov 2023, Volodymyr Babchuk wrote:
>> > On Fri, 17 Nov 2023, Volodymyr Babchuk wrote:
>> >> Hi Julien,
>> >>
>> >> Julien Grall <julien@xxxxxxx> writes:
>> >>
>> >> > Hi Volodymyr,
>> >> >
>> >> > On 17/11/2023 14:09, Volodymyr Babchuk wrote:
>> >> >> Hi Stefano,
>> >> >>
>> >> >> Stefano Stabellini <sstabellini@xxxxxxxxxx> writes:
>> >> >>
>> >> >>> On Fri, 17 Nov 2023, Volodymyr Babchuk wrote:
>> >> >>>>> I still think, no matter the BDF allocation scheme, that we
>> >> >>>>> should try to avoid as much as possible having two different
>> >> >>>>> PCI Root Complex emulators. Ideally we would have only one PCI
>> >> >>>>> Root Complex emulated by Xen. Having two PCI Root Complexes,
>> >> >>>>> both of them emulated by Xen, would be tolerable but not ideal.
>> >> >>>>
>> >> >>>> But what exactly is wrong with this setup?
>> >> >>>
>> >> >>> [...]
>> >> >>>
>> >> >>>>> The worst case I would like to avoid is to have two PCI Root
>> >> >>>>> Complexes, one emulated by Xen and one emulated by QEMU.
>> >> >>>>
>> >> >>>> This is how our setup works right now.
>> >> >>>
>> >> >>> If we have:
>> >> >>> - a single PCI Root Complex emulated in Xen
>> >> >>> - Xen is safety certified
>> >> >>> - individual Virtio devices emulated by QEMU with grants for memory
>> >> >>>
>> >> >>> We can go very far in terms of being able to use Virtio in safety
>> >> >>> use-cases. We might even be able to use Virtio (frontends) in a
>> >> >>> SafeOS.
>> >> >>>
>> >> >>> On the other hand, if we put an additional Root Complex in QEMU:
>> >> >>> - we pay a price in terms of complexity of the codebase
>> >> >>> - we pay a price in terms of resource utilization
>> >> >>> - we have one additional problem in terms of using this setup with
>> >> >>>   a SafeOS (one more device emulated by a non-safe component)
>> >> >>>
>> >> >>> Having two PCI Root Complexes both emulated in Xen is a middle
>> >> >>> ground solution because:
>> >> >>> - we still pay a price in terms of resource utilization
>> >> >>> - the code complexity goes up a bit, but hopefully not by much
>> >> >>> - there is no impact on safety compared to the ideal scenario
>> >> >>>
>> >> >>> This is why I wrote that it is tolerable.
>> >> >>
>> >> >> Ah, I see now. Yes, I agree with this. I also want to add some
>> >> >> more points:
>> >> >>
>> >> >> - There is ongoing work on implementing virtio backends as separate
>> >> >>   applications, written in Rust. Linaro is doing this part. Right
>> >> >>   now they are implementing only virtio-mmio, but if they want to
>> >> >>   provide virtio-pci as well, they will need a mechanism to plug in
>> >> >>   only virtio-pci, without a Root Complex. This is an argument for
>> >> >>   using a single Root Complex emulated in Xen.
>> >> >>
>> >> >> - As far as I know (actually, Oleksandr told this to me), QEMU has
>> >> >>   no mechanism for exposing virtio-pci backends without exposing a
>> >> >>   PCI Root Complex as well. Architecturally, there should be a PCI
>> >> >>   bus to which virtio-pci devices are connected, or we need to make
>> >> >>   some changes to QEMU internals to be able to create virtio-pci
>> >> >>   backends that are not connected to any bus. There is also the
>> >> >>   added benefit that the PCI Root Complex emulator in QEMU handles
>> >> >>   legacy PCI interrupts for us. This is an argument for a separate
>> >> >>   Root Complex for QEMU.
>> >> >> As right now the only virtio-pci backends we have are provided by
>> >> >> QEMU, and this setup is already working, I propose to stick to
>> >> >> this solution, especially taking into account that it does not
>> >> >> require any changes to hypervisor code.
>> >> >
>> >> > I am not against two hostbridges as a temporary solution, as long
>> >> > as this is not a one-way-door decision. I am not concerned about
>> >> > the hypervisor itself; I am more concerned about the interface
>> >> > exposed by the toolstack and QEMU.
>> >
>> > I agree with this...
>> >
>> >
>> >> > To clarify, I don't particularly want to have to maintain the two
>> >> > hostbridges solution once we can use a single hostbridge. So we
>> >> > need to be able to get rid of it without impacting the interface
>> >> > too much.
>> >
>> > ...and this
>>
>> This depends on virtio-pci backend availability. AFAIK, right now the
>> only option is to use QEMU, and QEMU provides its own host bridge. So
>> if we want to get rid of the second host bridge, we need either another
>> virtio-pci backend, or we need to alter the QEMU code so that it can
>> live without a host bridge.
>>
>> As for interfaces, it appears that the QEMU case does not require any
>> changes to the hypervisor itself; it just boils down to writing a
>> couple of xenstore entries and spawning QEMU with the correct command
>> line arguments.
>
> One thing that Stewart wrote in his reply is important: it doesn't
> matter if QEMU thinks it is emulating a PCI Root Complex, because that
> is required from QEMU's point of view to emulate an individual PCI
> device.
>
> If we can arrange it so that the QEMU PCI Root Complex is not
> registered with Xen as part of the ioreq interface, then QEMU's
> emulated PCI Root Complex is going to be left unused. I think that
> would be great, because we still have a clean QEMU-Xen-tools interface
> and the only downside is some extra unused emulation in QEMU. It would
> be a fantastic starting point.

>> I believe that in this case we need to set up manual ioreq handlers,
>> like what was done in the patch "xen/arm: Intercept vPCI config
>> accesses and forward them to emulator", because we need to route ECAM
>> accesses either to a virtio-pci backend or to a real PCI device. We
>> also need to tell QEMU not to install its own ioreq handlers for the
>> ECAM space.
>
> I was imagining that the interface would look like this: QEMU registers
> a PCI BDF, and Xen automatically starts forwarding to QEMU ECAM
> read/write requests for the PCI config space of that BDF only. It
> would not be the entire ECAM space, but only individual PCI config
> reads/writes for that BDF.

Okay, I see that there is the xendevicemodel_map_pcidev_to_ioreq_server()
function and the corresponding IOREQ_TYPE_PCI_CONFIG call. Is this what
you propose to use to register a PCI BDF?

I see that xen-hvm-common.c in QEMU is able to handle only the standard
256-byte configuration space, but I hope that will be an easy fix.

--
WBR, Volodymyr
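[Editor's note: for reference, a minimal sketch of what registering a
single PCI BDF with an ioreq server could look like from a device
model, using the libxendevicemodel calls named above. This is
illustrative only, not code from the series; the helper name and the
caller-supplied values are made up, and error handling is trimmed.]

    /*
     * Sketch: ask Xen to forward PCI config-space accesses for a single
     * BDF to this device model's ioreq server. Once the BDF is mapped,
     * Xen delivers IOREQ_TYPE_PCI_CONFIG requests for that device only,
     * rather than handing over the whole ECAM window.
     */
    #include <stdint.h>
    #include <xendevicemodel.h>

    static int register_pci_bdf(domid_t domid, uint16_t segment,
                                uint8_t bus, uint8_t dev, uint8_t fn)
    {
        xendevicemodel_handle *dmod = xendevicemodel_open(NULL, 0);
        ioservid_t id;
        int rc;

        if ( !dmod )
            return -1;

        /* Create an ioreq server; no buffered ioreq page in this sketch. */
        rc = xendevicemodel_create_ioreq_server(dmod, domid,
                                                HVM_IOREQSRV_BUFIOREQ_OFF,
                                                &id);
        if ( !rc )
            /* Route config accesses for segment:bus:dev.fn here. */
            rc = xendevicemodel_map_pcidev_to_ioreq_server(dmod, domid, id,
                                                           segment, bus,
                                                           dev, fn);

        xendevicemodel_close(dmod);
        return rc;
    }

A real emulator would additionally map the ioreq pages and enable the
server with xendevicemodel_set_ioreq_server_state() before serving
requests.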
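[Editor's note: on the 256-byte point, ECAM gives every function a
4 KiB configuration window, with the BDF encoded in the MMIO offset, so
a handler that assumes the legacy 256-byte space cannot reach the PCIe
extended configuration registers. A small sketch of the standard ECAM
offset decoding follows; it is illustrative, not QEMU's code.]

    /*
     * Sketch: standard ECAM offset decoding. Each function gets a 4 KiB
     * config window (bits 11:0), so handling only the first 256 bytes
     * cuts off the PCIe extended configuration space.
     */
    #include <stdint.h>

    static inline void ecam_decode(uint64_t off, uint8_t *bus,
                                   uint8_t *dev, uint8_t *fn,
                                   uint16_t *reg)
    {
        *bus = (off >> 20) & 0xff;   /* bits 27:20: bus number      */
        *dev = (off >> 15) & 0x1f;   /* bits 19:15: device number   */
        *fn  = (off >> 12) & 0x7;    /* bits 14:12: function number */
        *reg = off & 0xfff;          /* bits 11:0:  register offset */
    }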