Re: [PATCH v10 13/17] vpci: add initial support for virtual PCI bus topology
Hi Stefano,

On 16/11/2023 23:28, Stefano Stabellini wrote:
> On Thu, 16 Nov 2023, Julien Grall wrote:
>> IIUC, this means that Xen will allocate the BDF. I think this will
>> become a problem quite quickly as some of the PCI devices may need to
>> be assigned a specific vBDF (I have the Intel graphics card in mind).
>> Also, xl allows you to specify the slot (e.g. <bdf>@<vslot>), which
>> would not work with this approach.
>>
>> For dom0less passthrough, I feel the virtual BDF should always be
>> specified in device-tree. When a domain is created after boot, then I
>> think you want to support <bdf>@<vslot> where <vslot> is optional.
>
> Hi Julien,
>
> I also think there should be a way to specify the virtual BDF, but if
> possible (meaning: it is not super difficult to implement) I think it
> would be very convenient if we could let Xen pick whatever virtual BDF
> Xen wants when the user doesn't specify one. That's because it would
> make the configuration easier for the user to specify. Typically the
> user doesn't care about the virtual BDF, only about exposing a
> specific host device to the VM. There are exceptions of course, and
> that's why I think we should also have a way for the user to request a
> specific virtual BDF. One of these exceptions is integrated GPUs: the
> OS drivers used to have hardcoded BDFs, so it wouldn't work if the
> device showed up at a different virtual BDF than on the host.

If you let Xen allocate the vBDF, then wouldn't you need a way to tell
the toolstack/Device Models which vBDF was allocated?

> Thinking more about this, one way to simplify the problem would be if
> we always reuse the physical BDF as virtual BDF for passthrough
> devices. I think that would solve the problem and make it much more
> unlikely to run into driver bugs.

This works so long as you have only one physical segment (i.e.
hostbridge). If you have multiple ones, then you either have to expose
multiple hostbridges to the guest (which is not great) or need someone
to allocate the vBDF.

> And we allocate a "special" virtual BDF space for emulated devices,
> with the Root Complex still emulated in Xen. For instance, we could
> reserve ff:xx:xx.

Hmmm... Wouldn't this mean reserving ECAM space for 256 buses?
Obviously, we could use 5 (just as a random number) instead. Yet, it
still requires reserving more memory than necessary.
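To put numbers on it (back-of-the-envelope arithmetic in plain C, not
anything from the Xen tree): ECAM gives every function 4 KiB of config
space, 8 functions per device and 32 devices per bus, i.e. 1 MiB per
bus, so covering bus 0xff means mapping 256 MiB of guest space:

/* Back-of-the-envelope ECAM sizing: 4 KiB of config space per
 * function, 8 functions per device, 32 devices per bus = 1 MiB/bus. */
#include <stdio.h>

#define ECAM_PER_FN   4096UL                       /* 4 KiB */
#define ECAM_PER_BUS  (32UL * 8UL * ECAM_PER_FN)   /* 1 MiB */

int main(void)
{
    /* Reserving ff:xx:xx forces ECAM coverage of buses 0..255. */
    printf("buses 0..255: %lu MiB\n", 256UL * ECAM_PER_BUS >> 20);
    /* Capping the reservation at bus 5 (buses 0..5) is far cheaper. */
    printf("buses 0..5:   %lu MiB\n", 6UL * ECAM_PER_BUS >> 20);
    return 0;
}

That prints 256 MiB vs 6 MiB, which is the cost I am worried about.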
> and in case of clashes we could refuse to continue.

Urgh. And what would be the solution for users triggering this clash?

> Or we could allocate the first free virtual BDF, after all the
> passthrough devices.

This only works if you don't want to support PCI hotplug. It may not be
a thing for embedded, but it is used by cloud. So you need a mechanism
that works with hotplug as well.

> Example:
> - the user wants to assign physical 00:11.5 and b3:00.1 to the guest
> - Xen creates virtual BDFs 00:11.5 and b3:00.1 for the passthrough
>   devices
> - Xen allocates the next virtual BDF for emulated devices: b4:xx.x
> - If more virtual BDFs are needed for emulated devices, Xen allocates
>   b5:xx.x
>
> I still think, no matter the BDF allocation scheme, that we should try
> to avoid as much as possible having two different PCI Root Complex
> emulators. Ideally we would have only one PCI Root Complex emulated by
> Xen. Having 2 PCI Root Complexes, both of them emulated by Xen, would
> be tolerable but not ideal. The worst case I would like to avoid is to
> have two PCI Root Complexes, one emulated by Xen and one emulated by
> QEMU.

So while I agree that one emulated hostbridge is the best solution, I
don't think your proposal would work. As I wrote above, you may have a
system with multiple physical hostbridges. It would not be possible to
assign two PCI devices with the same BDF but from different segments. I
agree this is unlikely, but if we can avoid it then it would be best.

There is one scheme which fits that:
 1. If the vBDF is not specified, then pick a free one.
 2. Otherwise, check whether the specified vBDF is free. If not, return
    an error.

This scheme should be used for both virtual and physical devices. It is
pretty much the algorithm used by QEMU today. It works, so what would
be the benefit of doing something different?
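To make (1)/(2) concrete, something along these lines (completely
untested, and the names vpci_domain/vpci_alloc_vbdf() are made up for
illustration, not what is in the tree):

/* Illustrative only: a per-domain bitmap with one bit per possible
 * vBDF (256 buses x 32 devices x 8 functions = 65536 slots). */
#include <stdbool.h>
#include <stdint.h>

#define NR_VBDF (256 * 32 * 8)

struct vpci_domain {
    uint8_t used[NR_VBDF / 8];          /* one bit per vBDF */
};

static bool vbdf_is_used(const struct vpci_domain *d, uint16_t bdf)
{
    return d->used[bdf / 8] & (1u << (bdf % 8));
}

static void vbdf_set_used(struct vpci_domain *d, uint16_t bdf)
{
    d->used[bdf / 8] |= 1u << (bdf % 8);
}

/*
 * Allocate a vBDF. 'wanted' >= 0 is a vBDF requested by the user:
 * fail if it is already taken (scheme (2)). Otherwise pick the first
 * free one (scheme (1)). Returns the allocated vBDF, or -1 on error.
 */
static int vpci_alloc_vbdf(struct vpci_domain *d, int wanted)
{
    if (wanted >= 0) {
        if (vbdf_is_used(d, (uint16_t)wanted))
            return -1;                  /* requested vBDF is taken */
        vbdf_set_used(d, (uint16_t)wanted);
        return wanted;
    }

    for (unsigned int bdf = 0; bdf < NR_VBDF; bdf++) {
        if (!vbdf_is_used(d, (uint16_t)bdf)) {
            vbdf_set_used(d, (uint16_t)bdf);
            return (int)bdf;            /* first free vBDF */
        }
    }

    return -1;                          /* no free vBDF left */
}

The same helper would serve both cold-plug and hotplug: the caller
passes the user-requested vBDF when there is one, and -1 otherwise.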
Cheers,

--
Julien Grall