Xen project Mailing List

Re: [PATCH v1 3/6] xen/arm: create dom0less virtio-pci DT node

On Wed, 25 Sep 2024, Julien Grall wrote:
> Hi Edgar,
>
> On 25/09/2024 17:49, Edgar E. Iglesias wrote:
> > On Wed, Sep 25, 2024 at 10:44 AM Edgar E. Iglesias <edgar.iglesias@xxxxxxx>
> > wrote:
> >
> > > On Wed, Sep 25, 2024 at 05:38:13PM +0100, Julien Grall wrote:
> > > > Hi Edgar,
> > > >
> > > > On 25/09/2024 17:34, Edgar E. Iglesias wrote:
> > > > > On Wed, Sep 25, 2024 at 08:44:41AM +0100, Julien Grall wrote:
> > > > > > Hi,
> > > > > >
> > > > > > On 24/09/2024 17:23, Edgar E. Iglesias wrote:
> > > > > > > From: Stewart Hildebrand <stewart.hildebrand@xxxxxxx>
> > > > > > >
> > > > > > > When virtio-pci is specified in the dom0less domU properties, create a
> > > > > > > virtio-pci node in the guest's device tree. Set up an mmio handler with
> > > > > > > a register for the guest to poll when the backend has connected and
> > > > > > > virtio-pci bus is ready to be probed. Grant tables may be used by
> > > > > > > specifying virtio-pci = "grants";.
> > > > > > >
> > > > > > > [Edgar: Use GPEX PCI INTX interrupt swizzling (from PCI specs).
> > > > > > > Make grants iommu-map cover the entire PCI bus.
> > > > > > > Add virtio-pci-ranges to specify memory-map for direct-mapped guests.
> > > > > > > Document virtio-pci dom0less fdt bindings.]
> > > > > > > Signed-off-by: Stewart Hildebrand <stewart.hildebrand@xxxxxxx>
> > > > > > > Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xxxxxxx>
> > > > > > > ---
> > > > > > > docs/misc/arm/device-tree/booting.txt | 21 +++
> > > > > > > xen/arch/arm/dom0less-build.c | 238 ++++++++++++++++++++++++++
> > > > > > > xen/arch/arm/include/asm/kernel.h | 15 ++
> > > > > > > 3 files changed, 274 insertions(+)
> > > > > > >
> > > > > > > diff --git a/docs/misc/arm/device-tree/booting.txt b/docs/misc/arm/device-tree/booting.txt
> > > > > > > index 3a04f5c57f..82f3bd7026 100644
> > > > > > > --- a/docs/misc/arm/device-tree/booting.txt
> > > > > > > +++ b/docs/misc/arm/device-tree/booting.txt
> > > > > > > @@ -276,6 +276,27 @@ with the following properties:
> > > > > > > passed through. This option is the default if this property is missing
> > > > > > > and the user does not provide the device partial device tree for the domain.
> > > > > > > +- virtio-pci
> > > > > >
> > > > > > Similar question to the other patches, why is this specific to virtio PCI?
> > > > > > QEMU (or another device module) is free to emulate whatever it wants behind
> > > > > > the PCI hostbridge.
> > > > >
> > > > > There's no hard limitation to only virtio-pci devices it's more of a
> > > > > recommendation that PVH guests should not use "emulated" devices but
> > > > > there's nothing stopping it.
> > > >
> > > > Could you provide a bit more details where this requirement is coming from?
> > > > For instance, I would expect we would need to do some emulation to boot
> > > > Windows on Arm.
> > > >
> > >
> > > I see. I guess it just came from my mental model, I thought part of the
> > > philosophy behind PVH was to avoid emulated devices and use
> > > paravirtualized (virtio or something else) or passthrough wherever
> > > possible (except for the basic set of devices needed like vGIC, vuart,
> > > MMU).
> > >
> >
> > For example, we would recommend users to use virtio-net in favor of an
> > emulated eepro1000 or whatever other NIC models available in QEMU.
>
> Indeed. I would always recommend user to use virtio-net over eepro1000.
>
> > But there is no hard requirement nor limitation, a user can connect any
> > available PCI device from the QEMU set.
>
> We need to be clear about what we are exposing to the guest. With this patch
> we will describe a PCI hostbridge in Device Tree (well it is an empty region
> we hope the Device Model to emulate at some point). But the hypervisor will
> not create the device model. Instead, you expect the user/integrator to have
> extra script to launch a Device Model (So it may not even be a hostbridge).
>
> >
> > Another thing we're looking to do is to minimize the QEMU build (Kconfig +
> > configure flags) to create a small build with only the stuff needed for
> > virtio-pci.
>
> It is nice to have a cut down version of QEMU :). However, Xen doesn't care
> about the device model used for the emulation. I have seen some specialized DM
> in the wild (and used them while I was working on disaggregating the DM).
>
> Anyway, while I understand this approach works in tailored environment, I am
> not convinced this works for a more general approach. The two options I would
> rather consider are:
> 1. Allow the device model to receive access for a single PCI device (IOW
> hook into vPCI).
> 2. Find a way to let the user provide the binding (maybe in a partial
> device-tree) + the list of Interrupts/MMIO that would be emulated by QEMU.
>
> The second approach might be another way to get a second hostbridge in your
> use case while giving a bit more flexibility in what can be done (thinking
> about disaggregated environment).

Thank you for the suggestion on the second option, I think that is close
to what we intended. Let me add a few more details.

There has been a significant trend toward using virtio for all virtual
interfaces in automotive and other industries for several years now.
While I'm not entirely sure about Windows, all the operating systems we
work with (e.g. Android, RTOSes) are optimizing for virtio interfaces.
The expectation is that guests will either access physical devices or
virtio devices.

I mention this in response to the specialized vs. general approach -
virtio is becoming (or has already become) the standard, at least in
automotive and embedded sectors. This is why we have introduced the new
specialized QEMU machine for virtio only on both ARM and x86.

However, you are right that the solution is somewhat dependent on the
QEMU emulation provided, meaning it isn't fully generalized and may not
work with other device models. Let's see if we can improve this.

I agree that a single PCI root complex is the cleanest solution from a
Xen perspective. However, aside from the level of effort required, it's
also important to consider QEMU integration. The separate root complex
integrates very well into QEMU's own view of the world, and that is
important too because the more we deviate, the more we are at risk of
triggering unwanted bugs in QEMU: bugs that would only show up in a Xen
configuration and that we would be responsible for fixing.

The two PCI RCs approach is simple because it is low complexity from a
QEMU point of view. The trade-off is having the two PCI RCs exposed to
the VM instead of one, but in our tests two PCI RCs work well on both
ARM and even x86. So I think the two PCI RCs approach is viable. (Also I
believe that technically it is a single PCI RC with two host bridges.)

For the second option, I'll let Edgar investigate but I think that would
work, thank you for being flexible. We would still need patches 4-6 from
this series.

Let's assume we'll proceed with patches 4-6 and, as agreed, skip patches 1-2.
Then my first thought would be to rely on ImageBuilder to generate the
complete virtio DT node. While I usually like using ImageBuilder, in
this case, I lean toward having Xen generate the domU nodes. There are a
few reasons for this: the partial DTB is typically used for passthrough
and related information, which isn't the case here. Although
ImageBuilder can merge multiple partial DTBs, I think it's best not to
depend on that more delicate feature, for scenarios where a user wants
both a passthrough device and a virtio device.

But we could change the DT properties to be more explicitly related to
an emulated PCI root complex, which could be provided by any device
model and not only QEMU. Also we can avoid saying "virtio" in the
property name because although our use-case is virtio, as you wrote
there is nothing that ties this to virtio today.

So what about the following dom0less device tree properties instead?

    secondary-emulated-pci-host-bridge = <ecam_address ecam_size memory_address memory_size prefetch_address prefetch_size irq_start, irq_how_many, flags>;

One of the special flags could be grants enabled/disabled.

This way:

- The list of interrupts and MMIOs is explicit

- The fact that we are talking about a secondary emulated host bridge is
  explicit, in the description we can say we expect it to be provided by
  an ioreq device model. We can call it secondary-ioreq-pci-host-bridge

- The Xen-generated domU device tree description is still generic and
  reusable. Let's say that someone comes up with a different use-case
  and a different device model but still wants a PCI host bridge, they
  can reuse this. The DomU DT is standard for a generic PCI host bridge.

- We don't need ImageBuilder to generate/edit complex device tree nodes
  and merge partial device trees
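
To make the proposed property a bit more concrete, here is a rough
sketch of how its cells could be decoded. This is illustration only and
not code from the series. It assumes (none of this is decided above)
that every address and size takes two 32-bit cells, that irq_start,
irq_how_many and flags take one cell each, and that bit 0 of flags means
"grants enabled". The struct, the helper names and the
HB_FLAG_GRANTS constant are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: 6 address/size values of 2 cells + 3 single cells. */
#define HB_PROP_CELLS  15u
#define HB_FLAG_GRANTS (1u << 0) /* hypothetical: bit 0 enables grants */

/* Hypothetical in-memory form of the proposed property. */
struct emu_pci_host_bridge {
    uint64_t ecam_addr, ecam_size;
    uint64_t mem_addr, mem_size;
    uint64_t prefetch_addr, prefetch_size;
    uint32_t irq_start, irq_count;
    uint32_t flags;
};

/* Read one big-endian 32-bit cell, as stored in a flattened device tree. */
static uint32_t read_cell(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Consume two cells (high cell first) and advance the cursor. */
static uint64_t next_u64(const uint8_t **p)
{
    uint64_t hi = read_cell(*p);
    uint64_t lo = read_cell(*p + 4);

    *p += 8;
    return (hi << 32) | lo;
}

/* Consume one cell and advance the cursor. */
static uint32_t next_u32(const uint8_t **p)
{
    uint32_t v = read_cell(*p);

    *p += 4;
    return v;
}

/*
 * Decode the raw property bytes into *out.
 * Returns 0 on success, -1 if the length does not match the assumed
 * layout or if the ECAM size or interrupt count is zero.
 */
static int parse_host_bridge_prop(const uint8_t *data, size_t len,
                                  struct emu_pci_host_bridge *out)
{
    assert(data != NULL && out != NULL);

    if (len != HB_PROP_CELLS * 4)
        return -1;

    out->ecam_addr     = next_u64(&data);
    out->ecam_size     = next_u64(&data);
    out->mem_addr      = next_u64(&data);
    out->mem_size      = next_u64(&data);
    out->prefetch_addr = next_u64(&data);
    out->prefetch_size = next_u64(&data);
    out->irq_start     = next_u32(&data);
    out->irq_count     = next_u32(&data);
    out->flags         = next_u32(&data);

    if (out->ecam_size == 0 || out->irq_count == 0)
        return -1;

    return 0;
}
```

With a fixed cell layout like this, Xen would have everything it needs
(ECAM window, memory windows, interrupt range) to generate a generic PCI
host bridge node without knowing which device model sits behind it. The
real cell widths would presumably follow whatever the binding ends up
specifying.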
