Re: [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs



On Tue, Dec 05, 2023 at 02:01:46PM -0500, Stewart Hildebrand wrote:
> On 12/5/23 06:08, Roger Pau Monné wrote:
> > On Mon, Dec 04, 2023 at 02:07:51PM -0800, Stefano Stabellini wrote:
> >> On Mon, 4 Dec 2023, Roger Pau Monné wrote:
> >>> On Fri, Dec 01, 2023 at 06:56:32PM -0800, Stefano Stabellini wrote:
> >>>> On Fri, 1 Dec 2023, Roger Pau Monné wrote:
> >>>>> On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote:
> >>>>>> @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl(
> >>>>>>          bus = PCI_BUS(machine_sbdf);
> >>>>>>          devfn = PCI_DEVFN(machine_sbdf);
> >>>>>>  
> >>>>>> +        if ( needs_vpci(d) && !has_vpci(d) )
> >>>>>> +        {
> >>>>>> +            printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n",
> >>>>>> +                   &PCI_SBDF(seg, bus, devfn), d);
> >>>>>> +            ret = -EPERM;
> >>>>>> +            break;
> >>>>>
> >>>>> I think this is likely too restrictive going forward.  The current
> >>>>> approach is indeed to enable vPCI on a per-domain basis because that's
> >>>>> how PVH dom0 uses it, due to being unable to use ioreq servers.
> >>>>>
> >>>>> If we start to expose vPCI support to guests, the interface should be on
> >>>>> a per-device basis, so that vPCI could be enabled for some devices,
> >>>>> while others could still be handled by ioreq servers.
> >>>>>
> >>>>> We might want to add a new flag to xen_domctl_assign_device (used by
> >>>>> XEN_DOMCTL_assign_device) in order to signal whether the device will
> >>>>> use vPCI.
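
To make that more concrete, a minimal sketch of what I have in mind
(the flag name and bit below are invented for the sake of discussion;
only the existing flags field of struct xen_domctl_assign_device is
assumed):

/* Hypothetical per-device flag, name and value made up. */
#define XEN_DOMCTL_DEV_USE_VPCI  2

/* In the XEN_DOMCTL_assign_device handling: */
if ( domctl->u.assign_device.flags & XEN_DOMCTL_DEV_USE_VPCI )
{
    /*
     * Record that this pdev is to be handled by the in-Xen vPCI
     * handlers rather than by an ioreq server.  How that per-device
     * state is stored is left open here.
     */
}
else
{
    /* Device left for an ioreq server to handle. */
}

That way the toolstack requests the backend on a per-device basis, and
any domain-wide state can be derived from those requests.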
> >>>>
> >>>> Actually I don't think this is a good idea. I am all for flexibility but
> >>>> supporting multiple different configurations comes at an extra cost for
> >>>> both maintainers and contributors. I think we should try to reduce the
> >>>> amount of configurations we support rather than increasing them
> >>>> (especially on x86 where we have PV, PVH, HVM).
> >>>
> >>> I think it's perfectly fine to initially require a domain to have all
> >>> its devices either passed through using vPCI or ioreqs, but the
> >>> interface IMO should allow for such differentiation in the future.
> >>> That's why I think introducing a domain wide vPCI flag might not be
> >>> the best option going forward.
> >>>
> >>> It would be perfectly fine for XEN_DOMCTL_assign_device to set a
> >>> domain wide vPCI flag, iow:
> >>>
> >>> if ( HYPERCALL_VPCI_FLAG_SET && !has_vpci(d) )
> >>> {
> >>>     if ( has_arch_pdevs(d) )
> >>>     {
> >>>         printk("All passthrough devices must use the same backend\n");
> >>>         return -EINVAL;
> >>>     }
> >>>
> >>>     /* Set vPCI domain flag */
> >>> }
> >>
> >> That would be fine by me, but maybe we can avoid this change too. I was
> >> imagining that vPCI would be enabled at domain creation, not at runtime.
> >> And that vPCI would be enabled by default for all PVH guests (once we
> >> are past the initial experimental phase.)
> > 
> > Then we don't even need a new CDF flag, and just enable vPCI when
> > IOMMU is enabled?  IOW: we can key the enabling of vPCI to
> > XEN_DOMCTL_CDF_iommu for specific domain types?
> > 
> > Maybe that's not so trivial on x86, as there's no x86 PVH domain type
> > from the hypervisor PoV.
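
FWIW, the keying I have in mind would be just a couple of lines at
domain creation time, along the lines of (rough sketch only; exact
placement and the x86 PVH case are open questions):

/* Sketch: derive vPCI from the IOMMU flag instead of exposing a
 * separate knob to the toolstack. */
if ( config->flags & XEN_DOMCTL_CDF_iommu )
    config->flags |= XEN_DOMCTL_CDF_vpci;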
> > 
> >>
> >>> We have already agreed that we want to aim for a setup where ioreqs
> >>> and vPCI could be used for the same domain, but I guess you assumed
> >>> that ioreqs would be used for emulated devices exclusively and vPCI
> >>> for passthrough devices?
> >>
> >> Yes, that's right
> >>
> >>
> >>> Overall if we agree that ioreqs and vPCI should co-exist for a domain,
> >>> I'm not sure there's much reason to limit ioreqs to only handle
> >>> emulated devices, seems like an arbitrary limitation.
> >>
> >> Reply below
> >>
> >>
> >>>> I don't think we should enable IOREQ servers to handle PCI passthrough
> >>>> for PVH guests and/or guests with vPCI. If the domain has vPCI, PCI
> >>>> passthrough can be handled by vPCI just fine. I think this would be a
> >>>> good anti-feature (i.e. a deliberate decision not to add this
> >>>> capability) in order to reduce complexity. Unless you see a specific
> >>>> use case that requires it?
> >>>
> >>> There are passthrough devices (GPUs) that might require some extra
> >>> mediation on dom0 (like the Intel GVT-g thing) that would force the
> >>> usage of ioreqs for passthrough.
> >>
> >> From an architectural perspective, I think it would be cleaner, simpler
> >> to maintain, and simpler to understand if Xen was the sole owner of the
> >> PCI Root Complex and PCI config space mediation (implemented with vPCI).
> >> IOREQ can be used for emulation and it works very well for that. At
> >> least in my mind, that makes things much simpler.
> > 
> > But IOREQ already has all the code to mediate accesses to the PCI
> > config space, and the interface to register separate servers for
> > different PCI devices.
> > 
> > We would then need to duplicate this internally for vPCI, so that vPCI
> > could forward accesses to IOREQ just for IOREQ to forward to yet a
> > different component?  Seems like a lot of duplication for no benefit.
> > 
> >> I understand there are non-trivial cases, like virtual GPUs with
> >> hardware access, but I don't classify those as passthrough. That's
> >> because there isn't one device that gets fully assigned to the guest.
> >> Instead, there is an emulated device (hence IOREQ) with certain MMIO
> >> regions and interrupts that end up being directly mapped from real
> >> hardware.
> >>
> >> So I think it is natural in those cases to use IOREQ and it is also
> >> natural to have QEMU remap MMIO/IRQs at runtime. From a vPCI
> >> perspective, I hope it will mostly look as if the device is assigned to
> >> Dom0. Even if it ends up being more complex than that, Rome wasn't
> >> built in one day, and I don't think we should try to solve this problem
> >> on day1 (as long as the interfaces are not stable interfaces).
> > 
> > I don't see IOREQ as dealing explicitly with emulation.  Yes, it does
> > allow for emulators to be implemented in user-space, but at the end
> > it's just an interface that allows forwarding accesses to certain
> > resources (for the case we are speaking about here, PCI config space)
> > to entities that registered as handlers.
> > 
> > vPCI OTOH just deals with a very specific resource (PCI config space)
> > and only allows internal handlers to be registered on a byte
> > granularity.
> > 
> > So your proposal would be to implement a hierarchy like the one on the
> > diagram below:
> > 
> >     ┌────────┐ ┌──────────┐ ┌──────────────────┐
> >     │ Memory │ │ IO Ports │ │ PCI config space │
> >     └───────┬┘ └┬─────────┘ └───┬──────────────┘
> >             │   │               │
> >             │   │           ┌───┴──┐
> >             │   │           │ vPCI │
> >             │   │           └─┬──┬─┘
> >          ┌──┴───┴┐            │  │
> >          │ IOREQ ├────────────┘  │
> >          └────┬──┘               │
> >               │                  │
> >  ┌────────────┴──┐              ┌┴──────────────┐
> >  │ IOREQ servers │              │ vPCI handlers │
> >  └───────────────┘              └───────────────┘
> > 
> > While what I'm proposing would look like:
> > 
> >     ┌────────┐ ┌──────────┐ ┌──────────────────┐
> >     │ Memory │ │ IO Ports │ │ PCI config space │
> >     └────┬───┘ └────┬─────┘ └────────┬─────────┘
> >          │          │                │
> >          └─────┬────┴────┬───────────┘
> >                │  IOREQ  │
> >                └─┬─────┬─┘
> >                  │     │
> >  ┌───────────────┤     └────┬──────┐
> >  │ IOREQ servers │          │ vPCI │
> >  └───────────────┘          └───┬──┘
> >                                 │
> >                             ┌───┴───────────┐
> >                             │ vPCI handlers │
> >                             └───────────────┘
> > 
> > I'm obviously biased, but I think the latter is cleaner, and allows
> > all resources to be arbitrated by the same component (IOREQ).
> > 
> > If the concern is about the IOREQ hypercall interface, it would be
> > fine to introduce an option that limits IOREQs to internal users
> > (vPCI) without supporting external IOREQ servers.
> > 
> > Think of IOREQ as a resource mediator inside of Xen, that just does
> > the PCI address decoding and forwards the access to the interested
> > party, either an external IOREQ server or vPCI.
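
In pseudo-code, the dispatch I'm describing would look roughly like
the below (the wrapper and the helpers other than vpci_read() and
vpci_write() are invented for illustration, this is not existing
code):

/*
 * Pseudo-code: IOREQ decodes the PCI config space access and hands it
 * to whoever registered for that SBDF, either the internal vPCI
 * handlers or an external ioreq server.
 */
static int ioreq_handle_pci_cfg(struct domain *d, pci_sbdf_t sbdf,
                                unsigned int reg, unsigned int size,
                                uint32_t *data, bool write)
{
    if ( vpci_handles(d, sbdf) )    /* hypothetical per-device check */
    {
        if ( write )
            vpci_write(sbdf, reg, size, *data);
        else
            *data = vpci_read(sbdf, reg, size);
        return 0;
    }

    /* Otherwise forward to the ioreq server registered for this
     * device (hypothetical helper). */
    return ioreq_send_pci_cfg(d, sbdf, reg, size, data, write);
}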
> > 
> >>
> >>> It's important that the interfaces we introduce are correct IMO,
> >>> because that ends up reflecting on the configuration options that we
> >>> expose to users on xl/libxl.  While both XEN_DOMCTL_createdomain and
> >>> XEN_DOMCTL_assign_device are unstable interfaces, how the vPCI option
> >>> gets placed there will ultimately influence how the option gets
> >>> exposed in xl/libxl, and the interface there is relevant to keep
> >>> stable for end user sanity.
> >>
> >> I agree with you on the stable interfaces. The important part is not to
> >> introduce changes to stable interfaces that could limit us in the
> >> future. Specifically that includes xl and libxl, we need to be careful
> >> there. But I don't see a single per-domain vPCI enable/disable option as
> >> a problem. Let's say that in the future we have a mediated vGPU
> >> implementation: if it works together with vPCI then the per-domain vPCI
> >> option in libxl will be enabled (either explicitly or by default), if
> >> it doesn't then vPCI will be disabled (either explicitly or by the
> >> newer vGPU options.)
> > 
> > If vPCI is hooked into IOREQ there won't be a need anymore to register
> > the vPCI config space traps, as that would be done by IOREQ, and hence
> > vPCI managed devices could be registered at runtime with IOREQ.  IOW:
> > there won't be a need anymore to signal at domain creation whether
> > vPCI is intended to be used or not.
> > 
> > We would obviously need to enable IOREQ for all domains with IOMMU
> > enabled, as it would be IOREQ that registers the PCI config space
> > handlers.
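
IOW at assign time it would be something like (pseudo-code; the
registration helper is invented, the point is just that the decision
happens at runtime rather than via a creation flag):

/* Pseudo-code: a device becomes vPCI-managed when it's assigned. */
rc = vpci_add_handlers(pdev);   /* existing per-device vPCI setup */
if ( !rc )
    /* Tell the IOREQ config space decoder that accesses to this SBDF
     * are handled internally by vPCI (invented interface). */
    rc = ioreq_register_internal_pci(d, pdev->sbdf);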
> > 
> >> For *unstable* interfaces (XEN_DOMCTL_assign_device) I would rather wait
> >> before adding more changes on top of them, not because I don't care
> >> about the mediated GPU problem (we do have something similar at AMD),
> >> but because I worry that if we try to change them now we might not do a
> >> good enough job. I would prefer to wait until we know more about the
> >> actual use case, ideally with code supporting it.
> >>
> >> I think the difference in points of views comes from the fact that I see
> >> vPCI as the default, QEMU only as a limited peripheral emulator (or
> >> mediator for the vGPU case) but not in control. vPCI and QEMU are not
> >> equal in my view. vPCI is in charge and always present except in very
> >> uncommon setups (even if we decide to hook it inside Xen by using
> >> internal IOREQ interfaces). QEMU might come and go.
> > 
> > Xen needs a single component that mediates accesses to resources,
> > whether that's IOREQ, or something else I don't really care that much.
> 
> I just wanted to share what the "something else" diagram might look like.
> 
>     ┌────────┐ ┌──────────┐ ┌──────────────────┐
>     │ Memory │ │ IO Ports │ │ PCI config space │
>     └────┬───┘ └────┬─────┘ └──────────┬───────┘
>          │          │                  │
>          └──┬───────┴───────────────┬──┘
>             │ PCI Resource Mediator │
>             └────┬─────┬────────────┘
>                  │     │
>          ┌───────┤     └────┬──────┐
>          │ IOREQ │          │ vPCI │
>          └────┬──┘          └───┬──┘
>               │                 │
>  ┌────────────┴──┐          ┌───┴───────────┐
>  │ IOREQ servers │          │ vPCI handlers │
>  └───────────────┘          └───────────────┘

It's IMO weird that the PCI resource mediator also controls Memory
and IO ports, since those are not PCI-specific resources.

Isn't your proposed "PCI Resource Mediator" the same as what IOREQ
currently does?

I'm kind of confused as to what benefit there is in introducing another
layer that multiplexes guest resource accesses.

> > Having vPCI mediate accesses to the PCI config space, and IOREQ to the
> > memory (and on x86 IO port) space just seems awfully complicated for
> > AFAICT no real benefit.
> > 
> > Also, you seem to conflate IOREQ with QEMU. While the latter is
> > indeed a user of IOREQ, I do see IOREQ as a simple resource mediator
> > inside of Xen that has the ability to forward such accesses to
> > external emulators using a hypercall interface.
> > 
> >> Now that I am writing this, I realize this is also why I wasn't too
> >> happy with the idea of hooking vPCI using IOREQ. It makes them look as
> >> if they are the same, while I don't think they should be considered at the
> >> same level of priority, criticality, safety, integration in the system,
> >> etc.
> > 
> > I feel there are some fears with IOREQ from a safety PoV?  The code
> > that does the resource multiplexing is small, and as said above if
> > there are safety concerns with the hypercall interface it would be
> > fine to limit its usage to internal handlers only.
> 
> Would it make any sense at all to split the resource multiplexing bits from 
> IOREQ into a new separate PCI resource mediator?

That might be fine, but seems like a lot of work and more complexity
in Xen for AFAICT no real benefit.

I think I would need to better understand the worries with using
IOREQ, but wouldn't it be easier to just limit the current IOREQ
code/interface to suit your needs?  Again, without knowing exactly what
the issues with using IOREQ are, it's hard to propose solutions.
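
For example (again just a sketch, the option name is made up): keep
the in-hypervisor dispatch, but refuse to create external servers
unless a build-time option is selected:

/* Sketch: in the ioreq server creation path, reject external
 * (userspace) servers while keeping the internal dispatch available
 * for vPCI.  The Kconfig option name is invented. */
if ( !IS_ENABLED(CONFIG_IOREQ_EXTERNAL_SERVERS) )
    return -EOPNOTSUPP;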

Thanks, Roger.