[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: AMD GPU problems under Xen



On Tue, Nov 29, 2022 at 10:15 AM Marek Marczykowski-Górecki
<marmarek@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, Nov 29, 2022 at 09:32:54AM -0500, Alex Deucher wrote:
> > On Mon, Nov 28, 2022 at 8:59 PM Demi Marie Obenour
> > <demi@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Mon, Nov 28, 2022 at 11:18:00AM -0500, Alex Deucher wrote:
> > > > On Mon, Nov 28, 2022 at 2:18 AM Demi Marie Obenour
> > > > <demi@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > Dear Christian:
> > > > >
> > > > > What is the status of the AMDGPU work for Xen dom0?  That was 
> > > > > mentioned in
> > > > > https://lore.kernel.org/dri-devel/b2dec9b3-03a7-e7ac-306e-1da024af8982@xxxxxxx/
> > > > > and there have been bug reports to Qubes OS about problems with AMDGPU
> > > > > under Xen (such as 
> > > > > https://github.com/QubesOS/qubes-issues/issues/7648).
> > > >
> > > > I would say it's a work in progress.  It depends what GPU  you have
> > > > and what type of xen setup you are using (PV vs PVH, etc.).
> > >
> > > The current situation is:
> > >
> > > - dom0 is PV.
> > > - VMs with assigned PCI devices are HVM and use a Linux-based stubdomain
> > >   QEMU does not run in dom0.
> > > - Everything else is PVH.
> > >
> > > In the future, I believe the goal is to move away from PV and HVM in
> > > favor of PVH, though HVM support will remain for compatibility with
> > > guests (such as Windows) that need emulated devices.
> > >
> > > > In general, your best bet currently is dGPU add in boards because they
> > > > are largely self contained.
> > >
> > > The main problem is that for the trusted GUI to work, there needs to
> > > be at least one GPU attached to a trusted VM, such as the host or a
> > > dedicated GUI VM.  That VM will typically not be running graphics-
> > > intensive workloads, so the compute power of a dGPU is largely wasted.
> > > SR-IOV support would help with that, but the only GPU vendor with open
> > > source SR-IOV support is Intel and it is still not upstream.  I am also
> > > not certain if the support extends to Arc dGPUs.
> >
> > Can you elaborate on this?  Why wouldn't you just want to pass-through
> > a dGPU to a domU to use directly in the guest?
>
> You can do that, but if that's your only GPU in the system, you'll lose
> graphical interface for other guests.
> But yes, simply pass-through of a dGPU is enough in some setups.
>
> > Are you sure?  I didn't think intel's GVT solution was actually
> > SR-IOV.  I think GVT is just a paravirtualized solution.
>
> Yes, it's a paravirtualized solution, with device emulation done in dom0
> kernel. This, besides being rather unusual approach in Xen world
> (emulators, aka IOREQ servers usually live in userspace) puts rather
> complex piece of code that interacts with untrusted data (instructions
> from guests) in almost the most privileged system component, without
> ability to sandbox it in any way. We consider it too risky for Qubes OS,
> especially since the kernel patches were never accepted upstream and the
> Xen support is not maintained anymore.
>
> The SR-IOV approach Demi is talking about is newer development,
> supported since Adler Lake (technically, IGD in Tiger Lake presents
> SR-IOV capability too, but officially it's supported since ADL). The driver
> for managing it is in the process of upstreaming. Some links here:
> https://github.com/intel/linux-intel-lts/issues/33
> (I have not tried it, yet)
>
> >  That aside,
> > we are working on enabling virtio gpu with our GPUs on xen in addition
> > to domU passthrough.
>
> That's interesting development. Please note, Linux recently (part of
> 6.1) gained support to use grant tables with virtio. This allows having
> backends without full access to guest's memory. The work is done in
> generic way, so a driver using proper APIs (including DMA) should work
> out in such setup out of the box. Please try to not break it :)
>
> > >
> > > > APUs and platforms with integrated dGPUs
> > > > are a bit more complicated as they tend to have more platform
> > > > dependencies like ACPI tables and methods in order for the driver to
> > > > be able to initialize the hardware properly.
> > >
> > > Is Xen dom0/domU support for such GPUs being worked on?  Is there an
> > > estimate as to when the needed support will be available upstream?  This
> > > is mostly directed at Christian and other people who work for hardware
> > > vendors.
> >
> > Yes, there are some minor fixes in the driver required which we'll be
> > sending out soon and we had to add some ACPI tables to the whitelist
> > in xen, but unfortunately the ACPI tables are AMD platform specific so
> > there has been pushback from the xen maintainers on accepting them
> > because they are not an official part of the ACPI spec.
>
> Can the driver work without them? Such dependency, as you noted above,
> make things rather complicated for pass-through (specific ACPI tables
> can probably be made available to the guest, but usually guest wouldn't
> see all the resources they talk about anyway).

Not really for APUs and dGPUs that are integrated into a platform.
For some of them the ACPI tables are just data tables which contain a
copy of the vbios image which the driver needs for enumerating board
and platform specific data like the display connector topology, i2c
buses, clocks and voltages, etc.  However there are ACPI methods
required for other things related to the platform like power budgets,
GPU power, etc.  If you don't set them up correctly the system may not
operate correctly or you may not be able to put the device into its
lowest power state.

Alex



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.