
Re: [Xen-users] VGA passthrough with Xen 4.3 and xl toolstack - performance degradation resolved?



On 03/15/2014 02:45 PM, H. Sieger wrote:
"Are you 100% certain that xen-pciback grabs the device BEFORE the
radeon FB driver is loaded?"

I'm not 100% certain. However, I've been using the same (initramfs)
method across different Linux Mint/Xen releases and with different
hardware (my regular Nvidia GPU for domU, as well as the AMD 7770 tested
here).

Rebuilding the initrd may well result in the GPU FB driver being included in the initrd and pre-loaded for the high-res console. That is why I am saying you need to blacklist the radeon/fglrx driver, extract the initrd, delete the driver from it, and re-pack the initrd. Then apply the trick I described to make sure the GPU you are passing through is assigned to xen-pciback before the FB driver is loaded, so that it cannot be tainted.

Since the PCI IDs of the GPU are listed by xm
pci-list-assignable-devices (or the xl counterpart) I assume pciback
took control.

xen-pciback will have taken control - "xl pci-assignable-add" unbinds the device from its current driver and binds it to xen-pciback. But if the ATI GPU was touched by the radeon/fglrx driver first, it will have been tainted sufficiently for the passthrough not to work.
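
A minimal sketch of how the current binding could be checked, assuming the usual Linux sysfs layout where /sys/bus/pci/devices/<BDF>/driver is a symlink to the bound driver. The function name bound_driver and the sysfs_root parameter are made up for this example, and 0000:02:00.0 is the address from the dmesg output quoted in this thread. It only reports which driver holds the device now, not whether another driver touched the card earlier.

```python
import os


def bound_driver(bdf, sysfs_root="/sys"):
    """Return the name of the driver currently bound to a PCI device.

    bdf is the full address, e.g. "0000:02:00.0". Returns None if the
    device has no driver bound (or does not exist under sysfs_root).
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", bdf, "driver")
    if not os.path.islink(link):
        return None
    # The symlink target ends in the driver's directory name.
    return os.path.basename(os.readlink(link))


if __name__ == "__main__":
    # Hypothetical check for the GPU function discussed in this thread.
    print(bound_driver("0000:02:00.0"))
```

If this prints the radeon or fglrx driver name rather than pciback, the device has not been handed over yet.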

So if I understand you correctly, you believe that the fglrx or radeon
driver may have initialized the graphics card before that pciback module
was able to grab it.

Correct, that is what I am guessing is happening.

So when the pciback module takes control of the
GPU, it does so in a different (initialized) state, compared to when
pciback grabs the GPU before fglrx or radeon kicks in.

Exactly.

Just for information, I did not install the fglrx driver in my tests but
used the radeon driver. However, on my regular hardware I use the fglrx
driver for AMD 7770 used by dom0, and pass through the Nvidia card to
domU - the nouveau (Nvidia) driver is blacklisted on the kernel command
line ("nouveau.blacklist=1") and doesn't show in lsmod.

I've never seen that command used - I normally blacklist it in /etc/modprobe.d/ but if it works for you...

The point here is that the nvidia card doesn't get touched by any driver before it is seized by xen-pciback. With ATI GPU passthrough, since your primary GPU is also an ATI, the radeon driver loads and likely claims and initializes both cards.

Checking dmesg on my regular hardware, pciback kicks in before the fglrx
driver is loaded:
[    9.329564] pciback 0000:02:00.0: seizing device
[    9.329570] pciback 0000:02:00.1: seizing device
[    9.329730] xen: registering gsi 44 triggering 0 polarity 1
[    9.329744] xen: --> pirq=44 -> irq=44 (gsi=44)
[    9.329878] pciback 0000:02:00.0: enabling device (0000 -> 0003)
[    9.329891] xen: registering gsi 40 triggering 0 polarity 1
[    9.329892] Already setup the GSI :40
[    9.330050] xen_pciback: backend is passthrough
...
...
[   11.494115] fglrx: module license 'Proprietary. (C) 2002 - ATI
Technologies, Starnberg, GERMANY' taints kernel.
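
The same ordering check can be scripted rather than read by eye. This is a rough sketch, assuming dmesg lines carry the "[ seconds]" timestamp prefix shown above and that the first line mentioning the driver name is a reasonable proxy for when that driver loaded. The helper names first_seen and pciback_seized_first are invented for the example, and SAMPLE is trimmed from the log above.

```python
import re

# Matches the leading "[    9.329564]" kernel timestamp on a dmesg line.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")

# Trimmed from the dmesg output quoted in this thread.
SAMPLE = """\
[    9.329564] pciback 0000:02:00.0: seizing device
[    9.329570] pciback 0000:02:00.1: seizing device
[    9.330050] xen_pciback: backend is passthrough
[   11.494115] fglrx: module license 'Proprietary.' taints kernel.
"""


def first_seen(dmesg_text, needle):
    """Return the timestamp of the first line containing needle, or None."""
    for line in dmesg_text.splitlines():
        match = TIMESTAMP.match(line)
        if match and needle in match.group(2):
            return float(match.group(1))
    return None


def pciback_seized_first(dmesg_text, bdf, driver):
    """True if pciback seized bdf before any line mentions driver."""
    seized = first_seen(dmesg_text, f"pciback {bdf}: seizing device")
    loaded = first_seen(dmesg_text, driver)
    if seized is None:
        return False  # pciback never reported seizing this device
    return loaded is None or seized < loaded


if __name__ == "__main__":
    print(pciback_seized_first(SAMPLE, "0000:02:00.0", "fglrx"))
```

Running the same check with "radeon" as the driver name against the full dmesg from the test machine would answer the open question about the AMD 7770.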

Is the radeon driver also blacklisted, since you are using fglrx?

I didn't check dmesg when I did the tests, so I can't be sure that the
Radeon driver also kicks in after pciback, but it's likely the case.

I still think it is worth blacklisting the fglrx and radeon drivers, and making sure you explicitly assign the passthrough GPU to pciback before the driver loads. That way you at least remove one unknown from the equation.
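
For illustration, the blacklist entries could be generated like this. It is only a sketch: the function name blacklist_conf is invented, and the target file name under /etc/modprobe.d/ is whatever you choose. Bear in mind that a "blacklist" line only stops a module from being auto-loaded by alias matching, so the initrd still needs to be checked and re-packed as described earlier.

```python
def blacklist_conf(modules):
    """Build modprobe.d text that blacklists the given kernel modules."""
    lines = ["# Keep these drivers from being auto-loaded for the GPU."]
    lines += [f"blacklist {name}" for name in modules]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Paste the output into a .conf file of your choosing under
    # /etc/modprobe.d/ (the file name is up to you).
    print(blacklist_conf(["radeon", "fglrx"]), end="")
```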

I believe Fedora has pciback compiled into the kernel? This should make
it easier to attach it to the GPU.

I wasn't even aware Fedora supports Xen - I guess I just assumed it was dropped when EL6 dropped support for it in favour of KVM. I use EL6 with 3rd party Xen packages.

One more test I ran today is this: I added the following line into
/etc/default/grub, followed by update-grub:
GRUB_CMDLINE_LINUX_XEN_REPLACE_DEFAULT="xen-pciback.hide=(02:00.0)(02:00.1)
nouveau.blacklist=1 quiet nomodeset"

I can see it's being used in the dmesg output, but I don't see any real
difference. It looks like the initramfs method is good enough to load
pciback and assign the GPU before a graphics driver gets loaded.

"That surprises the living daylights out of me. What driver version are
you using in domU? It is vaguely possible that the very latest driver has
finally been fixed to do a bus reset before trying to initialize the
card. Or are you using primary passthrough and re-POSTing the card in domU
using its BIOS to get it back into a clean state?"

I just downloaded the latest non-beta 64 bit driver (Catalyst Software
Suite) from the AMD website which is 13.12 - see here
<http://support.amd.com/en-us/download/desktop?os=Windows+7+-+64>. Prior
to installing the Catalyst suite I installed the .net 4.5 stuff.
I'm doing secondary passthrough; for some reason I never managed to make
primary passthrough work (not even with the Nvidia card). I did nothing
else but install these two packages (.net and AMD Catalyst).

How do I check if ACS works on my X79 platform? I haven't got a clue.

If you managed to get an Nvidia card to work with passthrough without adjusting anything, then you don't need to worry about it. On my system I had issues with things refusing to do PCI passthrough until I disabled ACS checks.

"Just out of interest, have you tried your Nvidia card with Xen 4.3.x?
Does that work?"

I'm writing this on Linux Mint 16 running a 3.11.0-18-generic kernel on
Xen 4.3.0 (4.3.0-1ubuntu1.3 to be exact), with my AMD 7770 for dom0 and
the Nvidia Quadro 2000 for domU (Windows 7 Pro). I use the xl toolstack
for this and it works nicely, which is why the AMD tests are quite
surprising to me. Before that I used xm with Xen 4.3.0 and it also
worked just fine with the Nvidia card.

Indeed, I used xm initially, but the switch to xl required negligible changes, and since xm is being deprecated I switched to xl to avoid any nasty surprises in the future.

The only issue I saw with the Nvidia card was the error 22 problem with
xm that appeared long ago in Xen 4.1.3.

Indeed, I think I was one of the first few people to spot that regression in XSA-46, but that has long since been fixed.

If you are positively certain you have checked everything I mentioned, I am out of ideas. Until around Christmas I was using an HD7970 for one of my VMs. The reboot issue was driving me nuts, as was the broken power management when running in a VM: for some reason the driver wasn't gradually adjusting the GPU fan speed. The fan always ran at something like 25% until the GPU hit 95C, at which point it spun up to 100% and stayed there even after the GPU cooled down. At 100% the fan produced enough vibration that the disks in the machine started to report errors, but at 80% it was fine, so the workaround was to hard-set it to 80% for gaming and leave it there. I was also finding that loading GPU-Z crashed the VM.

Eventually it annoyed me enough that I just dropped a modified 780Ti (faux Quadro K6000) into the machine instead, gave the Radeon to somebody who just wanted a bare metal gaming card and have been living happily ever since with a perfectly working Nvidia solution.

Gordan


_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users


 

