[Xen-devel] Theorycraft about Radeons VGA Passthrough issues

I sent this e-mail to xen-users two days ago, but to be honest, I think developers should be interested in it as well.


I think I have some interesting info regarding the infamous Radeon "performance glitch" issue.

I have a Radeon 5770, a Sapphire Flex Edition to be specific, should be this one:

By default, its highest Power State runs the GPU @ 850 MHz and VRAM @ 1200 MHz, with a GPU Voltage of 1.125V, if I recall correctly. Because I usually underclock/undervolt absolutely everything in the name of power efficiency, I modified the Video Card BIOS with Radeon BIOS Editor to use custom Power States. You can actually flash it from within a VM: I could flash mine with the ATI WinFlash tools. However, you need a full computer reboot for the changes to take effect; restarting the VM isn't enough. My current PowerPlay settings are these, along with a modified Fan curve:


The VM where I use this Radeon runs WXP SP3 with the GPLPV Drivers. I use Arch Linux as Dom0 with Xen 4.3.1, built with the ATI VGA Passthrough patch that was included in the Arch Linux User Repository xen package (this patch is also currently provided for Xen 4.4.1; it builds properly with it, but it is claimed to be untested). Syslinux is the Boot Loader, and I have specified in its config file to hide the Radeon's PCI address, so Dom0 shouldn't see or initialize it.

On a "good" VM start, I can run GPU-Z and watch my Video Card switch between the different Power States (the ones I configured previously) depending on load. That's good.


Sometimes, like when I shut down the VM and then start it again, the Radeon gets stuck in a Power State which is NOT a value from the PowerPlay table I modified. On my Video Card, it is GPU @ 850 MHz with VRAM @ 1200 MHz, which was the highest default Power State in the original BIOS, but it is not present in mine anymore. Also, GPU-Z and other tools fail to report the GPU Voltage, which I suppose should be 1.125V. Temperatures under load are also correspondingly much higher than what is achievable with my highest Power State, and so is Fan noise, so I suppose that the Video Card is really running at those values, though I didn't benchmark it (it should be faster, for obvious reasons).


I suppose that when the Video Card fails to initialize properly, it falls back to a "backup" Power State that is not part of the regular PowerPlay table, and which on my Video Card happens to coincide with the highest stock one. My theory is that when you get the "performance glitches", it is because other BIOSes may instead have a backup Power State close to what you would expect from a power saving mode, while in my case it is totally the opposite.
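
To make the check I do by eye in GPU-Z a bit more concrete, here is a minimal Python sketch. It only uses the clock pairs mentioned in this mail: 150/300 and 700/800 as states from my modified table, and 850/1200 as the stock top state. The function name, the labels it returns and the 5 MHz tolerance are my own assumptions for illustration; your table will have different values.

```python
# Clock pairs as (GPU MHz, VRAM MHz), taken from this mail.
# KNOWN_STATES: pairs that belong to the modified PowerPlay table.
# STOCK_TOP_STATE: highest state of the original BIOS, no longer in the table.
KNOWN_STATES = {(150, 300), (700, 800)}
STOCK_TOP_STATE = (850, 1200)


def classify(gpu_mhz, vram_mhz, tolerance_mhz=5):
    """Return 'table', 'stock-fallback' or 'unknown' for an observed clock pair.

    tolerance_mhz is an assumed allowance for small reading differences.
    """
    def near(a, b):
        return (abs(a[0] - b[0]) <= tolerance_mhz
                and abs(a[1] - b[1]) <= tolerance_mhz)

    seen = (gpu_mhz, vram_mhz)
    if any(near(seen, state) for state in KNOWN_STATES):
        return "table"
    if near(seen, STOCK_TOP_STATE):
        return "stock-fallback"
    return "unknown"
```

If the theory holds, a "bad" start on my card should classify as "stock-fallback", while on a card with a low-clocked backup state it would more likely show up as "unknown".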

While some people claim that you need to do a full reboot of the computer in order to restart the VM that uses the Radeon, I didn't have that type of issue. It is as simple as restarting the VM one or two more times, and as soon as GPU-Z shows 150/300 or 700/800 I know it is good to go. I *DID* need a full computer reboot for VBIOS flashes to take effect. Also, while experimenting with Frequency/Voltage settings, if the Voltage was too low to be fully stable at that Frequency, I couldn't get the GPU to work again without a computer reboot, with the VM always BSODing on boot.

I recall having seen BIOSes in the TechPowerUp VGA BIOS Collection whose PowerPlay tables had some weird values, like, for example, Frequencies appropriate for Idle paired with the GPU Voltage of the highest Power State and, more interestingly, vice versa, which should be a no-go. Since I don't know what the "backup" settings are for when the Video Card doesn't initialize properly, nor where they come from or whether they are sane values, I suspect that a bad combination of those hidden values could be heavily related to this, and is why these people have this issue.

Sometimes I have experienced crashes in the WXP VM that force me to kill it from Dom0 with xl, which leaves a frozen image on the Monitor attached to the Radeon. On the next VM start it displays some weird behavior: the Monitor gets refreshed with a white screen, after some time it goes black, and some time after that a BSOD follows on the VM's screen in Dom0. However, after restarting the VM again, it works properly. So for pretty much any issue not related to GPU stability, closing and opening the VM a few times gets the Video Card initialized properly sooner or later. In the last 4-5 months of usage, since I finished my PowerPlay table and it proved to be stable, there has been no occasion where I actually had to reboot the computer to restart the VM (whether after a normal shut down and creating it again with xl, or after killing the VM due to a crash or whatever).
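
The "restart it until it comes up right" routine could be sketched as a small retry loop. This is only an illustration in Python: I check GPU-Z by hand inside the guest, so `read_clocks` is a hypothetical callable (there is no such Xen facility that I know of), and `start_vm`/`stop_vm` are placeholders for whatever you use, for example wrappers around creating and shutting down or destroying the domain with xl. The `max_attempts` limit is also an assumption; it is there because some cases, like a too-low Voltage, never recovered for me without a full reboot.

```python
def start_until_good(start_vm, stop_vm, read_clocks, is_good, max_attempts=5):
    """Start the VM repeatedly until the reported clocks look right.

    start_vm, stop_vm: hypothetical callables that start and stop the VM.
    read_clocks: hypothetical callable returning (GPU MHz, VRAM MHz).
    is_good: callable deciding whether a clock pair is an expected one.

    Returns the attempt number that succeeded, or None if every attempt
    failed (at which point a full computer reboot may be needed).
    """
    for attempt in range(1, max_attempts + 1):
        start_vm()
        if is_good(read_clocks()):
            return attempt
        # Wrong Power State: stop the VM and try again.
        stop_vm()
    return None
```

With my card, `is_good` would simply accept 150/300 and 700/800 and reject everything else.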

I'm interested in people who have also had issues with VGA Passthrough and Radeons posting GPU-Z screenshots (or writing down the values) taken when they suffer from performance degradation after VM restarts. I'm inclined to believe that it is entirely related to the backup Power State that the vBIOS uses when the card fails to initialize properly.
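
If you would rather post values than screenshots, a sensor log saved as comma-separated text can be boiled down to the distinct clock pairs it contains. The following Python sketch is a rough idea of how, not a tested tool: the column names are passed in as parameters because I am not assuming any particular header layout, and the sample headers used with it are made up.

```python
import csv
import io
from collections import Counter


def summarize_clocks(csv_text, gpu_col, vram_col):
    """Count how often each (GPU MHz, VRAM MHz) pair appears in a CSV log.

    gpu_col and vram_col are the header names of the two clock columns;
    they depend on the logging tool and are assumptions of the caller.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = [h.strip() for h in next(reader)]
    gpu_i, vram_i = header.index(gpu_col), header.index(vram_col)
    counts = Counter()
    for row in reader:
        if len(row) <= max(gpu_i, vram_i):
            continue  # skip blank or truncated lines
        try:
            pair = (round(float(row[gpu_i])), round(float(row[vram_i])))
        except ValueError:
            continue  # skip rows where a clock value is not a number
        counts[pair] += 1
    return counts
```

A log from a "bad" start that only ever shows one pair, and one that is not in your PowerPlay table, would be exactly the kind of data I'm after.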
