Re: [Xen-users] ATI VGA Passthrough / Xen 4.2 / Linux 3.8.10

On Mon, 13 May 2013 09:49:54 -0400, Andrew Bobulsky <rulerof@xxxxxxxxx> wrote:
Top posting :P

Hello Gordan, Casey,

I hope you've had a good weekend.  I got back to my project this
morning; I decided to shove one of my 5850's into my board to see if I
could get it to work...

I've had this Windows DomU running, with GPLPV drivers, for a few
hours now.  Performance is excellent.  I'm using the 5850
passed-through as a PCIe device.  One of my 6990s is also plugged in,
and it's being used by Dom0.  Comically, I've got the better monitor
plugged into my Dom0's card because this 5850 lacks Mini DisplayPort.

So a dual GPU passthrough didn't work for you, but a single GPU
secondary passthrough, as is most commonly used, works fine? I'm
happy for you. Glad to hear that it's the dualness of the GPU that
was foiling your previous attempts.

I also can't get gfx_passthru=1 to work.  Nothing happens other than
an SDL window claiming to be a Serial console showing up on my Dom0's
screen.  I even have the 5850 set up as my BIOS's primary video card.
Oh well :)

I _think_ that could be because it is trying to pass through the
host's primary GPU as the primary GPU for the domU. Isn't that the
way it is supposed to work?

You could try setting up your X on the secondary GPU, and pass the
primary through with gfx_passthru=1 and see what happens.
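
For reference, a rough sketch of the domU config lines that experiment
would involve. The PCI address is a placeholder (an assumption, not
taken from your machine); substitute the BDF that lspci reports for
the host's primary card.

```
# domU config fragment (xl/xm syntax)
# '01:00.0' is a hypothetical BDF; use the primary GPU's real address
gfx_passthru = 1
pci = [ '01:00.0' ]
```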

Gordan, I'm going to poke through your other email later and see if I
can present some information to help you line up any of your
suspicions.  Given the way things have gone for me---and I've
basically duplicated as much of your and Casey's setups as humanly
possible here---I've got to believe the problem here is ACS, or
something related to it.  I can even reboot this VM and the card just
keeps on working.

What bothers me is that ACS is purely a security feature, not a
functionality feature.

On another note, should we retire this thread soon?  It's getting a
bit long and I don't want to discourage any future googlers, nor get
too off topic :P

We could start a new one, I guess? Or perhaps take it to xen-devel,
since if it continues it is likely to get low-level and debug-y.

The main thing that bothers me at the moment is that it _looks_
like my 5520 PCIe bridge
(as in: 5520 PCIe bridge -> NF200 PCIe router -> VGA)
starts reporting uncorrected errors on the PCIe bus when
the GPU passthrough starts to go wrong, and the GPU then
crashes in the domU and takes the domU down with it.

1) This clearly doesn't happen with bare metal, so there is
something happening with the low-level hypervisor interaction
that seems to be resulting in corrupt data being sent down the
PCIe bus.

2) This doesn't seem to happen in simple 3D applications. For
example, I can run OCCT GPU test full screen or furmark in a window
for hours without any issues, but as soon as I fire up a game
it all goes wrong in very short order. The Quadro case is
horribly intermittent, but the ATI behaves very predictably.
It always tends to crash at exactly the same point, which
leads me to think there is a very specific, very particular thing
the domU tries to do that leads to everything falling apart.

If only I could figure out _what_ that particular something
is, I might actually stand a chance of doing something
about it (hence why I was talking about taking PCIe
capture dumps, but I imagine this is going to be akin to
looking through several GBs of Wireshark logs, i.e.
boring, time-consuming, labour-intensive, and without any
a priori promise that it will yield any useful findings).
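
As a cheaper first step than a full bus capture, something like the
following could be used to compare the bridge's error state before and
after a crash. It is only a minimal sketch: it assumes the text comes
from "lspci -vvv" run as root in dom0, that the devices expose the
Advanced Error Reporting capability, and that the "UESta:" line uses
the usual trailing +/- per flag. The function name parse_uesta is made
up for illustration.

```python
import re


def parse_uesta(lspci_text):
    """Map each device header line to the uncorrectable-error flags
    that are set (marked with '+') on its AER 'UESta:' line.

    Devices with no flags set, or with no UESta line, are omitted.
    """
    results = {}
    device = None
    for line in lspci_text.splitlines():
        # Device header lines start in column 0; detail lines are indented.
        if line and not line[0].isspace():
            device = line.strip()
            continue
        stripped = line.strip()
        if device and stripped.startswith("UESta:"):
            # Flags look like "MalfTLP+" (set) or "MalfTLP-" (clear).
            flags = re.findall(r"(\w+)\+", stripped[len("UESta:"):])
            if flags:
                results[device] = flags
    return results
```

Saving the lspci output once while the domU is healthy and once after
it has fallen over, then running both files through parse_uesta(),
should at least show which error bits the 5520 raises, if any.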


