Re: [Xen-users] ATI VGA Passthrough / Xen 4.2 / Linux 3.8.10
- To: xen-users@xxxxxxxxxxxxx
- From: Gordan Bobic <gordan@xxxxxxxxxx>
- Date: Fri, 10 May 2013 23:39:35 +0100
- Delivery-date: Fri, 10 May 2013 22:40:49 +0000
- List-id: Xen user discussion <xen-users.lists.xen.org>
On 05/10/2013 09:19 PM, Andrew Bobulsky wrote:
2) I actually have it working - for 5 minutes or so at a time. If
the problem was the lack of ACS, it wouldn't work at all.
I just can't help but wonder if it /is/ the problem, though. It's the
only thing I can pin down that our situations have in common as far as
its being the only "non-compatible" portion of the implementation, aside
from the nearly identical behavior, of course. Maybe the AMD driver does
some stupid stuff that ACS can mitigate? I just wish I knew more :(
Now you got me thinking... I noticed that when the GPU starts to
head toward the crash, this appears in the syslog:
May 6 16:35:51 normandy kernel: pcieport 0000:00:03.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0000
It certainly makes me wonder.
Has anyone else seen this error?
The device ID in question is:
00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 22)
which does not bode well...
Duff hardware?
Hmmm... I'll poke through my syslog at the next crash. I tried:
cat /var/log/syslog | grep pcieport
cat /var/log/syslog.1 | grep pcieport
dmesg | grep pcieport
Nothing came back from any of those. I'll see if I can identify any
unique errors myself though!
Worth paying attention to. :)
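For anyone who wants to check their own logs for the same thing, here is a minimal sketch of the grep above in Python. It assumes the AER message format matches the line quoted earlier and that the logs live in /var/log/syslog and /var/log/syslog.1; the function names are made up for illustration, and other distributions may log elsewhere.

```python
import re

# Matches kernel AER reports shaped like the one quoted in this thread, e.g.
# "pcieport 0000:00:03.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0000"
AER_RE = re.compile(
    r"pcieport (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): AER: (?P<msg>.*)"
)


def find_aer_events(lines):
    """Return a (port_address, message) tuple for every AER line found."""
    events = []
    for line in lines:
        match = AER_RE.search(line)
        if match:
            # Collapse runs of whitespace so wrapped or padded lines compare equal.
            message = " ".join(match.group("msg").split())
            events.append((match.group("bdf"), message))
    return events


def scan_logs(paths=("/var/log/syslog", "/var/log/syslog.1")):
    """Scan the given log files (paths are an assumption) and collect AER events."""
    events = []
    for path in paths:
        try:
            with open(path, errors="replace") as handle:
                events.extend(find_aer_events(handle))
        except OSError:
            # Missing or unreadable log file: skip it rather than fail.
            continue
    return events


if __name__ == "__main__":
    for port, message in scan_logs():
        print(port, message)
```

An empty result only means no AER line of this shape was logged; it does not rule out the port as the culprit.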
So what might intrigue you the most here is that while I'm stuck with
a VGA device sitting behind this non-ACS compliant switch... My
results are almost identical to yours. Passing one of the VGA devices
to the DomU, with or without the corresponding HDMI audio doesn't seem
to matter, I get this:
"it is so intermittent. It works well enough to boot up and work with
a gaming type load for a few minutes. Then something happens that
causes the VGA card to require a reset, and it all falls apart."
Seriously :P
And you are convinced this is to do with the availability of ACS?
Like I said, it's the only thing that I can pinpoint as being a
hindrance to compatibility. I guess my request here is if anyone can
help me determine whether or not that's true?
What motherboard are you using? Has anyone successfully used it for
VGA passthrough? I don't think the possibility of both of us having
similarly duff hardware has been systematically excluded yet.
I think I said it, but I'll link here anyway:
http://www.gigabyte.us/products/product-page.aspx?pid=2957#ov
Indeed, you did. Apologies, it's been a long week. :p
As to whether or not anyone's used it for passthrough before... I've got
no clue. Probably not too many people, seeing as how I'm essentially
running a custom BIOS :P
BIOSes are getting so crap (except maybe on Asus boards) these days that
I'm amazed anything works at all. You wouldn't believe the amount of
BIOS bugginess people are encountering on the SR-2, and that's now an EOL
product that should by now have had most of its bugs fixed (yeah - right).
It eventually likes to BSOD, usually on atikmpag.sys I think. Plenty
of "an attempt was made to reset the display adapter and failed" blah
blah blah.
Yes, all too familiar.
This happens 100% of the time if I try to boot with both
devices attached.
Both devices?
Yes---that is to say both of the VGA controllers from the 6990. The
relevant portion of my lspci looks like this:
http://pastebin.com/raw.php?i=GwekPNAW
OK, I get it. I seem to remember reading in the archives that dual
VGA passthrough is problematic (my experience over the years shows
that multiple GPUs are a false economy of highly questionable benefit).
That's actually pretty much completely accurate. It drives me
particularly up the wall because I hate running things in full screen,
and crossfire basically doesn't work at all without that :P
I like my full screen gaming - but throw something obscure like an IBM
T221 into the mix and things start to get rather non-trivial. T221 is
3840x2400 which is too much for DL-DVI to drive. But it's a 10+ year old
monitor design and it actually takes 3xSL-DVI (but there's an adapter
available that makes it drivable using 2xDL-DVI instead).
Then you have to stitch the screens together (workable with 2xDL-DVI on
XP, you need a Quadro or an Eyefinity card for the driver features to do
it on Vista and 7). What I've found back when my old 4870X2 was bleeding
edge was that with dual monitors attached, the 2nd GPU never did
anything at all (stayed stone cold, performance unaffected by Crossfire).
Since then I've learned my lesson - buy the biggest single GPU you can
afford - it's as good as it's going to get. Everything else is going to
be hit-and-miss. Debugging other people's products may be fun when
you're 14, but I'm two decades too old to not have something better to
do with my time. Nowadays I appreciate things that "just work" - the
unfortunate thing I'm finding, however, is that there tend to be no
things that "just work" that include all the features that I want -
which in turn leads to endless debugging of other people's software to
get it to do what I want, because apparently, nobody else has tried it
before. :-/
Note: devices 09 and 0a are my "primary" 6990's VGA controllers. Also,
my crossfire bridge is disconnected. I'm working with the other card,
devices 0d and 0e. I've included the USB card as well in the list
because I'm using it, but it causes me no problems whatsoever. For what
it's worth, that USB card works great in ESXi as well... Highpoint
enabled ACS on their PEX chips :D
Just out of interest:
1) Are you using a multi-socket motherboard?
Nope! It's a Gigabyte GA-EX58-EXTREME. It's LGA1366 with an i7 920 in
it. VT-d support is provided through a hacked BIOS image that I found
on the web a couple years or so ago.
Having to use a hacked BIOS for VT-d support is not a good sign or a
good starting point...
Technically, you're right. AFAIK though, this particular generation of
i7 chips allows for VT-d to be managed entirely by the chipset/bios.
That's just it - I don't like things only manageable by binary blobs
with no source code. I'd much rather just have a clean interface (e.g.
from /sys/) to just write the relevant registers straight to the
hardware to enable/disable features. Otherwise you're at the mercy of
motherboard manufacturers who have no interest in supporting a product
for people who have already bought it (sale's made, why should they care).
There's no particular req (however artificial) coming out of the CPUs
for this generation that stipulates VT-d can't be patched in... so I
figured, "why not?" I was modding my BIOS anyway and decided to use
this one as a base because it had both VT-d and fully updated option
ROMs for all my onboard stuff. The world of BIOS modding is a /very/
neat one; I highly suggest every nerd spend a few days there at some
point in his life ;)
Last time I checked, this was mostly limited to people using BIOS
editors to unhide features. Have things actually progressed to the point
where you can add in a specific assembly payload to initialize things
differently?
To the point though, it seems very well behaved on everything that
/isn't/ my 6990 :-(
Didn't you mention you had another ATI GPU in another rig that you could
borrow temporarily? It might be worth a shot to see if it's the dual
GPUs that are foiling you. Especially since they are inevitably on the
same PCIe bridge. A standalone single GPU might just work.
Ironically, my Quadro has been refusing to play ball completely today.
It worked passably well yesterday, although not as well as my 6450
card, which today seems to be working well enough to get to the login
screen without BSOD-ing. Different slot this time, though, so we'll see
how it fares in a bit.
[noirqbalance, limiting guest to 3.5GB of RAM]
[screen corruption, white/black lines]
Yeah. I'm convinced now. They might be a different color, but they're
in chrome (which uses a GPU accelerated 2d canvas) and they seem to
precede the crash pretty reliably.
Yes, similar here, although I don't use Chrome - I get them in most
things, including on the desktop once it has all started to go wrong.
though I'm considering a hard-hack: think of a 12v relay and a PCIe
extender cable---if a D3D0 reset actually powers off the slot
momentarily but the PSU plugs on the card prevent it from working, then
I could rig up a switch that ties those plugs' power state into the
slot itself---it's radical, yes, but possibly the most inventive
solution I can think of so far. I'm super curious to see if anyone more
knowledgeable than myself thinks it would work, because it'd be super
cheap to build! As the saying goes though, I'll "cross that bridge when
I come to it." :)
Interesting. In theory, I think this _should_ work provided your PCIe
bridges support hot-plugging.
To be certain, you'd have to switch both the PCIe slot and (if your card
uses it) the external power inputs.
That'd be the idea. Assuming it works the way I think it does, I could
tap a 12v (I'm pretty sure it's 12v in there) relay into the Vcc and GND
pins of the PCIe slot and use the relay's output to switch the Vcc from
the plug-in cables off of the PSU. Bears testing with a slightly less
expensive card, but I wouldn't be surprised to see it work! It'd
require some case modding for sure though, as the extension cable will
get in the way of properly seating the card. It could be possible to
build a tap that could be "slipped in" to a card's PCIe slot... Short
of proper FLR support, this could actually very cheaply be built into
the expansion card itself. I'd suspect that simply adding FLR would be
cheaper on the card manufacturers though. :)
Just get a case with more slot cutouts on the back than your
motherboard has slots. Then feed the ribbon to the bottom so the
card sits in the slot on the case that is below your motherboard -
no modding required. :)
But... but! I guess that'd require a mini(?) or MicroATX board. I'm a
full size to XL ATX (or whatever the monster-sized boards are) kind of
guy. Guess I just want more slots to pass GPUs to VMs, eh? :)
You don't need a smaller motherboard - you need a bigger case. :)
With your board, you could probably do this with a PC-P80 Armorsuit (one
of the few off the shelf cases that will take my SR-2 due to a weird,
needlessly oversized form factor - I mean seriously, who needs 7 PCIe
x16 slots??).
Hmm... Something just occurred to me - on the SR-2 this could be
implemented _TRIVIALLY_! The SR-2 has jumpers to disable/enable each of
the PCIe slots. So in theory, all I'd have to do is put together a
simple USB controlled switch that would toggle between connecting pins
1-2 and 2-3, and attach it using a normal 3-pin jumper-type header to
the jumper block in question. Or (boringly), just wire it up to a
suitable button on the front of the case.
I might just have to try this and see what happens (and hope it doesn't
make the magic smoke escape from something).
There's supposed to be some cases out there that allow for mounting of
expansion cards on the end of flexible extenders. Haven't heard about
them in a couple years, but either way chances are pretty good that such
cases aren't exactly affordable... they likely target enterprise
customers or simply have limited runs... economy of scale and all that.
Probably the "slip-in" type of adapter/approach would be best, but I
don't wanna get ahead of myself on a simple idea that may not even work :P
Usually rack-mount cases.
But it's amazing what you can achieve with a dremel and a power drill in
a few minutes. ;)
With that in mind, even though I've taken your advice and added the
config info to my xend files, it's entirely possible---especially in
light of what Casey said---that I'm just Doing It Wrong(TM). It'd
likely be beneficial for us both to compare notes on that regard. If
either of you would be willing to help, I could probably use some
pointers... I've kinda run out of logs to look at with my current
knowledge on the subject :P
Certainly - what notes do you propose we compare?
I'm not completely sure. If you can point me to the proper files to
verify that my device has the same PCIe-level compatibility issues as
yours (verify that ACS isn't available to the device and so on) then I'd
call that a step in the right direction.
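One way to check for ACS is to look for the "Access Control Services" capability in the output of "lspci -vvv" (run as root, otherwise the capability list is hidden). Below is a rough sketch that does this per device. The helper names are hypothetical, and it assumes the usual lspci layout where each device starts with its address in the first column and the capability lines are indented underneath.

```python
import re
import subprocess

# A device header starts in column 0 with "bb:dd.f" or "dddd:bb:dd.f".
HEADER_RE = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) ")


def acs_by_device(lspci_text):
    """Map each PCI address in `lspci -vvv` output to True if it lists ACS."""
    result = {}
    current = None
    for line in lspci_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            result[current] = False
        elif current is not None and "Access Control Services" in line:
            result[current] = True
    return result


def read_lspci():
    """Run lspci -vvv; needs root for the extended capabilities to show up."""
    completed = subprocess.run(
        ["lspci", "-vvv"], capture_output=True, text=True, check=True
    )
    return completed.stdout
```

Calling acs_by_device(read_lspci()) and looking up the bridges above the GPU (for example the root port at 00:03.0) shows whether any of them advertise ACS. A False only means the capability was not listed, which is also what you get if lspci was not run as root.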
Another thing - Do "lspci -vt" - can you put the card in a slot
where it doesn't share a bridge with any other PCIe devices?
I don't think so. You should see the built-in bridge... it's implied
slightly up the hierarchy from the two side-by-side 6990 devices, which
itself attaches to the root port at the top:
http://pastebin.com/raw.php?i=4dGmneYi
But the 2 GPUs are inevitably on the same bridge. I think trying a
single GPU would definitely be a good next step in troubleshooting.
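To confirm which bridges two devices share without reading the lspci tree by eye, the resolved sysfs path of a device lists every bridge above it. This is only a sketch: the function names are invented, and the addresses in the usage note are placeholders based on the device numbers mentioned in this thread, not taken from the actual lspci output.

```python
import os
import re

# Full PCI address as used in sysfs: domain:bus:device.function
BDF_RE = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def device_path(bdf):
    """Resolve the sysfs symlink for a PCI device to its full topology path."""
    return os.path.realpath(os.path.join("/sys/bus/pci/devices", bdf))


def upstream_chain(sysfs_path):
    """Return PCI addresses from the root port down to the device itself."""
    return [part for part in sysfs_path.split("/") if BDF_RE.match(part)]


def common_bridges(path_a, path_b):
    """Return the bridges that sit above both devices, root port first."""
    chain_a = upstream_chain(path_a)[:-1]  # drop the device itself
    chain_b = upstream_chain(path_b)[:-1]
    shared = []
    for bridge_a, bridge_b in zip(chain_a, chain_b):
        if bridge_a != bridge_b:
            break
        shared.append(bridge_a)
    return shared
```

For example, common_bridges(device_path("0000:0d:00.0"), device_path("0000:0e:00.0")) should return the root port plus the card's built-in switch if both GPUs hang off the same bridge, and an empty list if they share nothing.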
Wish me luck!
To both of us! :)
Gordan