
Re: [Xen-users] VGA/PCI Passthrough of Secondary Graphics Adapter



On 04/29/2013 08:31 PM, Casey DeLorme wrote:

        The lack of FLR, to my understanding, prevents the virtual
        machine from
        resetting the card state, leaving it initialized when switching or
        rebooting your DomU.  As a result the card is activated multiple
        times,
        creating a range of problems.


    You see, I'm not actually convinced FLR is such a vitally important
    feature. All you are really doing with FLR is pushing the
    re-initialization responsibility onto the BIOS of the PCI device.
    Without it, the driver has to do it. What is it that the driver
    cannot do to reset the device? Think low-level - all you are really
    doing is putting values into the registers of the GPU.


The problems I encountered during my tests would point to the fact that
the card state is not being reset, and FLR happens to exist to do just
that for virtualization.  It seems like a fairly logical assessment to
say that because FLR is not available for consumer cards that this is a
problem.  It is very possible that it could be something else, but I am
just pointing at the most likely cause using what I know personally.

I'm not disputing that FLR makes it easier. What I am saying, however, is that if the driver is written without cutting corners or making assumptions, it should be able to manually re-initialize the device.

        During my tests, I broke down the steps logically to resolve the
        issues
        I had installing and updating drivers.  In general, anytime I
        update or
        install AMD drivers I would reboot the entire physical machine
        to ensure
        the graphics card state was proper.


    Interestingly, the one thing that various people have been saying
    about ATI cards is that they get slower and slower on every guest
    reboot, unless you reboot the host. I haven't seen any evidence of
    this on my 6450. The fps I get in the likes of OCCT and Furmark are
    the same after the guest has been rebooted a few times. The problem
    I have is that an application entering full-screen mode causes a crash.


When I reboot my system the card still works just fine for Windows, but
will perform very poorly when I run a 3D application.  To resolve this I
eject the device which causes it to reset and restores the performance,
for the duration it runs.  Rebooting again will require the same steps.
  However, I have not experienced a gradual decrease in performance from
continual reboots, possibly because I eject and reset it after rebooting.
The tests I mentioned were a large number of fresh Windows installations
and AMD driver installations during the 11.x-12.x versions.

Out of the first 16 installations 13 failed to install AMD drivers when
faced with a rebooted Windows and a card with a state that had not been
reset.  The 3 that succeeded BSoD'd on reboot or randomly shortly after
login.

Another 30 installations with a fresh system, only rebooted during the
installation process, claimed to work successfully, but roughly 15% of the
time would experience video tearing and BSoD without warning (on first
boot), and almost always on reboot.

For the final 22 installations I rebooted the whole machine during both
stages of installation, and ran Windows 7 with that for four months
without a problem, then migrated to Windows 8 which has been running for
three months.

With the reliability record you describe above, that makes VGA passthrough stability as close to useless as it can possibly get.

I'm going to try to snipe a suitable Quadro on eBay for a sane sum in the next week or two (I'm not sure what I mean by that, when we are actually talking about what is essentially a GTX260 for the same money as a GTX660, but you have to start somewhere, I suppose). It will be interesting to see if that "just works". If it does, and the Quadro also shows up as lacking FLreset, then that puts the onus purely on driver quality.
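
For anyone wanting to check whether a given card "shows up as lacking FLreset": `lspci -vv` reports Function Level Reset support as `FLReset+` or `FLReset-` in the PCI Express DevCap section. Below is a minimal Python sketch that pulls that flag out of captured output. The sample text and the function name are illustrative assumptions, not output from either poster's machine.

```python
import re

def flr_supported(lspci_vv_text):
    """Return True/False for the FLReset flag, or None if it is not reported."""
    match = re.search(r"\bFLReset([+-])", lspci_vv_text)
    if match is None:
        # No PCI Express DevCap section found (lspci usually needs root to show it).
        return None
    return match.group(1) == "+"

# Hypothetical excerpt in the style of `lspci -vv` output, for illustration only.
SAMPLE = """\
01:00.0 VGA compatible controller: (illustrative device)
\tCapabilities: [58] Express (v2) Legacy Endpoint, MSI 00
\t\tDevCap:\tMaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
\t\t\tExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
"""

print(flr_supported(SAMPLE))
```

In practice the text would come from running `lspci -vv -s <BDF>` as root in Dom0 and passing its output to the function.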

          From my experiences this required a freshly installed Windows
        system,
        if you are working on a system you already attempted driver
        installation
        on and it failed (BSoD's or otherwise) uninstalling the failed
        drivers
        and trying again loops back to the same failures for mostly unknown
        reasons.  Presumably it has to do with the use of the .NET
        framework for
        installation, and leaving bits of data behind on failure, but I
        never
        confirmed the exact details.


    I don't know if that helped in any way, but I uninstalled anything
    remotely .NET-ish, including the client profile.


I spent three days trying to figure out the specifics, and did the same
as you, and that did not solve the problems.  In using .NET it either
left something else on the system or I missed something, but I was more
concerned with getting the system working than figuring out why it was
breaking, and three days was enough time to reinstall several times.

I went the other way - installed all the updates, including .NET ones, then removed them. But after upgrading the Xen stack to 4.2.2 from 4.2.1 (keeping 4.2.1-6 hypervisor, later ones just error out when starting the VM with "unknown parameter"), it all seemed to magically start working (except for full-screen 3D not working at all).

        For me, once the drivers are properly installed, rebooting the
        system
        still leaves the card in a pre-initialized state, which to avoid
        performance degradation I would manually reset the card using eject
        media from the lower right icon.  This is not a perfect
        solution, and I
        do not rely on it when performing driver installation or updates.


    Where is this "eject" for a PCI device? What distro are you actually
    running?


The ejection process has nothing to do with Linux or Dom0.  I call it a
"manual ejection" as it does not occur automatically.  The ejection
takes place from inside the running Windows virtual machine, after it
has restarted.  There is an icon to eject media in the bottom menu, when
clicked it displays any passed devices.  My understanding is this
triggers a hot-plug ejection on the device, if you know how to reproduce
that in Linux let me know.
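
One possible Linux-side counterpart is the PCI sysfs interface: writing `1` to a device's `remove` file detaches it, and writing `1` to `/sys/bus/pci/rescan` re-enumerates the bus. Whether this resets the GPU's internal state the way the Windows-side eject appears to is an assumption and has not been tested here. The address `0000:01:00.0` is a placeholder and the function name is hypothetical. The sketch only prints what it would write unless `dry_run=False`, which needs root.

```python
import os

def pci_remove_and_rescan(bdf, sysfs_root="/sys", dry_run=True):
    """Hot-remove one PCI function, then ask the kernel to rescan the bus.

    Returns the list of (path, value) writes, in the order they are issued.
    """
    writes = [
        (os.path.join(sysfs_root, "bus/pci/devices", bdf, "remove"), "1"),
        (os.path.join(sysfs_root, "bus/pci/rescan"), "1"),
    ]
    for path, value in writes:
        if dry_run:
            print(f"would write {value!r} to {path}")
        else:
            with open(path, "w") as handle:
                handle.write(value)
    return writes

# Placeholder address; substitute the BDF that lspci shows for the card.
pci_remove_and_rescan("0000:01:00.0")
```

With the card assigned to a running guest, `xl pci-detach` followed by `xl pci-attach` from Dom0 would be the toolstack-level counterpart. That is likewise untested here.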

Uhh... You are saying that the ATI card shows up as an ejectable device in the Windows VM? Really??

/me goes to check

Typical - after the VM booted perfectly every time for the last 2 days, now I get nothing but a BSOD (failed attempt to reset the display driver) every time - and I'm doing this on a freshly rebooted machine.

I point my fingers at FLR because its described responsibility is to
reset device state in virtual environments, and every one of the
problems I have encountered appears to be linked to the state of the
card.  However, other users have posted very different experiences,
leading me to believe that it could well be hardware specific.

I suspect it's down to shoddy drivers. Has anyone actually reported problems with a supported Quadro card ([246]000 or FX [345]800)?
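
The bracket shorthand above reads like a shell glob or regex character class: `[246]000` stands for the Quadro 2000, 4000 and 6000, and `FX [345]800` for the Quadro FX 3800, 4800 and 5800. A small Python illustration follows, with the helper name made up for this example:

```python
import re

# The two patterns from the message, anchored so only whole model names match.
LISTED = re.compile(r"^(?:[246]000|FX [345]800)$")

def is_listed_quadro(model):
    """True if the model name matches the shorthand used in the message."""
    return LISTED.match(model) is not None

for name in ["2000", "4000", "6000", "FX 3800", "FX 4800", "FX 5800", "K2000"]:
    print(name, is_listed_quadro(name))
```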

I started with a consumer nVidia card; it didn't work, which is why I
switched to AMD.  If you can share the model of a $180 USD consumer
nVidia card with HDMI out and onboard audio that can be converted to a
working Quadro model card and is equal to or outperforms my AMD I would
gladly switch.

GTX680 and GTX690 have been done:

http://www.eevblog.com/forum/projects/hacking-nvidia-cards-into-their-professional-counterparts/msg207550/#msg207550

Note: Small amount of soldering required.

But they don't fit in the $180 budget envelope. It does look very much like most Nvidia cards are modifiable to equivalent quadros (those that have reasonably equivalent Quadros at least).

I'm just getting a genuine low-end Quadro (for hopefully <= £130) first to make sure it'll work, before I get something higher end for modifying.

I'm vaguely tempted by a Titan, but the problem is that the only equivalent enterprise grade card is a Tesla, which means no video out, which would somewhat defeat the purpose. Then again, there seems to be evidence that the number of shaders doesn't have to match with what the Quadro model is expected to have - it looks like the hardware capabilities get auto-detected correctly.

From what I have seen the Quadro 2000 is the earliest
model with onboard audio, and none of them have HDMI out so you would
need a DisplayPort to HDMI adapter, plus they cost double what I paid
for my card.  If you compare passmark benchmarks my card has a 60%
performance gain over a Quadro K2000, and 30% over the Quadro 4000 which
costs four times as much.  Going back to the lack of demonstration
videos really makes me wary of throwing that much money into a
"possible" alternative, especially one that performs worse than my
current.  While I have seen emails mentioning Quadro, that's about it.

Expect a report back from me on this as soon as I get my hands on a Quadro (2000 or FX3800, I have no intention of spending more than a bare minimum). My priority is to have something that works, and works _reliably_.

  No instructions or demonstration videos or performance comparisons.
  Makes me a bit wary about dropping four times the cost of my current
card for less performance when the only supporting documents are various
emails and a wiki page.

Let me flip that one around - there are plenty of blog entries and videos of ATI cards working, and yet there are several recent posts on this list about using ATI cards and VGA passthrough working at best unreliably, and more often not working at all. Don't fall into the "It must be true, I read it on the internet!" trap. :)

Gordan

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users

 

