
Re: [Xen-devel] Status of FLR in Xen 4.4

On Fri, 27 Sep 2013 09:48:34 -0400, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:
On Fri, Sep 27, 2013 at 02:27:46PM +0100, Gordan Bobic wrote:
On Fri, 27 Sep 2013 14:26:31 +0200, Matthias
<matthias.kannenberg@xxxxxxxxxxxxxx> wrote:
>Hi Gordan,
>I tried your patch on my dom0 kernel and I think it helped, in
>the sense that I can now reboot the domUs without crashing the
>whole host. But the Linux domU still gets a black screen, and the
>Windows 7 domU only boots as far as a black screen with a (movable)
>cursor and no further. This might only be a coincidence, though; I
>have to double check this.

What patch? Nothing I posted to the list is fit for public
consumption yet. You shouldn't be using it unless you really,
REALLY know exactly what it does and know exactly what you
are trying to achieve.

>I tried some other stuff, too:
>1) after domU shutdown, rebind both functions to the dom0 drivers,
>do a sysfs reset and re-add to assignable devices -> crashes dom0

My experience shows that letting dom0 drivers ever touch the hardware
is a recipe for disaster.

>2) after domU shutdown, rebind both functions to the dom0 drivers and
>re-add to assignable devices -> dom0 sometimes crashes when the domU
>using the devices comes up, sometimes not, but no success either way
>3) sysfs reset of the devices within domU seems to be passed through
>dom0 (see commands in qemu-log) but has no effect

It's up to the drivers to do the sensible thing. Nvidia drivers
handle this a little more sanely, but if the drivers cannot handle
clobbering the device's state into a known state, you are pretty
much fighting a losing battle.

>Also, I analysed your code and compared it to the stuff in the python
>tools of xm, and it is the same approach and I don't see any obvious

I am starting to suspect you aren't actually talking about my code
but somebody else's...

>Then I tried to replicate the secondary bus reset on the
>command line for testing purposes via
> printf '\x40' | dd of=/sys/devices/pci0000:00/0000:00:0b.0/config
>seek=$((0x3e)) count=1 conv=notrunc
>but I think I got the endianness or the offset slightly wrong, because
>after that xl refuses to give the device (00:0b.0 is the bus of my
>2-function vga card I have assigned to my domU) to the domU and later
>crashes dom0.
>So I'm a little lost at that point and would welcome some
>Does FLR reset work for any of you for vga cards?
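
For reference, a secondary bus reset is normally done as a
read-modify-write of the bridge's 16-bit Bridge Control register
(offset 0x3E in a type 1 header, bit 6 = 0x40): set the bit, wait,
clear it again, then give the devices behind the bridge time to settle.
Two things in the quoted command look suspect. Without bs=1, dd counts
seek= in 512-byte blocks, so seek=$((0x3e)) does not land on byte
offset 0x3E. And writing a bare 0x40 would overwrite the other Bridge
Control bits and leave the bus held in reset rather than pulsing it.
Below is a minimal Python sketch of the pulse. The function name and
the delay values are assumptions for illustration, not taken from any
patch in this thread; it needs root and can take down the host just
like the dd attempt did.

```python
import os
import struct
import time

BRIDGE_CONTROL = 0x3E        # 16-bit Bridge Control register (type 1 header)
SECONDARY_BUS_RESET = 0x40   # bit 6 of Bridge Control


def _read16(fd, offset):
    # Config space is little-endian; read one 16-bit register.
    return struct.unpack("<H", os.pread(fd, 2, offset))[0]


def _write16(fd, offset, value):
    os.pwrite(fd, struct.pack("<H", value), offset)


def secondary_bus_reset(config_path, hold=0.1, settle=1.0):
    """Pulse the secondary bus reset bit on a PCI bridge.

    config_path is the bridge's sysfs config file. hold and settle are
    illustrative delays in seconds (assumptions, not spec values).
    Returns the Bridge Control value that was found before the pulse.
    """
    fd = os.open(config_path, os.O_RDWR)
    try:
        original = _read16(fd, BRIDGE_CONTROL)
        # Assert reset while preserving every other control bit.
        _write16(fd, BRIDGE_CONTROL, original | SECONDARY_BUS_RESET)
        time.sleep(hold)
        # Deassert reset, again leaving the other bits untouched.
        _write16(fd, BRIDGE_CONTROL, original & ~SECONDARY_BUS_RESET)
        time.sleep(settle)
    finally:
        os.close(fd)
    return original
```

With the path from the quoted command this would be called as
secondary_bus_reset("/sys/devices/pci0000:00/0000:00:0b.0/config"),
assuming 00:0b.0 really is the bridge above the card and not the card
itself.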

If you are talking about VGA cards with _proper_ FLR implementations
at the PCI level - there is no such thing. In all cases it is down to
the domU driver to handle the card in whatever state it is in. This
works reasonably well with supported Nvidia cards (i.e.
Quadro [K][2456]000 and Grid K[12] and equivalent modified GeForce
cards (Fermi 4xx and Kepler 6xx/7xx series)). I never managed to
get it working properly on any other GPUs.
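
Whether a given function even advertises FLR can be read from its
config space: the PCI Express capability (ID 0x10) carries a Device
Capabilities register at offset 4, and bit 28 of that register is the
Function Level Reset capability bit. The sketch below walks the
capability list of a config-space dump and checks that bit. The
function names are made up for illustration, and an advertised bit
says nothing about how well the card actually implements the reset.

```python
import struct

PCI_STATUS = 0x06             # 16-bit Status register
PCI_STATUS_CAP_LIST = 0x10    # Status bit 4: capability list present
PCI_CAPABILITY_LIST = 0x34    # pointer to the first capability
PCI_CAP_ID_EXP = 0x10         # PCI Express capability ID
PCI_EXP_DEVCAP = 0x04         # Device Capabilities, relative to the cap
PCI_EXP_DEVCAP_FLR = 1 << 28  # Function Level Reset capable


def find_capability(cfg, cap_id):
    """Return the offset of capability cap_id in cfg (bytes), or None."""
    status = struct.unpack_from("<H", cfg, PCI_STATUS)[0]
    if not status & PCI_STATUS_CAP_LIST:
        return None
    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()  # guards against a looping (corrupt) capability list
    while pos >= 0x40 and pos + 2 <= len(cfg) and pos not in seen:
        seen.add(pos)
        if cfg[pos] == cap_id:
            return pos
        pos = cfg[pos + 1] & 0xFC  # next-capability pointer
    return None


def advertises_flr(cfg):
    """True if the PCI Express capability in cfg has the FLR bit set."""
    pos = find_capability(cfg, PCI_CAP_ID_EXP)
    if pos is None or pos + PCI_EXP_DEVCAP + 4 > len(cfg):
        return False
    devcap = struct.unpack_from("<I", cfg, pos + PCI_EXP_DEVCAP)[0]
    return bool(devcap & PCI_EXP_DEVCAP_FLR)
```

The input would be the contents of the device's sysfs config file,
read as root (unprivileged reads only return the first 64 bytes, which
is not enough to reach the capability list).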

Even with Nvidia cards rebooting can lead to issues. For example,
I have two GPUs passed to two different domUs. One is a GTX470
modified to Q5000. The other is a GTX480 modified to Q6000. The
domU with Q5000 always handled reboots reasonably reliably. The
one with a Q6000 did not. I have since switched the one with a Q6000
to a QK5000 (modified GTX680), and now the reboots seem to work
reasonably reliably, but I have found that there is still a
crash if the monitor on the card changes between shutdown and
restart - I'm guessing the card remembers its state and if it
isn't consistent when it returns, the driver gets confused. I have
other issues (see recent thread about Nvidia passthrough from
David), but they seem to be specific to my setup.

About this state thing: if one were to capture the card's state
before doing any PCI passthrough and then tried to write it back
exactly, would that eliminate some of these issues?

I know that the pciback does that to the PCI configuration values.
(Or at least it should) whenever a device has been de-assigned
from a guest - or unplugged.

But I presume that the rest (the BAR contents) are not in any
way saved/restored. What is the worst that could happen if one
wrote all of the MMIO values back exactly as they were?

(Probably a recipe for disaster, but who knows).
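
Before writing anything back, it may help to see what a guest actually
changed. The sketch below snapshots a device's config space from sysfs
and lists the 32-bit registers that differ between two snapshots. It
only covers config space, not the memory behind the BARs, and the
function names and the 256-byte default length are assumptions for
illustration.

```python
def snapshot_config(config_path, length=256):
    """Read up to length bytes of config space from a sysfs config file."""
    with open(config_path, "rb") as f:
        return f.read(length)


def changed_registers(before, after):
    """Return [(offset, old, new)] for each 32-bit register that differs."""
    # Compare only whole dwords present in both snapshots.
    n = min(len(before), len(after)) & ~3
    changes = []
    for off in range(0, n, 4):
        old = int.from_bytes(before[off:off + 4], "little")
        new = int.from_bytes(after[off:off + 4], "little")
        if old != new:
            changes.append((off, old, new))
    return changes
```

Taking one snapshot before the device is first assigned and another
after the guest shuts down would show which registers are left in a
different state.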

It's not perfect, but it's the only workable solution I have

That doesn't cover the entire state of the device.
What about the rest of the device memory and states of all
the proprietary registers?

Since there are open source FB and accelerated drivers
available for Radeon cards, enough is publicly known about
them to be able to achieve suitable resetting. How
difficult that might be to achieve, I have no idea. I
have seen the open source Radeon Xorg driver successfully
reset the GPU when the GPU stopped responding without
taking Xorg or any of the running apps down in the process,
so something similar to what it does might just be good

Whether it is a good idea to adopt anything but a fully
hands-off approach to any passthrough hardware is a
different question entirely.

