
Re: [Xen-devel] Device Reset on Nvidia GPUs



On Fri, Nov 15, 2013 at 02:52:56PM +0000, Gordan Bobic wrote:
> On Fri, 15 Nov 2013 09:40:04 -0500, Konrad Rzeszutek Wilk
> <konrad.wilk@xxxxxxxxxx> wrote:
> >On Fri, Nov 15, 2013 at 02:29:39PM +0000, Gordan Bobic wrote:
> >>On Fri, 15 Nov 2013 14:27:21 +0000, Stefano Stabellini
> >><stefano.stabellini@xxxxxxxxxxxxx> wrote:
> >>>On Fri, 15 Nov 2013, Gordan Bobic wrote:
> >>>>I've noticed that nouveau driver has a sysfs reset implemented
> >>>>(although I'm not sure whether it is just a stub or whether it
> >>>>does anything).
> >>>>
> >>>>Now, I fully understand that this is not actually necessary,
> >>>>based purely on empirical evidence:
> >>>>
> >>>>My ATI cards reliably crash the host when the domU they are passed
> >>>>to is rebooted, and the xen-pciback driver does have the sysfs
> >>>>reset implemented for ATI cards.
> >>>>
> >>>>OTOH, my (modified) Nvidia cards handle domU reboots perfectly
> >>>>and the xen-pciback driver has no sysfs reset implementation
> >>>>for those.
> >>>>
> >>>>So I'm kind of torn between:
> >>>>1) It's not broken so don't even think about trying to fix it.
> >>>>2) Since FOSS reset implementation seems to exist, it might be
> >>>>handy to port it into the xen-pciback feature list (caveat:
> >>>>this may impact 1), which would be embarrassing).
> >>>>
> >>>>Thoughts?
> >>>
> >>>libxl is capable of using the sysfs reset node, so there shouldn't
> >>>be any need for porting the reset code to pciback
> >>
> >>Not quite - when the device is owned by xen-pciback, there is
> >>no reset node. When it is owned by nouveau, the reset node in
> >>sysfs is there.
> >
> >Sure, but pciback does the reset:
> >
> >
> >        /* We need the device active to save the state. */
> >        dev_dbg(&dev->dev, "save state of device\n");
> >        pci_save_state(dev);
> >        dev_data->pci_saved_state = pci_store_saved_state(dev);
> >        if (!dev_data->pci_saved_state)
> >                dev_err(&dev->dev, "Could not store PCI conf saved state!\n");
> >        else {
> >                dev_dbg(&dev->dev, "resetting (FLR, D3, etc) the device\n");
> >                __pci_reset_function_locked(dev);
> >                pci_restore_state(dev);
> >        }
> >
> >pci_reset_function(..) (the non-locked variant) is called when you
> >write to 'reset' in sysfs.
> >
> >Unless the nouveau driver does some extra 'reset'?
> 
> I don't know for sure at the moment. I was just basing this on the
> observation that xl complains that there is no sysfs reset for the
> device when instantiating the VM and there being no reset node for
> the device in sysfs.
> 
> It seems oddly inconsistent that there is a reset node in sysfs for
> ATI cards (that crash the host on domU reboot) but no reset node for
> Nvidia cards (which work fine on domU reboot).
> 
> There is no FLR or D3 PM on my Nvidia cards, so that could
> be why (there is D3hot on ATI). But nouveau driver still exposes
> a reset node for the device.

You said that the Nvidia cards have no reset (above), but then they do
have a reset? Or is it that the nvidia driver has no reset, but
nouveau has?
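
One way to settle which case applies is to check, for the same device,
which driver is currently bound and whether the 'reset' attribute is
present in sysfs. Below is a minimal userspace sketch, assuming the
usual /sys/bus/pci/devices/<BDF>/ layout. The helper names
(pci_attr_path, pci_has_reset_node, pci_bound_driver) are made up for
illustration and are not part of any kernel or libxl API.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: build "<sysfs_root>/bus/pci/devices/<bdf>/<attr>".
 * Returns 0 on success, -1 if the result would not fit in buf. */
static int pci_attr_path(char *buf, size_t len, const char *sysfs_root,
                         const char *bdf, const char *attr)
{
    int n = snprintf(buf, len, "%s/bus/pci/devices/%s/%s",
                     sysfs_root, bdf, attr);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: 1 if the device has a sysfs 'reset' attribute,
 * 0 if it does not, -1 if the path could not be built. */
static int pci_has_reset_node(const char *sysfs_root, const char *bdf)
{
    char path[512];

    if (pci_attr_path(path, sizeof(path), sysfs_root, bdf, "reset") != 0)
        return -1;
    return access(path, F_OK) == 0 ? 1 : 0;
}

/* Hypothetical helper: copy the name of the bound driver (the last
 * component of the 'driver' symlink) into out.
 * Returns 0 on success, -1 if no driver is bound or on error. */
static int pci_bound_driver(const char *sysfs_root, const char *bdf,
                            char *out, size_t len)
{
    char path[512], target[512];
    const char *base;
    ssize_t n;

    if (pci_attr_path(path, sizeof(path), sysfs_root, bdf, "driver") != 0)
        return -1;
    n = readlink(path, target, sizeof(target) - 1);
    if (n < 0)
        return -1;
    target[n] = '\0';                  /* readlink does not terminate */
    base = strrchr(target, '/');
    base = base ? base + 1 : target;
    snprintf(out, len, "%s", base);
    return 0;
}
```

Calling pci_has_reset_node("/sys", bdf) and pci_bound_driver("/sys",
bdf, ...) once with the card bound to nouveau and once with it bound to
pciback would show whether the node really comes and goes with the
driver. The BDF has to be the one of the card in question.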
