[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Inconsistent use of libxl__device_pci_reset() when adding/removing all functions of a device



Hi Konrad,

I finally got some time to dig into why I had issues with the last patch. Added a lot of
printk:s to find out what happened. My conclusions, possibly incorrect, are as follows:

Each device bound to the pciback module will be reset twice during boot. The done-counter
in the completion struct will be set to 2 since no calls have been made to wait_for_completion.

The counter in the completion struct will be decremented to 1 at the first pcistub_get_pci_dev()
or pcistub_get_pci_dev_by_slot() call.

Something bad happens when pcistub_put_pci_dev() during shutdown of domU. It appears to me
as if the function is expected to block until reset is completed. Code from the patch file:
+ schedule_work(&found_psdev->reset_work); + + /* Wait .. wait */ + wait_for_completion(&found_psdev->reset_done); +

The function will not wait since the done-counter is set to 1. The pcistub_put_pci_dev() will return
before the work item is excecuted. I do not belive that this was intended. The completion structs done
member is decremented to zero but once again set to 1 when the reset work item has been executed.

It is possible to boot the DomU a second time (done-counter in completion struct goes down to zero).
The problem is that pcistub_put_pci_dev() will leave the done-counter at zero when the domU is shut
down for the second time. Trying to boot the domU a third time will stall.

The get-functions expects to be able to decrement the completion counter without initiation work that
increments the counter and the put-function initiates the same number of reset works as the function
calls wait_for_completion(). It doesn't seem right. Could also explain why unbound devices hangs. Their
completion struct is not likely to be initiated with a count of 2 after startup.

Best Regards,
/Martin




2014-02-04 20:55 GMT+01:00 Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>:
On Tue, Feb 04, 2014 at 07:19:16PM +0100, Martin Öhrling wrote:
> I tried to use vga passthrough with an AMD card and ran into the issues
> with dom0 crash on domU reboot. My intention is to debug the issue and I
> have started to read up on the code for pci passthrough. The following
> observations may not be an error but perhaps someone could shed some light
> over the design intentions.
>
> My first impression was that libxl__device_pci_reset() is a function level
> reset function. It is called each time a single function of a device is
> added or removed. A device reset would break other domU:s if other
> functions of the device are passed through to these domU:s.
>
> The strange thing is that there is only a single libxl__device_pci_reset()
> call when pcidev->vfunc_mask is set to LIBXL_PCI_FUNC_ALL. I'm getting the
> impression that the function is assumed to be a device reset function.
>
> Is the attempt to reset a function, when all other functions belongs to
> pciback, detected and replaced by a device reset?

Yes with these patches:
https://git.kernel.org/cgit/linux/kernel/git/konrad/xen.git/log/?h=devel/xen-pciback.slot_and_bus.v0

But the last one seems to hang pciback when the device is unbound.

>
> Best Regards,
> Martin

> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.