
Re: [Xen-devel] pci-passthrough loses msi-x interrupts ability after domain destroy



On 22/09/17 04:09, Christopher Clark wrote:
> On Thu, Sep 21, 2017 at 1:27 PM, Sander Eikelenboom
> <linux@xxxxxxxxxxxxxx> wrote:
>>
>> On Thu, September 21, 2017, 10:39:52 AM, Roger Pau Monné wrote:
>>
>>> On Wed, Sep 20, 2017 at 03:50:35PM -0400, Jérôme Oufella wrote:
>>>>
>>>> I'm using PCI pass-through to map a PCIe (intel i210) controller into
>>>> a HVM domain. The system uses xen-pciback to hide the appropriate PCI
>>>> device from Dom0.
>>>>
>>>> When creating the HVM domain after a hypervisor cold boot, the HVM
>>>> domain can access and use the PCIe controller without problem.
>>>>
>>>> However, if the HVM domain is destroyed then restarted, it won't be
>>>> able to use the pass-through PCI device anymore. The PCI device is
>>>> seen and can be mapped, however, the interrupts will not be passed to
>>>> the HVM domain anymore (this is visible under a Linux guest as
>>>> /proc/interrupts counters remain 0). The behavior on a Windows10 guest
>>>> is the same.
>>>>
>>>> A few interesting hints I noticed:
>>>>
>>>> - On Dom0, 'lspci -vv' on that PCIe device between the "working" and
>>>> the "muted interrupts" states, I noted a difference between the
>>>> MSI-X caps:
>>>>
>>>> - Capabilities: [70] MSI-X: Enable- Count=5 Masked- <-- IRQs will work if domain started
>>>> + Capabilities: [70] MSI-X: Enable- Count=5 Masked+ <-- IRQs won't work if domain started
>>>>                                             ^^^^^^^
>>
>>> IMHO it seems that either your device is not able to perform a reset
>>> successfully, or Linux is not correctly performing such reset. I don't
>>> think there's a lot that can be done from the Xen side.
>>
>> Unfortunately, for a lot of PCI devices a simple reset as performed by
>> default isn't enough, and almost none support a real PCI FLR either.
>>
>> In the distant past Konrad made a patchset that implemented a bus reset
>> and resetting of config space. (It piggybacked on the already existing
>> libxl mechanism of trying to call a sysfs "do_flr" attribute, which
>> triggers pciback to perform the bus reset and rewrite the config space
>> for the device.)
>>
>> I have used that patchset ever since for my PCI passthrough needs and it
>> works pretty well. I can shut down and restart VMs with PCI devices
>> passed through (also AMD Radeon graphics cards).
> 
> Just to confirm the utility of that piece of work: OpenXT also uses an
> extended version of that same patch to perform device reset for
> passthrough.
> 
> I've attached a copy of that OpenXT patch to this message and it can
> also be obtained from our git repository:
> https://github.com/OpenXT/xenclient-oe/blob/f8d3b282a87231d9ae717b13d506e8e7e28c78c4/recipes-kernel/linux/4.9/patches/thorough-reset-interface-to-pciback-s-sysfs.patch
> This version creates a sysfs node named "reset_device" and the OpenXT
> libxl toolstack is patched to use that node instead of "do_flr".

Nice to hear there are more users of this patch. On #xen on IRC there were,
from time to time, also users who tried PCI passthrough and ran into this
issue (and probably abandoned the idea, since having to restart your host
before being able to use your passed-through device again defeats much of
the use case).
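
For anyone who wants to check whether a device is in the stuck state from
the original report before starting a guest: the two lspci lines quoted
above differ only in the Masked flag. A minimal sketch in Python, assuming
the `lspci -vv` line has exactly the format shown in the report; the
function name parse_msix_line is made up for illustration:

```python
import re

# Matches the MSI-X capability line as printed in the report, e.g.
# "Capabilities: [70] MSI-X: Enable- Count=5 Masked+"
_MSIX_RE = re.compile(
    r"Capabilities: \[(?P<offset>[0-9a-fA-F]+)\] MSI-X: "
    r"Enable(?P<enable>[+-]) Count=(?P<count>\d+) Masked(?P<masked>[+-])"
)


def parse_msix_line(line):
    """Return the MSI-X state from one lspci -vv line, or None if no match."""
    m = _MSIX_RE.search(line)
    if m is None:
        return None
    return {
        "offset": int(m.group("offset"), 16),   # capability offset, hex in lspci
        "enabled": m.group("enable") == "+",
        "count": int(m.group("count")),
        "masked": m.group("masked") == "+",
    }
```

Going by the report, a result with masked set to True while the device is
still owned by pciback in Dom0 would correspond to the state in which the
guest no longer receives interrupts.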
 
> Konrad's original work encountered pushback on upstream acceptance at
> the time it was developed. I'm not sure I've found where that
> discussion ended. Is there any prospect of a more comprehensive reset
> mechanism being accepted into xen-pciback, or elsewhere in the kernel?

Yeah, it was NACKed by David Vrabel and the discussion more or less bled to
death. From what I remember, the main issue was with the naming: since it
doesn't do an FLR, the sysfs hook shouldn't be called "do_flr".

Some other, perhaps minor, issues I can think of are:
- No way to exempt PCI devices from this new way of resetting them.
  Perhaps there are PCI devices/topologies where this way of
  resetting causes more problems than it solves and could cause a
  regression. Unfortunately, auto-detecting what works doesn't seem to
  be possible. On the other hand (though only with my n=10) I haven't
  encountered such a device yet.

- The communication path between libxl and the kernel via sysfs.
  I think the preference was for either:
  a) having it use a more commonly used Xen communication channel, or
  b) having it all self-contained in pciback. (From my memory and the
     OpenXT patch description, there could be some locking issue when
     trying to implement it this way, but the vfio guys had that solved
     for their reset implementation, if I read one of the comments in
     their source code correctly; patches by Alex Williamson, if I
     remember correctly.)

- Not an issue back when the patch was made, but, as in my earlier question
  to Roger: the hypervisor seems to be getting more involved with PCI
  devices with the PVH dom0 work. If and how does that relate to pciback,
  PCI passthrough and (the location of) the reset mechanisms?
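
To make the sysfs path from the second point concrete: as far as I
understand it, the toolstack writes the device's BDF to a pciback attribute
and falls back to the generic per-device reset file when that attribute is
missing. A rough sketch in Python. The directory layout used here
(bus/pci/drivers/pciback/<node> and bus/pci/devices/<BDF>/reset) is my
assumption about how libxl does it, the function name and the sysfs_root
parameter are hypothetical, and the node name depends on which patch is
applied ("do_flr" or OpenXT's "reset_device"):

```python
import os


def reset_passthrough_device(bdf, node="do_flr", sysfs_root="/sys"):
    """Ask the kernel to reset a PCI device; return the path written to.

    Assumed layout, not verified against any particular kernel:
      <sysfs_root>/bus/pci/drivers/pciback/<node>   takes the BDF string
      <sysfs_root>/bus/pci/devices/<bdf>/reset      takes "1"
    """
    pciback_attr = os.path.join(sysfs_root, "bus/pci/drivers/pciback", node)
    generic_attr = os.path.join(sysfs_root, "bus/pci/devices", bdf, "reset")

    # Prefer the pciback hook (bus reset plus config space rewrite with the
    # patch applied); otherwise use the default per-device reset.
    for path, payload in ((pciback_attr, bdf), (generic_attr, "1")):
        if os.path.exists(path):
            with open(path, "w") as f:
                f.write(payload)
            return path
    raise FileNotFoundError("no reset attribute found for " + bdf)
```

On a kernel carrying the OpenXT patch this would be called with
node="reset_device". The sysfs_root parameter only exists so the logic can
be exercised against a scratch directory instead of a live host.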


So I think David's NACK was mostly about the patchset having some hackish
cosmetics.

On the upside, one can conclude that this patchset has by now been pretty
well tested over the years ;)

Since David has left, perhaps Jurgen/Boris/Konrad could express their views
(again)?
(CC'ed them as well)

> As noted in the original LKML threads, vfio has similar relevant pci
> device reset retry logic. (Thanks to Rich Persaud for this pointer:)
> http://elixir.free-electrons.com/linux/v4.14-rc1/source/drivers/vfio/pci/vfio_pci.c#L1353
> 
> libvirt also performs similar reset logic, using a direct low level
> interface to config space (Thanks to Marek for this pointer, libvirt
> is used by Qubes:)
> https://github.com/libvirt/libvirt/blob/master/src/util/virpci.c#L929
> I think this indicates that it would be possible to extend libxl to
> do something similar, but that seems less satisfactory compared to
> performing the work in a kernel-provided implementation.
> 
> Is there a way forward to providing this functionality within Xen
> software or Linux?
>
> Christopher
> --
> 
> openxt.org
> 



 

