[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH V1 1/1] Xen/libxl: Perform PCI reset using 'reset' SysFS attribute

On Wed, Dec 13, 2017 at 9:05 PM, Govinda Tatti <Govinda.Tatti@xxxxxxxxxx> wrote:
> Thanks George for your review comments. Please see below for my comments.
> On 12/13/2017 8:01 AM, George Dunlap wrote:
>> On Thu, Dec 7, 2017 at 10:26 PM, Govinda Tatti <Govinda.Tatti@xxxxxxxxxx>
>> wrote:
>>> The life-cycle of a PCI device in Xen pciback is complex and is
>>> constrained
>>> by the generic PCI locking mechanism.
>>> - It starts with the device being bound to us, for which we do a function
>>>    reset (done via SysFS so the PCI lock is held).
>>> - If the device is unbound from us, we also do a function reset
>>>    (done via SysFS so the PCI lock is held).
>>> - If the device is un-assigned from a guest - we do a function reset
>>>    (no PCI lock is held).
>>> All reset operations are done on the individual PCI function level
>>> (so bus:device:function).
>>> The reset for an individual PCI function means device must support FLR
>>> (PCIe or AF), PM reset on D3hot->D0 device specific reset, or a secondary
>>> bus reset for a singleton device on a bus but FLR does not have
>>> widespread
>>> support or it is not reliable in some cases. So, we need to provide an
>>> alternate mechanism to users to perform a slot or bus level reset.
>>> Currently, a slot or bus reset is not exposed in SysFS as there is no
>>> good
>>> way of exposing a bus topology there. This is due to the complexity -
>>> we MUST know that the different functions of a PCIe device are not in use
>>> by other drivers, or if they are in use (say one of them is assigned to a
>>> guest and the other is  idle) - it is still OK to reset the slot
>>> (assuming
>>> both of them are owned by Xen pciback).
>>> This patch does that providing an option to perform a flr/slot/bus reset
>>> when a PCI device is owned by Xen PCI backend. It will try to execute one
>>> of these reset method, starting with FLR if it is supported. Otherwise,
>>> it tries slot or bus reset method. For slot or bus reset method, it also
>>> checks to make sure that all of the devices under the bridge are owned by
>>> Xen PCI backend before applying those resets.
>>> Due to the complexity with the PCI lock we cannot do the reset when a
>>> device is bound ('echo $BDF > bind') or when unbound ('echo $BDF >
>>> unbind')
>>> as the pci_[slot|bus]_reset also takes the same lock resulting in a
>>> dead-lock.
>>> Putting the reset function in a work-queue or thread won't work either -
>>> as we have to do the reset function outside the 'unbind' context (it
>>> holds
>>> the PCI lock). But once you 'unbind' a device the device is no longer
>>> under
>>> the ownership of Xen pciback and the pci_set_drvdata has been reset, so
>>> we cannot use a thread for this.
>>> Instead of doing all this complex dance, we depend on the tool-stack
>>> doing
>>> the right thing. As such, we implement 'reset' SysFS attribute which 'xl'
>>> uses when a device is detached or attached from/to a guest. It bypasses
>>> the need to worry about the PCI lock. BTW, previously defined "do_flr"
>>> attribute has been renamed to "reset" since "do_flr" name doesn't
>>> represent
>>> all PCI reset methods and plus, currently it is not being used. So, there
>>> is no impact in renaming this sysfs attribute.
>>> To not inadvertently do a bus reset that would affect devices that are in
>>> use by other drivers (other than Xen pciback) prior to the reset, we
>>> check
>>> that all of the devices under the bridge are owned by Xen pciback. If
>>> they
>>> are not, we refrain from executing the bus (or slot) reset.
>> There's an awful lot of stuff here, but only a single line of code
>> change, which makes it difficult to tell what's going on.
>> It sounds like you're making changes to Linux to solve the problems
>> you describe, and modifying xl so that it works with this new
>> interface?
> The "reset" SysFS attribute provides an option for xl to perform a PCI reset
> (FLR/SLOT/BUS, one of them and based on its support)when a PCI device is
> being
> added/removed to/from Xen guest.

But you didn't answer my question.  The changeset here says,
"Implement the reset SysFS attribute".  But this changset doesn't do
any such thing.

Is it the case that most of the "fixes" you describe with locking are
happening inside of Linux, and that all this changset does is modify
libxl to use 'reset' instead of 'do_flr'?

>> In which case, xl needs to be backwards-compatible with kernels that
>> don't have your new feature: it will have to check for %s/reset, and
>> if it's not there, then try %/do_flr.
> I think this fix was planned more than a year back and even we pushed libxl
> fix
> ("do_flr" SysFSattribute) but linux kernel fix was not integrated for some
> reason.
> Now, we are revisitingboth linux kernel and libxl changes. In other-words,
> "do_flr" change is not being usedtoday since we don't have required code
> changes
> in the linux kernel.

Are you saying do_flr doesn't exist at all in any version of Linux,
and as such the line you're removing is currently pointless?


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.