[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 2/6] xen: pci: introduce reference counting for pdev


  • To: Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 24 Apr 2023 16:27:41 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=9AR/32oQ0LXq2/CHm3IaI5u1LOce7N+Pf+2xDoBpsYo=; b=HbYcCRcUbeIlw1GFsXhQe8qkL16yMu3NrV7W0NE8atZ2Yc8AiOgyYu2Ru2M9XYlYpVoMO6JAg3+0sNlgGB0XsCP1bvB2QjkKnzR5wMbcY0GIT8sWmBuo/mny+FT1vyIExoYn+FvsWekbyqESnrdIzcKfS99SNOGhNKbQdteypgZO4F/bb1eUgAdNESajpln9Zqlk0XAIaK/yEcOGi5C3ejJe+hJGU9h3EqpHUUIvB1VenqTkZgmCPvbGBVifRRUpBZdktp4DCBD8Oo+qONMwC2gHP/0xqe8t02nw+Ry9youBhYzcBpF2LG6rJ2fbHY4Uf+Yyfqh9uEPgCdL0bef1qA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=mJcL+MazkewhHYYcVcAEuKV4GRMbnTim7OdmJQ/bG1O5SKi5RjLHAfWoJK5ZyZOrtEOPBzy3M1X/uD+G+y0OoyEGnNWePM1/Rwcn+9zfl9JQpaHeXKxREWFWm47xtoUK1oSl+Ngp1N/rZaY0xiYC4VYnlY9KlWQ0YEnepEwLnYTvWvWcgUzX9+g2llVTZEm7stOrcFucN+n45X8wr+bUjkpR4t1AHbd/CgOZRb/5WGeivrVCAifwHFVw0y83RQrjTkYoK4+KbKF8neyjLDgc6qhc3gL/lchcUU3M80Kh0uBIRAR4+hJKkLCC8Q8AUBUXuHAN0gOcB7wLVsmjYM1I0Q==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Mon, 24 Apr 2023 14:28:15 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 24.04.2023 16:15, Volodymyr Babchuk wrote:
> 
> Hi Jan,
> 
> Jan Beulich <jbeulich@xxxxxxxx> writes:
> 
>> On 21.04.2023 16:13, Volodymyr Babchuk wrote:
>>>
>>> Hi Roger,
>>>
>>> Roger Pau Monné <roger.pau@xxxxxxxxxx> writes:
>>>
>>>> On Fri, Apr 21, 2023 at 11:00:23AM +0000, Volodymyr Babchuk wrote:
>>>>>
>>>>> Hello Roger,
>>>>>
>>>>> Roger Pau Monné <roger.pau@xxxxxxxxxx> writes:
>>>>>
>>>>>> On Mon, Apr 17, 2023 at 12:34:31PM +0200, Jan Beulich wrote:
>>>>>>> On 17.04.2023 12:17, Roger Pau Monné wrote:
>>>>>>>> On Fri, Apr 14, 2023 at 01:30:39AM +0000, Volodymyr Babchuk wrote:
>>>>>>>>> Above I have proposed another view on this. I hope, it will work for
>>>>>>>>> you. Just to reiterate, idea is to allow "harmless" refcounts to be 
>>>>>>>>> left
>>>>>>>>> after returning from pci_remove_device(). By "harmless" I mean that
>>>>>>>>> owners of those refcounts will not try to access the physical PCI
>>>>>>>>> device if pci_remove_device() is already finished.
>>>>>>>>
>>>>>>>> I'm not strictly a maintainer of this piece code, albeit I have an
>>>>>>>> opinion.  I will like to also hear Jans opinion, since he is the
>>>>>>>> maintainer.
>>>>>>>
>>>>>>> I'm afraid I can't really appreciate the term "harmless refcounts". 
>>>>>>> Whoever
>>>>>>> holds a ref is entitled to access the device. As stated before, I see 
>>>>>>> only
>>>>>>> two ways of getting things consistent: Either pci_remove_device() is
>>>>>>> invoked upon dropping of the last ref,
>>>>>>
>>>>>> With this approach, what would be the implementation of
>>>>>> PHYSDEVOP_manage_pci_remove?  Would it just check whether the pdev
>>>>>> exist and either return 0 or -EBUSY?
>>>>>>
>>>>>
>>>>> Okay, I am preparing patches with the behavior you proposed. To test it,
>>>>> I artificially set refcount to 2 and indeed PHYSDEVOP_manage_pci_remove
>>>>> returned -EBUSY, which propagated to the linux driver. Problem is that
>>>>> Linux driver can't do anything with this. It just displayed an error
>>>>> message and removed device anyways. This is because Linux sends
>>>>> PHYSDEVOP_manage_pci_remove in device_remove() call path and there is no
>>>>> way to prevent the device removal. So, admin is not capable to try this
>>>>> again.
>>>>
>>>> Ideally Linux won't remove the device, and then the admin would get to
>>>> retry.  Maybe the way the Linux hook is placed is not the best one?
>>>> The hypervisor should be authoritative on whether a device can be
>>>> removed or not, and hence PHYSDEVOP_manage_pci_remove returning an
>>>> error (EBUSY or otherwise) shouldn't allow the device unplug in Linux
>>>> to continue.
>>>
>>> Yes, it would be ideally, but Linux driver/device model is written in a
>>> such way, that PCI subsystem tracks all the PCI device usage, so it can
>>> be certain that it can remove the device. Thus, functions in the device
>>> removal path either return void or 0. Of course, kernel does not know that
>>> hypervisor has additional uses for the device, so there is no mechanisms
>>> to prevent removal.
>>
>> Could pciback obtain a reference on behalf of the hypervisor, dropping it
>> when device removal is requested (i.e. much closer to the start of that
>> operation), and only if it finds that no guests use the device anymore?
> 
> Yes, it can, it this indeed will hold a reference to a pci device for a
> time, but there are some consideration that made this approach not
> feasible.
> 
> Basically, when an user writes to /sys/bus/pci/SBDF/remove, the
> following happens:
> 
> 1. /sys/bus/pci/SBFD/remove entry is removed - we can't retry the
> operation anymore

Looking at the comment ahead of pci_stop_and_remove_bus_device(),
isn't this too late already. The text there says "has been removed",
not e.g. "is about to be removed". (Of course chances are that it is
the comment which is wrong; I know too little about Linux'es hot-
unplug machinery.)

Jan

> [unimportant things]
> 
> N. pci_stop_dev() function is called. This function unloads a device
> driver. Any good behaving driver should drop all additional references
> to a device at this point.
> 
> [more unimportant things]
> 
> M. PCI subsystem drops own reference to a generic device object
> 
> So, as you can see, admin can't restart a "failed" attempt to remove a
> PCI device.
> 
> On other hand, remove() function can sleep. This allows us to pause
> removal process a bit and check if hypervisor had finished removing a
> PCI device on its side. But, as you pointed out, this can take weeks...
> 




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.