[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v7 01/12] vpci: introduce per-domain lock to protect vpci structure


  • To: Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Thu, 22 Jun 2023 10:15:20 +0200
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>
  • Delivery-date: Thu, 22 Jun 2023 08:15:40 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, Jun 21, 2023 at 10:07:20PM +0000, Volodymyr Babchuk wrote:
> 
> Hi Roger,
> 
> Roger Pau Monné <roger.pau@xxxxxxxxxx> writes:
> 
> > On Tue, Jun 13, 2023 at 10:32:26AM +0000, Volodymyr Babchuk wrote:
> >> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
> >> 
> >> Introduce a per-domain read/write lock to check whether vpci is present,
> >> so we can be sure there are no accesses to the contents of the vpci
> >> struct when it is not. This lock can be used (and in a few cases is used
> >> right away) so that vpci removal can be performed while holding the lock
> >> in write mode. Previously such removal could race with vpci_read, for
> >> example.
> >> 
> >> 1. Per-domain's vpci_rwlock is used to protect pdev->vpci structure
> >> from being removed.
> >> 
> >> 2. Writing the command register and ROM BAR register may trigger
> >> modify_bars to run, which in turn may access multiple pdevs while
> >> checking for the existing BAR's overlap. The overlapping check, if done
> >> under the read lock, requires vpci->lock to be acquired on both devices
> >> being compared, which may produce a deadlock. It is not possible to
> >> upgrade read lock to write lock in such a case. So, in order to prevent
> >> the deadlock, check which registers are going to be written and acquire
> >> the lock in the appropriate mode from the beginning.
> >> 
> >> All other code, which doesn't lead to pdev->vpci destruction and does not
> >> access multiple pdevs at the same time, can still use a combination of the
> >> read lock and pdev->vpci->lock.
> >> 
> >> 3. Optimize detection of whether a ROM BAR write requires the write
> >> lock by caching the offset of the ROM BAR register in
> >> vpci->header->rom_reg, which depends on the header's type.
> >> 
> >> 4. Reduce locked region in vpci_remove_device as it is now possible
> >> to set pdev->vpci to NULL early right after the write lock is acquired.
> >> 
> >> 5. Reduce locked region in vpci_add_handlers as it is possible to
> >> initialize many more fields of the struct vpci before assigning it to
> >> pdev->vpci.
> >> 
> >> 6. vpci_{add|remove}_register are required to be called with the write
> >> lock held, but it is not feasible to add an assert there as that would
> >> require struct domain to be passed in. So, add a comment about this
> >> requirement to these and other functions with equivalent constraints.
> >> 
> >> 7. Drop const qualifier where the new rwlock is used and this is 
> >> appropriate.
> >> 
> >> 8. Do not call process_pending_softirqs with any locks held. For that,
> >> unlock prior to the call and re-acquire the locks afterwards. After
> >> re-acquiring the lock there is no need to check if pdev->vpci exists:
> >>  - in apply_map because of the context it is called (no race condition
> >>    possible)
> >>  - for MSI/MSI-X debug code because it is called at the end of
> >>    pdev->vpci access and no further access to pdev->vpci is made
> >> 
> >> 9. Check for !pdev->vpci in vpci_{read|write} after acquiring the lock
> >> and if so, allow reading or writing the hardware register directly. This is
> >> acceptable as we only deal with Dom0 as of now. Once DomU support is
> >> added the write will need to be ignored and read return all 0's for the
> >> guests, while Dom0 can still access the registers directly.
> >> 
> >> 10. Introduce pcidevs_trylock, so it is possible to attempt taking the
> >> pcidevs lock without blocking.
> >> 
> >> 11. Use the pcidevs lock around for_each_pdev and pci_get_pdev_by_domain
> >> while accessing pdevs in vpci code.
> >> 
> >> 12. This is based on the discussion at [1].
> >> 
> >> [1] https://lore.kernel.org/all/20220204063459.680961-4-andr2000@xxxxxxxxx/
> >> 
> >> Suggested-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> >> Suggested-by: Jan Beulich <jbeulich@xxxxxxxx>
> >> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
> >
> > Thanks.
> >
> > I haven't looked in full detail, but I'm afraid there's an ABBA
> > deadlock with the per-domain vpci lock and the pcidevs lock in
> > modify_bars() vs vpci_add_handlers() and vpci_remove_device().
> >
> > I've made some comments below.
> 
> Thank you for the review. I believe that it is a good idea to have a
> per-domain pdev_list lock. See my answers below.

I think it's important that the lock that protects domain->pdev_list
is the same one that also protects pdev->vpci, or else you might run
into similar ABBA deadlock situations.

The problem then could be that in vpci_{read,write} you will take the
per-domain pdev lock in read mode in order to get the pdev, and for
writes to the command register or the ROM BAR you would have to
upgrade such lock to a write lock without dropping it, and we don't
have such functionality for rw locks ATM.

Maybe just re-starting the function knowing that the lock must be
taken in write mode would be a good solution: writes to the command
register will already be slow since they are likely to involve
modifications to the p2m.

> >> @@ -213,6 +227,7 @@ static void defer_map(struct domain *d, struct pci_dev 
> >> *pdev,
> >>      raise_softirq(SCHEDULE_SOFTIRQ);
> >>  }
> >>  
> >> +/* This must hold domain's vpci_rwlock in write mode. */
> >
> > Why it must be in write mode?
> >
> > Is this to avoid ABBA deadlocks as not taking the per-domain lock in
> > write mode would then force us to take each pdev->vpci->lock in order
> > to prevent concurrent modifications of remote devices?
> 
> Yes, exactly this. This is mentioned in the commit message:
> 
>     2. Writing the command register and ROM BAR register may trigger
>     modify_bars to run, which in turn may access multiple pdevs while
>     checking for the existing BAR's overlap. The overlapping check, if done
>     under the read lock, requires vpci->lock to be acquired on both devices
>     being compared, which may produce a deadlock. It is not possible to
>     upgrade read lock to write lock in such a case. So, in order to prevent
>     the deadlock, check which registers are going to be written and acquire
>     the lock in the appropriate mode from the beginning.

Might be good to put part of the commit message in the code comment,
just saying that the lock must be taken in write mode is not very
informative.

/*
 * The <lock-name> per domain lock must be taken in write mode in
 * order to prevent changes to the vPCI data of devices assigned to
 * the domain, since the function parses such data.
 */

> 
> >
> >>  static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool 
> >> rom_only)
> >>  {
> >>      struct vpci_header *header = &pdev->vpci->header;
> >> @@ -287,6 +302,7 @@ static int modify_bars(const struct pci_dev *pdev, 
> >> uint16_t cmd, bool rom_only)
> >>       * Check for overlaps with other BARs. Note that only BARs that are
> >>       * currently mapped (enabled) are checked for overlaps.
> >>       */
> >> +    pcidevs_lock();
> >
> > This can be problematic I'm afraid, as it's a guest controlled path
> > that allows applying unrestricted contention on the pcidevs lock.  I'm
> > unsure whether multiple guests could be able to trigger the watchdog
> > if given enough devices/vcpus to be able to perform enough
> > simultaneous calls to modify_bars().
> >
> > Maybe you could expand the per-domain vpci lock to also protect
> > modifications of domain->pdev_list?
> >
> > IOW: kind of a per-domain pdev_lock.
> 
> This might work, but I am not sure if we need to expand the vpci lock.
> Maybe it is better to add another lock that protects domain->pdev_list? I
> am afraid that a combined lock would be confusing, as it protects two
> semi-related entities.

I think expanding is the best solution, otherwise you are likely to
run into the same ABBA deadlock situations.

Thanks, Roger.



 

