
Re: [PATCH v7 01/12] vpci: introduce per-domain lock to protect vpci structure


  • To: Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Fri, 23 Jun 2023 10:50:55 +0200
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>
  • Delivery-date: Fri, 23 Jun 2023 08:51:23 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Jun 22, 2023 at 09:17:36PM +0000, Volodymyr Babchuk wrote:
> 
> Hi Roger,
> 
> Roger Pau Monné <roger.pau@xxxxxxxxxx> writes:
> 
> > On Wed, Jun 21, 2023 at 10:07:20PM +0000, Volodymyr Babchuk wrote:
> >> 
> >> Hi Roger,
> >> 
> >> Roger Pau Monné <roger.pau@xxxxxxxxxx> writes:
> >> 
> >> > On Tue, Jun 13, 2023 at 10:32:26AM +0000, Volodymyr Babchuk wrote:
> >> >> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
> >> >> 
> >> >> Introduce a per-domain read/write lock to check whether vpci is present,
> >> >> so we are sure there are no accesses to the contents of the vpci struct
> >> >> if not. This lock can be used (and in a few cases is used right away)
> >> >> so that vpci removal can be performed while holding the lock in write
> >> >> mode. Previously such removal could race with vpci_read for example.
> >> >> 
> >> >> 1. Per-domain's vpci_rwlock is used to protect pdev->vpci structure
> >> >> from being removed.
> >> >> 
> >> >> 2. Writing the command register and ROM BAR register may trigger
> >> >> modify_bars to run, which in turn may access multiple pdevs while
> >> >> checking for the existing BAR's overlap. The overlapping check, if done
> >> >> under the read lock, requires vpci->lock to be acquired on both devices
> >> >> being compared, which may produce a deadlock. It is not possible to
> >> >> upgrade read lock to write lock in such a case. So, in order to prevent
> >> >> the deadlock, check which registers are going to be written and acquire
> >> >> the lock in the appropriate mode from the beginning.
> >> >> 
> >> >> All other code, which doesn't lead to pdev->vpci destruction and does not
> >> >> access multiple pdevs at the same time, can still use a combination of the
> >> >> read lock and pdev->vpci->lock.
> >> >> 
> >> >> 3. Optimize the detection of whether a ROM BAR write requires the write
> >> >> lock by caching the offset of the ROM BAR register in vpci->header->rom_reg,
> >> >> which depends on the header's type.
> >> >> 
> >> >> 4. Reduce locked region in vpci_remove_device as it is now possible
> >> >> to set pdev->vpci to NULL early right after the write lock is acquired.
> >> >> 
> >> >> 5. Reduce locked region in vpci_add_handlers as it is possible to
> >> >> initialize many more fields of the struct vpci before assigning it to
> >> >> pdev->vpci.
> >> >> 
> >> >> 6. vpci_{add|remove}_register are required to be called with the write lock
> >> >> held, but it is not feasible to add an assert there as it requires struct
> >> >> domain to be passed for that. So, add a comment about this requirement to
> >> >> these and other functions with the equivalent constraints.
> >> >> 
> >> >> 7. Drop const qualifier where the new rwlock is used and this is
> >> >> appropriate.
> >> >> 
> >> >> 8. Do not call process_pending_softirqs with any locks held. For that,
> >> >> unlock prior to the call and re-acquire the locks after. After re-acquiring
> >> >> the lock there is no need to check if pdev->vpci exists:
> >> >>  - in apply_map because of the context it is called (no race condition
> >> >>    possible)
> >> >>  - for MSI/MSI-X debug code because it is called at the end of
> >> >>    pdev->vpci access and no further access to pdev->vpci is made
> >> >> 
> >> >> 9. Check for !pdev->vpci in vpci_{read|write} after acquiring the lock and,
> >> >> if so, allow reading or writing the hardware register directly. This is
> >> >> acceptable as we only deal with Dom0 as of now. Once DomU support is added,
> >> >> writes will need to be ignored and reads will return all 0's for the guests,
> >> >> while Dom0 can still access the registers directly.
> >> >> 
> >> >> 10. Introduce pcidevs_trylock, so there is a possibility to try to acquire
> >> >> the pcidevs lock.
> >> >> 
> >> >> 11. Use the pcidevs lock around for_each_pdev and pci_get_pdev_by_domain
> >> >> while accessing pdevs in vpci code.
> >> >> 
> >> >> 12. This is based on the discussion at [1].
> >> >> 
> >> >> [1] https://lore.kernel.org/all/20220204063459.680961-4-andr2000@xxxxxxxxx/
> >> >> 
> >> >> Suggested-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> >> >> Suggested-by: Jan Beulich <jbeulich@xxxxxxxx>
> >> >> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
> >> >
> >> > Thanks.
> >> >
> >> > I haven't looked in full detail, but I'm afraid there's an ABBA
> >> > deadlock with the per-domain vpci lock and the pcidevs lock in
> >> > modify_bars() vs  vpci_add_handlers() and vpci_remove_device().
> >> >
> >> > I've made some comments below.
> >> 
> >> Thank you for the review. I believe that it is a good idea to have a
> >> per-domain pdev_list lock. See my answers below.
> >
> > I think it's important that the lock that protects domain->pdev_list
> > must be the same that also protects pdev->vpci, or else you might run
> > into similar ABBA deadlock situations.
> >
> > The problem then could be that in vpci_{read,write} you will take the
> > per-domain pdev lock in read mode in order to get the pdev, and for
> > writes to the command register or the ROM BAR you would have to
> > upgrade such lock to a write lock without dropping it, and we don't
> > have such functionality for rw locks ATM.
> >
> > Maybe just re-starting the function knowing that the lock must be
> > taken in write mode would be a good solution: writes to the command
> > register will already be slow since they are likely to involve
> > modifications to the p2m.
> 
> Looks like modify_bars() is the only cause for this extended lock. I
> know that this was discussed earlier, but can we rework modify_bars to
> not iterate over all the pdevs? We can store a copy of all enabled BARs
> in a domain structure, protected by domain->vpci_lock. Something akin to
> 
> struct vpci_bar {
>         struct list_head list;
>         struct vpci *vpci;
>         unsigned long start;
>         unsigned long end;
>         bool is_rom;
> };

This IMO makes the logic more complicated, as each time a BAR is
updated we would have to change the cached address and size in two
different places.  It's also duplicated data that takes up memory, and
there are systems with a non-trivial number of PCI devices, and thus
BARs to track.
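
Just to illustrate the kind of extra bookkeeping I mean, every BAR
update would need a second step along these lines (struct cached_bar
and vpci_cache_bar_update() are made up for the example, and
d->vpci_lock is the lock from your proposal):

    /* Keep the per-domain cached copy in sync with the per-device BAR. */
    static void vpci_cache_bar_update(struct domain *d,
                                      struct cached_bar *cached,
                                      const struct vpci_bar *bar)
    {
        write_lock(&d->vpci_lock);
        cached->start = PFN_DOWN(bar->addr);
        cached->end = PFN_DOWN(bar->addr + bar->size - 1);
        write_unlock(&d->vpci_lock);
    }

and every BAR write handler would have to remember to call it in
addition to updating the existing struct vpci_bar, which is easy to
get out of sync.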

I think it's easier to just make the newly introduced per-domain
rwlock also protect the domain's pdev_list (unless I'm missing
something).  AFAICT it would also simplify locking, as such a rwlock
protecting domain->pdev_list would save you from having to take the
pcidevs lock in vpci_{read,write} in order to find the device the
access belongs to.
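
To make it more concrete, something along these lines is what I have
in mind for the read path (only a sketch, not the actual patch:
d->vpci_rwlock is the lock introduced by this series,
pci_get_pdev_by_domain() is the lookup helper named in the commit
message, and vpci_dispatch_read() is just a made-up stand-in for the
existing handler dispatch):

    uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
    {
        struct domain *d = current->domain;
        const struct pci_dev *pdev;
        uint32_t data;

        /*
         * Read mode is enough: the rwlock also protects d->pdev_list, so
         * neither the list nor pdev->vpci can change under our feet.
         */
        read_lock(&d->vpci_rwlock);

        pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
        if ( !pdev || !pdev->vpci )
        {
            read_unlock(&d->vpci_rwlock);
            /* No vPCI data: Dom0 can access the hardware directly. */
            return vpci_read_hw(sbdf, reg, size);
        }

        spin_lock(&pdev->vpci->lock);
        data = vpci_dispatch_read(pdev, reg, size);
        spin_unlock(&pdev->vpci->lock);

        read_unlock(&d->vpci_rwlock);

        return data;
    }

The command register and ROM BAR writes would then take the same
rwlock in write mode from the start (or restart once they detect that
the write lock is needed), as discussed above.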

Thanks, Roger.
