
Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci


  • To: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Fri, 4 Feb 2022 15:57:32 +0100
  • Cc: Jan Beulich <jbeulich@xxxxxxxx>, "julien@xxxxxxx" <julien@xxxxxxx>, "sstabellini@xxxxxxxxxx" <sstabellini@xxxxxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, "andrew.cooper3@xxxxxxxxxx" <andrew.cooper3@xxxxxxxxxx>, "george.dunlap@xxxxxxxxxx" <george.dunlap@xxxxxxxxxx>, "paul@xxxxxxx" <paul@xxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Rahul Singh <rahul.singh@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Fri, 04 Feb 2022 14:57:50 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Fri, Feb 04, 2022 at 02:43:07PM +0000, Oleksandr Andrushchenko wrote:
> 
> 
> On 04.02.22 15:06, Roger Pau Monné wrote:
> > On Fri, Feb 04, 2022 at 12:53:20PM +0000, Oleksandr Andrushchenko wrote:
> >>
> >> On 04.02.22 14:47, Jan Beulich wrote:
> >>> On 04.02.2022 13:37, Oleksandr Andrushchenko wrote:
> >>>> On 04.02.22 13:37, Jan Beulich wrote:
> >>>>> On 04.02.2022 12:13, Roger Pau Monné wrote:
> >>>>>> On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
> >>>>>>> On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
> >>>>>>>> On 04.02.22 11:15, Jan Beulich wrote:
> >>>>>>>>> On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
> >>>>>>>>>> On 04.02.22 09:52, Jan Beulich wrote:
> >>>>>>>>>>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
> >>>>>>>>>>>> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
> >>>>>>>>>>>>                       continue;
> >>>>>>>>>>>>               }
> >>>>>>>>>>>>       
> >>>>>>>>>>>> +        spin_lock(&tmp->vpci_lock);
> >>>>>>>>>>>> +        if ( !tmp->vpci )
> >>>>>>>>>>>> +        {
> >>>>>>>>>>>> +            spin_unlock(&tmp->vpci_lock);
> >>>>>>>>>>>> +            continue;
> >>>>>>>>>>>> +        }
> >>>>>>>>>>>>               for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
> >>>>>>>>>>>>               {
> >>>>>>>>>>>>                   const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
> >>>>>>>>>>>> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
> >>>>>>>>>>>>                   rc = rangeset_remove_range(mem, start, end);
> >>>>>>>>>>>>                   if ( rc )
> >>>>>>>>>>>>                   {
> >>>>>>>>>>>> +                spin_unlock(&tmp->vpci_lock);
> >>>>>>>>>>>>                       printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
> >>>>>>>>>>>>                              start, end, rc);
> >>>>>>>>>>>>                       rangeset_destroy(mem);
> >>>>>>>>>>>>                       return rc;
> >>>>>>>>>>>>                   }
> >>>>>>>>>>>>               }
> >>>>>>>>>>>> +        spin_unlock(&tmp->vpci_lock);
> >>>>>>>>>>>>           }
> >>>>>>>>>>> At the first glance this simply looks like another unjustified
> >>>>>>>>>>> (in the description) change, as you're not converting anything
> >>>>>>>>>>> here but you actually add locking (and I realize this was there
> >>>>>>>>>>> before, so I'm sorry for not pointing this out earlier).
> >>>>>>>>>> Well, I thought that the description already has "...the lock can
> >>>>>>>>>> be used (and in a few cases is used right away) to check whether
> >>>>>>>>>> vpci is present" and this is enough for such uses as here.
> >>>>>>>>>>> But then I wonder whether you actually tested this, since I can't
> >>>>>>>>>>> help getting the impression that you're introducing a live-lock:
> >>>>>>>>>>> The function is called from cmd_write() and rom_write(), which in
> >>>>>>>>>>> turn are called out of vpci_write(). Yet that function already
> >>>>>>>>>>> holds the lock, and the lock is not (currently) recursive. (For
> >>>>>>>>>>> the 3rd caller of the function - init_bars() - otoh the locking
> >>>>>>>>>>> looks to be entirely unnecessary.)
> >>>>>>>>>> Well, you are correct: if tmp != pdev then it is correct to acquire
> >>>>>>>>>> the lock. But if tmp == pdev and rom_only == true
> >>>>>>>>>> then we'll deadlock.
> >>>>>>>>>>
> >>>>>>>>>> It seems we need to have the locking conditional, e.g. only lock
> >>>>>>>>>> if tmp != pdev
> >>>>>>>>> Which will address the live-lock, but introduce ABBA deadlock
> >>>>>>>>> potential between the two locks.
> >>>>>>>> I am not sure I can suggest a better solution here
> >>>>>>>> @Roger, @Jan, could you please help here?
> >>>>>>> Well, first of all I'd like to mention that while it may have been
> >>>>>>> okay to not hold pcidevs_lock here for Dom0, it surely needs
> >>>>>>> acquiring when dealing with DomU-s' lists of PCI devices. The
> >>>>>>> requirement really applies to the other use of for_each_pdev() as
> >>>>>>> well (in vpci_dump_msi()), except that there it probably wants to be
> >>>>>>> a try-lock.
> >>>>>>>
> >>>>>>> Next I'd like to point out that here we have the still pending issue
> >>>>>>> of how to deal with hidden devices, which Dom0 can access. See my RFC
> >>>>>>> patch "vPCI: account for hidden devices in modify_bars()". Whatever
> >>>>>>> the solution here, I think it wants to at least account for the
> >>>>>>> extra need there.
> >>>>>> Yes, sorry, I should take care of that.
> >>>>>>
> >>>>>>> Now it is quite clear that pcidevs_lock isn't going to help with
> >>>>>>> avoiding the deadlock, as it's imo not an option at all to acquire
> >>>>>>> that lock everywhere else you access ->vpci (or else the vpci lock
> >>>>>>> itself would be pointless). But a per-domain auxiliary r/w lock may
> >>>>>>> help: Other paths would acquire it in read mode, and here you'd
> >>>>>>> acquire it in write mode (in the former case around the vpci lock,
> >>>>>>> while in the latter case there may then not be any need to acquire
> >>>>>>> the individual vpci locks at all). FTAOD: I haven't fully thought
> >>>>>>> through all implications (and hence whether this is viable in the
> >>>>>>> first place); I expect you will, documenting what you've found in
> >>>>>>> the resulting patch description. Of course the double lock
> >>>>>>> acquire/release would then likely want hiding in helper functions.
> >>>>>> I've been also thinking about this, and whether it's really worth to
> >>>>>> have a per-device lock rather than a per-domain one that protects all
> >>>>>> vpci regions of the devices assigned to the domain.
> >>>>>>
> >>>>>> The OS is likely to serialize accesses to the PCI config space anyway,
> >>>>>> and the only place I could see a benefit of having per-device locks is
> >>>>>> in the handling of MSI-X tables, as the handling of the mask bit is
> >>>>>> likely very performance sensitive, so adding a per-domain lock there
> >>>>>> could be a bottleneck.
> >>>>> Hmm, with method 1 accesses serializing globally is basically
> >>>>> unavoidable, but with MMCFG I see no reason why OSes may not (move
> >>>>> to) permit(ting) parallel accesses, with serialization perhaps done
> >>>>> only at device level. See our own pci_config_lock, which applies to
> >>>>> only method 1 accesses; we don't look to be serializing MMCFG
> >>>>> accesses at all.
> >>>>>
> >>>>>> We could alternatively do a per-domain rwlock for vpci and special case
> >>>>>> the MSI-X area to also have a per-device specific lock. At which point
> >>>>>> it becomes fairly similar to what you propose.
> >>>> @Jan, @Roger
> >>>>
> >>>> 1. d->vpci_lock - rwlock <- this protects vpci
> >>>> 2. pdev->vpci->msix_tbl_lock - rwlock <- this protects MSI-X tables
> >>>> or should it better be pdev->msix_tbl_lock as MSI-X tables don't
> >>>> really depend on vPCI?
> >>> If so, perhaps indeed better the latter. But as said in reply to Roger,
> >>> I'm not convinced (yet) that doing away with the per-device lock is a
> >>> good move. As said there - we're ourselves doing fully parallel MMCFG
> >>> accesses, so OSes ought to be fine to do so, too.
> >> But with pdev->vpci_lock we face ABBA...
> > I think it would be easier to start with a per-domain rwlock that
> > guarantees pdev->vpci cannot be removed under our feet. This would be
> > taken in read mode in vpci_{read,write} and in write mode when
> > removing a device from a domain.
> >
> > Then there are also other issues regarding vPCI locking that need to
> > be fixed, but that lock would likely be a start.
> Or let's see the problem at a different angle: this is the only place
> which breaks the use of pdev->vpci_lock. Because all other places
> do not try to acquire the lock of any two devices at a time.
> So, what if we re-work the offending piece of code instead?
> That way we do not break parallel access and have the lock per-device
> which might also be a plus.
> 
> By re-work I mean, that instead of reading already mapped regions
> from tmp we can employ a d->pci_mapped_regions range set which
> will hold all the already mapped ranges. And when it is needed to access
> that range set we use pcidevs_lock which seems to be rare.
> So, modify_bars will rely on pdev->vpci_lock + pcidevs_lock and
> ABBA won't be possible at all.
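
(As an aside, the per-domain r/w lock arrangement discussed above could be
wired up roughly as in the sketch below. This is only meant to illustrate
the intended lock ordering, not an actual patch; the structure layouts and
the vpci_rwlock name are made-up stand-ins.)

/*
 * Rough sketch of the per-domain r/w lock idea; all names below are
 * hypothetical stand-ins, not the real Xen structures.
 */
struct vpci;                            /* per-device vPCI state */

struct pci_dev {
    struct domain *domain;
    spinlock_t vpci_lock;               /* per-device lock (this series) */
    struct vpci *vpci;
};

struct domain {
    rwlock_t vpci_rwlock;               /* hypothetical per-domain lock */
};

/* Single-device paths (vpci_read()/vpci_write()) would nest the
 * per-device lock inside the per-domain lock taken in read mode: */
static void single_device_access(struct pci_dev *pdev)
{
    read_lock(&pdev->domain->vpci_rwlock);
    spin_lock(&pdev->vpci_lock);
    /* handle the config space access for this device only */
    spin_unlock(&pdev->vpci_lock);
    read_unlock(&pdev->domain->vpci_rwlock);
}

/* Cross-device paths (the BAR loop in modify_bars(), device removal)
 * would take the per-domain lock in write mode instead; at that point
 * the individual per-device locks need not be acquired at all: */
static void cross_device_update(struct domain *d)
{
    write_lock(&d->vpci_rwlock);
    /* safe to look at every pdev->vpci of the domain here */
    write_unlock(&d->vpci_rwlock);
}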

Sadly a d->pci_mapped_regions range set won't replace the usage of the
loop in modify_bars. The loop is not (exclusively) there to prevent
mapping the same region multiple times, but rather to prevent unmapping
of regions as long as there's an enabled BAR still using them.
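
To spell that out, the loop effectively does something like the following
with the rangeset that is about to be unmapped (heavily simplified: error
handling, the rom_only special case and the exact checks/field names are
omitted or only approximated):

/* Inside modify_bars(): keep ranges still claimed by an enabled BAR of
 * another device mapped, by dropping them from the set to be unmapped. */
for_each_pdev ( pdev->domain, tmp )
{
    unsigned int i;

    /* (tmp == pdev is special-cased in the real code for rom_only) */

    for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
    {
        const struct vpci_bar *bar = &tmp->vpci->header.bars[i];

        if ( !bar->enabled )            /* approximate check */
            continue;

        rc = rangeset_remove_range(mem, PFN_DOWN(bar->addr),
                                   PFN_DOWN(bar->addr + bar->size - 1));
        if ( rc )
            return rc;                  /* plus the cleanup in the hunk above */
    }
}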

If you wanted to use something like d->pci_mapped_regions, it would
have to keep reference counts for the regions, in order to know when a
mapping is no longer required by any BAR with memory decoding enabled
on the system.
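
As a very rough illustration of what that would mean (all of the names
below are hypothetical, none of this exists in Xen today):

/* Hypothetical per-domain tracking of mapped ranges with reference
 * counts, so a range is only unmapped once no enabled BAR uses it. */
struct tracked_region {
    struct list_head node;
    unsigned long start, end;           /* inclusive frame numbers */
    unsigned int refcnt;                /* enabled BARs using this range */
};

/* When a BAR gains memory decoding: */
static void region_get(struct domain *d, unsigned long start, unsigned long end)
{
    struct tracked_region *r = region_find(d, start, end);   /* hypothetical */

    if ( r )
        r->refcnt++;
    else
        region_map_and_track(d, start, end);                  /* hypothetical */
}

/* When a BAR loses memory decoding: */
static void region_put(struct domain *d, unsigned long start, unsigned long end)
{
    struct tracked_region *r = region_find(d, start, end);

    if ( r && !--r->refcnt )
        region_unmap_and_drop(d, start, end);                 /* hypothetical */
}

Partially overlapping BARs would likely also force splitting the tracked
ranges, so this wouldn't be an obvious simplification over the current loop.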

Thanks, Roger.