[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure
On Fri, Feb 11, 2022 at 07:27:39AM +0000, Oleksandr Andrushchenko wrote: > Hi, Roger! > > On 10.02.22 18:16, Roger Pau Monné wrote: > > On Wed, Feb 09, 2022 at 03:36:27PM +0200, Oleksandr Andrushchenko wrote: > >> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx> > >> > >> Introduce a per-domain read/write lock to check whether vpci is present, > >> so we are sure there are no accesses to the contents of the vpci struct > >> if not. This lock can be used (and in a few cases is used right away) > >> so that vpci removal can be performed while holding the lock in write > >> mode. Previously such removal could race with vpci_read for example. > > Sadly there's still a race in the usage of pci_get_pdev_by_domain wrt > > pci_remove_device, and likely when vPCI gets also used in > > {de}assign_device I think. > Yes, this is indeed an issue, but I was not trying to solve it in > context of vPCI locking yet. I think we should discuss how do > we approach pdev locking, so I can create a patch for that. > that being said, I would like not to solve pdev in this patch yet > > ...I do understand we do want to avoid that, but at the moment > a single reliable way for making sure pdev is alive seems to > be pcidevs_lock.... I think we will need to make pcidevs_lock a rwlock and take it in read mode for pci_get_pdev_by_domain. We didn't have this scenario before where PCI emulation is done in the hypervisor, and hence the locking around those data structures has not been designed for those use-cases. > > > >> 1. Per-domain's vpci_rwlock is used to protect pdev->vpci structure > >> from being removed. > >> > >> 2. Writing the command register and ROM BAR register may trigger > >> modify_bars to run, which in turn may access multiple pdevs while > >> checking for the existing BAR's overlap. The overlapping check, if done > >> under the read lock, requires vpci->lock to be acquired on both devices > >> being compared, which may produce a deadlock. It is not possible to > >> upgrade read lock to write lock in such a case. So, in order to prevent > >> the deadlock, check which registers are going to be written and acquire > >> the lock in the appropriate mode from the beginning. > >> > >> All other code, which doesn't lead to pdev->vpci destruction and does not > >> access multiple pdevs at the same time, can still use a combination of the > >> read lock and pdev->vpci->lock. > >> > >> 3. Optimize if ROM BAR write lock required detection by caching offset > >> of the ROM BAR register in vpci->header->rom_reg which depends on > >> header's type. > >> > >> 4. Reduce locked region in vpci_remove_device as it is now possible > >> to set pdev->vpci to NULL early right after the write lock is acquired. > >> > >> 5. Reduce locked region in vpci_add_handlers as it is possible to > >> initialize many more fields of the struct vpci before assigning it to > >> pdev->vpci. > >> > >> 6. vpci_{add|remove}_register are required to be called with the write lock > >> held, but it is not feasible to add an assert there as it requires > >> struct domain to be passed for that. So, add a comment about this > >> requirement > >> to these and other functions with the equivalent constraints. > >> > >> 7. Drop const qualifier where the new rwlock is used and this is > >> appropriate. > >> > >> 8. This is based on the discussion at [1]. > >> > >> [1] > >> https://urldefense.com/v3/__https://lore.kernel.org/all/20220204063459.680961-4-andr2000@xxxxxxxxx/__;!!GF_29dbcQIUBPA!gObSySzN7s6zSKrcpSEi6vw18fRPls157cuRoqq4KDd7Ic_Nvh_cFlyVXPRpEWBkI38pgsvvfg$ > >> [lore[.]kernel[.]org] > >> > >> Suggested-by: Roger Pau Monné <roger.pau@xxxxxxxxxx> > >> Suggested-by: Jan Beulich <jbeulich@xxxxxxxx> > >> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx> > >> > >> --- > >> This was checked on x86: with and without PVH Dom0. > >> --- > >> xen/arch/x86/hvm/vmsi.c | 2 + > >> xen/common/domain.c | 3 + > >> xen/drivers/vpci/header.c | 8 +++ > >> xen/drivers/vpci/msi.c | 8 ++- > >> xen/drivers/vpci/msix.c | 40 +++++++++++-- > >> xen/drivers/vpci/vpci.c | 114 ++++++++++++++++++++++++++++---------- > >> xen/include/xen/sched.h | 3 + > >> xen/include/xen/vpci.h | 2 + > >> 8 files changed, 146 insertions(+), 34 deletions(-) > >> > >> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c > >> index 13e2a190b439..351cb968a423 100644 > >> --- a/xen/arch/x86/hvm/vmsi.c > >> +++ b/xen/arch/x86/hvm/vmsi.c > >> @@ -893,6 +893,8 @@ int vpci_msix_arch_print(const struct vpci_msix *msix) > >> { > >> unsigned int i; > >> > >> + ASSERT(!!rw_is_locked(&msix->pdev->domain->vpci_rwlock)); > > ^ no need for the double negation. > Ok, will update all asserts which use !! > > > > Also this asserts that the lock is taken, but could be by a different > > pCPU. I guess it's better than nothing. > Fair enough. Do you still want the asserts or should I remove them? Likely fine to leave them. > > > >> + > >> for ( i = 0; i < msix->max_entries; i++ ) > >> { > >> const struct vpci_msix_entry *entry = &msix->entries[i]; > > Since this function is now called with the per-domain rwlock read > > locked it's likely not appropriate to call process_pending_softirqs > > while holding such lock (check below). > You are right, as it is possible that: > > process_pending_softirqs -> vpci_process_pending -> read_lock > > Even more, vpci_process_pending may also > > read_unlock -> vpci_remove_device -> write_lock > > in its error path. So, any invocation of process_pending_softirqs > must not hold d->vpci_rwlock at least. > > And also we need to check that pdev->vpci was not removed > in between or *re-created* > > > > We will likely need to re-iterate over the list of pdevs assigned to > > the domain and assert that the pdev is still assigned to the same > > domain. > So, do you mean a pattern like the below should be used at all > places where we need to call process_pending_softirqs? > > read_unlock > process_pending_softirqs > read_lock > pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn); > if ( pdev && pdev->vpci && is_the_same_vpci(pdev->vpci) ) > <continue processing> Something along those lines. You likely need to continue iterate using for_each_pdev. > >> +{ > >> + /* > >> + * Writing the command register and ROM BAR register may trigger > >> + * modify_bars to run which in turn may access multiple pdevs while > >> + * checking for the existing BAR's overlap. The overlapping check, if > >> done > >> + * under the read lock, requires vpci->lock to be acquired on both > >> devices > >> + * being compared, which may produce a deadlock. It is not possible to > >> + * upgrade read lock to write lock in such a case. So, in order to > >> prevent > >> + * the deadlock, check which registers are going to be written and > >> acquire > >> + * the lock in the appropriate mode from the beginning. > >> + */ > >> + if ( !vpci_offset_cmp(start, size, PCI_COMMAND, 2) ) > >> + return true; > >> + > >> + if ( !vpci_offset_cmp(start, size, pdev->vpci->header.rom_reg, 4) ) > > No need for the comparison if rom_reg is unset. Also you can OR both > > conditions into a single if. > If we open code vpci_offset_cmp with a single if all this is going > to be a bit clumsy: > > if ( r1_offset < r2_offset + r2_size && > r2_offset < r1_offset + r1_size ) > return 0; > This is a single check. > Now we need to check two registers with the code above and > also check that pdev->vpci->header.rom_reg != 0 > > I think it would be more readable if we have a tiny helper function > > static bool vpci_offset_cmp(unsigned int r1_offset, unsigned int r1_size, > unsigned int r2_offset, unsigned int r2_size) > { > /* Return 0 if registers overlap. */ > if ( r1_offset < r2_offset + r2_size && > r2_offset < r1_offset + r1_size ) > return false; > return true; > } > > So, then we can have something like > > static bool vpci_header_write_lock(const struct pci_dev *pdev, > unsigned int start, unsigned int size) > { > if ( !vpci_offset_cmp(start, size, PCI_COMMAND, 2) || > (pdev->vpci->header.rom_reg && > !vpci_offset_cmp(start, size, pdev->vpci->header.rom_reg, 4)) ) > return true; > > return false; > } Just create an 'overlaps' static function in header.c. Thanks, Roger. >
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |