[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v3] x86/msi: fix locking for SR-IOV devices
On 12.08.2024 22:39, Stewart Hildebrand wrote: > In commit 4f78438b45e2 ("vpci: use per-domain PCI lock to protect vpci > structure") a lock moved from allocate_and_map_msi_pirq() to the caller > and changed from pcidevs_lock() to read_lock(&d->pci_lock). However, one > call path wasn't updated to reflect the change, leading to a failed > assertion observed under the following conditions: > > * PV dom0 > * Debug build (debug=y) of Xen > * There is an SR-IOV device in the system with one or more VFs enabled > * Dom0 has loaded the driver for the VF and enabled MSI-X > > (XEN) Assertion 'd || pcidevs_locked()' failed at > drivers/passthrough/pci.c:535 > (XEN) ----[ Xen-4.20-unstable x86_64 debug=y Not tainted ]---- > ... > (XEN) Xen call trace: > (XEN) [<ffff82d040284da8>] R pci_get_pdev+0x4c/0xab > (XEN) [<ffff82d040344f5c>] F arch/x86/msi.c#read_pci_mem_bar+0x58/0x272 > (XEN) [<ffff82d04034530e>] F > arch/x86/msi.c#msix_capability_init+0x198/0x755 > (XEN) [<ffff82d040345dad>] F arch/x86/msi.c#__pci_enable_msix+0x82/0xe8 > (XEN) [<ffff82d0403463e5>] F pci_enable_msi+0x3f/0x78 > (XEN) [<ffff82d04034be2b>] F map_domain_pirq+0x2a4/0x6dc > (XEN) [<ffff82d04034d4d5>] F allocate_and_map_msi_pirq+0x103/0x262 > (XEN) [<ffff82d04035da5d>] F physdev_map_pirq+0x210/0x259 > (XEN) [<ffff82d04035e798>] F do_physdev_op+0x9c3/0x1454 > (XEN) [<ffff82d040329475>] F pv_hypercall+0x5ac/0x6af > (XEN) [<ffff82d0402012d3>] F lstar_enter+0x143/0x150 > > In read_pci_mem_bar(), the VF obtains the struct pci_dev pointer for its > associated PF to access the vf_rlen array. This array is initialized in > pci_add_device() and is only populated in the associated PF's struct > pci_dev. > > Add a link from the VF's struct pci_dev to the associated PF struct > pci_dev, ensuring the PF's struct doesn't get deallocated until all its > VFs have gone away. Access the vf_rlen array via the new link to the PF, > and remove the troublesome call to pci_get_pdev(). > > Add a call to pci_get_pdev() inside the pcidevs_lock()-locked section of > pci_add_device() to set up the link from VF to PF. In case the new > pci_add_device() invocation fails to find the associated PF (returning > NULL), we are no worse off than before: read_pci_mem_bar() will still > return 0 in that case. > > Note that currently the only way for Xen to know if a device is a VF is > if the toolstack tells Xen about it. Using PHYSDEVOP_manage_pci_add for > a VF is not a case that Xen handles. How does the toolstack come into play here? It's still the Dom0 kernel to tell Xen, via PHYSDEVOP_pci_device_add (preferred) or PHYSDEVOP_manage_pci_add_ext (kind of deprecated; PHYSDEVOP_manage_pci_add is even more kind of deprecated). > --- a/xen/arch/x86/msi.c > +++ b/xen/arch/x86/msi.c > @@ -662,34 +662,32 @@ static int msi_capability_init(struct pci_dev *dev, > return 0; > } > > -static u64 read_pci_mem_bar(u16 seg, u8 bus, u8 slot, u8 func, u8 bir, int > vf) > +static u64 read_pci_mem_bar(struct pf_info *pf_info, u16 seg, u8 bus, u8 > slot, > + u8 func, u8 bir, int vf) > { > + pci_sbdf_t sbdf = PCI_SBDF(seg, bus, slot, func); > u8 limit; > u32 addr, base = PCI_BASE_ADDRESS_0; > u64 disp = 0; > > if ( vf >= 0 ) > { > - struct pci_dev *pdev = pci_get_pdev(NULL, > - PCI_SBDF(seg, bus, slot, func)); > unsigned int pos; > uint16_t ctrl, num_vf, offset, stride; > > - if ( !pdev ) > - return 0; > - > - pos = pci_find_ext_capability(pdev->sbdf, PCI_EXT_CAP_ID_SRIOV); > - ctrl = pci_conf_read16(pdev->sbdf, pos + PCI_SRIOV_CTRL); > - num_vf = pci_conf_read16(pdev->sbdf, pos + PCI_SRIOV_NUM_VF); > - offset = pci_conf_read16(pdev->sbdf, pos + PCI_SRIOV_VF_OFFSET); > - stride = pci_conf_read16(pdev->sbdf, pos + PCI_SRIOV_VF_STRIDE); > + pos = pci_find_ext_capability(sbdf, PCI_EXT_CAP_ID_SRIOV); > + ctrl = pci_conf_read16(sbdf, pos + PCI_SRIOV_CTRL); > + num_vf = pci_conf_read16(sbdf, pos + PCI_SRIOV_NUM_VF); > + offset = pci_conf_read16(sbdf, pos + PCI_SRIOV_VF_OFFSET); > + stride = pci_conf_read16(sbdf, pos + PCI_SRIOV_VF_STRIDE); > > if ( !pos || > !(ctrl & PCI_SRIOV_CTRL_VFE) || > !(ctrl & PCI_SRIOV_CTRL_MSE) || > !num_vf || !offset || (num_vf > 1 && !stride) || > bir >= PCI_SRIOV_NUM_BARS || > - !pdev->vf_rlen[bir] ) > + !pf_info || See further down - I think it would be nice if we didn't need this new check. > @@ -813,6 +811,7 @@ static int msix_capability_init(struct pci_dev *dev, > int vf; > paddr_t pba_paddr; > unsigned int pba_offset; > + struct pf_info *pf_info = NULL; Pointer-to-const? > @@ -827,9 +826,12 @@ static int msix_capability_init(struct pci_dev *dev, > pslot = PCI_SLOT(dev->info.physfn.devfn); > pfunc = PCI_FUNC(dev->info.physfn.devfn); > vf = dev->sbdf.bdf; > + if ( dev->virtfn.pf_pdev ) > + pf_info = &dev->virtfn.pf_pdev->physfn; > } > > - table_paddr = read_pci_mem_bar(seg, pbus, pslot, pfunc, bir, vf); > + table_paddr = read_pci_mem_bar(pf_info, seg, pbus, pslot, pfunc, bir, > + vf); I don't think putting the new arg first is very nice. SBDF should remain first imo. Between the latter arguments I'm not as fussed. > @@ -446,7 +448,27 @@ static void free_pdev(struct pci_seg *pseg, struct > pci_dev *pdev) > > list_del(&pdev->alldevs_list); > pdev_msi_deinit(pdev); > - xfree(pdev); > + > + if ( pdev->info.is_virtfn ) > + { > + struct pci_dev *pf_pdev = pdev->virtfn.pf_pdev; > + > + if ( pf_pdev ) > + { > + list_del(&pdev->virtfn.next); > + if ( pf_pdev->physfn.dealloc_pending && > + list_empty(&pf_pdev->physfn.vf_list) ) > + xfree(pf_pdev); > + } > + xfree(pdev); > + } > + else > + { > + if ( list_empty(&pdev->physfn.vf_list) ) > + xfree(pdev); > + else > + pdev->physfn.dealloc_pending = true; > + } Could I talk you into un-indenting the last if/else by a level, by making the earlier else and "else if()"? One aspect I didn't properly consider when making the suggestion: What if, without all VFs having gone away, the PF is re-added? In that case we would better recycle the existing structure. That's getting a little complicated, so maybe a better approach is to refuse the request (in pci_remove_device()) when the list isn't empty? > @@ -700,10 +722,22 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, > * extended function. > */ > if ( pdev->info.is_virtfn ) > + { > + struct pci_dev *pf_pdev; > + > pdev->info.is_extfn = pf_is_extfn; > + pf_pdev = pci_get_pdev(NULL, > + PCI_SBDF(seg, info->physfn.bus, > + info->physfn.devfn)); > + if ( pf_pdev ) > + { > + pdev->virtfn.pf_pdev = pf_pdev; > + list_add(&pdev->virtfn.next, &pf_pdev->physfn.vf_list); > + } > + } Hmm, yes, we can't really use the result of the earlier pci_get_pdev(), as we drop the lock intermediately. I wonder though if we wouldn't better fail the function if we can't find the PF instance. > --- a/xen/include/xen/pci.h > +++ b/xen/include/xen/pci.h > @@ -150,7 +150,17 @@ struct pci_dev { > unsigned int count; > #define PT_FAULT_THRESHOLD 10 > } fault; > - u64 vf_rlen[6]; > + struct pf_info { > + /* Only populated for PFs (info.is_virtfn == false) */ > + uint64_t vf_rlen[PCI_SRIOV_NUM_BARS]; > + struct list_head vf_list; /* List head */ > + bool dealloc_pending; > + } physfn; > + struct { > + /* Only populated for VFs (info.is_virtfn == true) */ > + struct pci_dev *pf_pdev; /* Link from VF to PF */ > + struct list_head next; /* Entry in vf_list */ For doubly-linked lists "next" isn't really a good name. Since both struct variants need such a member, why not use vf_list uniformly? That'll then also lower the significance of my other question: > + } virtfn; Should the two new struct-s perhaps be a union? Jan
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |