[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v6 2/3] xen/pci: introduce PF<->VF links


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Stewart Hildebrand <stewart.hildebrand@xxxxxxx>
  • Date: Mon, 11 Nov 2024 15:07:28 -0500
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=citrix.com smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none (0)
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=GnJ9+w4Ov1p9IQ2w2FJK4S7TT8CcvFCcO9vMgt4Ceu4=; b=cYBNoVNIKsP2vEhO3RiPdaScl7O5Dyl7sV/tuUtg76uW7EjLDmLYGGIRzfQ7+iI+CdUu3vprnWDxMPtkbL91rthVOPkbDb7lkB8cKmgDsPMhAvGMotAHb4D25oVxTTkqKgne0HreOJNZITDvpPlaDEWIPlipwD7c3/DR0/8lPEsV0kPJv0zaSnwNsMZPbpnxt21PrNdo7jJtadpRv2c69wL3cSkWrIxV+Ys6qItyTcykWsex+5L02W0LziLtURmvWiLGG7GX9i0o6RQWbIvTUvD9ww6+a3vEiNsVUNMQJE80sFkc56MP62JWDQsNuv7GXbKsNl+hfJz19tEha2sJMA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=lwxUNNhSpMZVJJe9zgS8fhjxwqov4YSvYOH6HPIyJjlfmSjNT7WqYDUlfbDDMaRXmYsQI8oG5YTLVBopG2mx4JiLR733wcbOUh3xuU5QlIdt5j8EF87VFR3/N52bIvo9ra+mysJIXjShi8sEYwAFtppJVSbuDO7R7Ib7RMFHafd7Ry70FaeFj1wqFeiPwBs2Tt3767hHbI0xCDibJwX544nuLzdaMNfZZ6X/1ZMUJFCb5STSC/Z/HPFS/pwzJf/n+rOAMP4+GVAe+kvP9m7DF9sDYgu7CF+RsrxrVX/rBsRSaljhVo6W9WOWbXqA2g0cRuJBWA4gsv8jnchsalcj6Q==
  • Cc: <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, "Stefano Stabellini" <sstabellini@xxxxxxxxxx>
  • Delivery-date: Mon, 11 Nov 2024 20:08:05 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 10/28/24 14:41, Roger Pau Monné wrote:
> On Fri, Oct 18, 2024 at 04:39:09PM -0400, Stewart Hildebrand wrote:
>> Add links between a VF's struct pci_dev and its associated PF struct
>> pci_dev. Move the calls to pci_get_pdev()/pci_add_device() down to avoid
>> dropping and re-acquiring the pcidevs_lock().
>>
>> During PF removal, unlink VF from PF and mark the VF broken. As before,
>> VFs may exist without a corresponding PF, although now only with
>> pdev->broken = true.
>>
>> The hardware domain is expected to remove the associated VFs before
>> removing the PF. Print a warning in case a PF is removed with associated
>> VFs still present.
>>
>> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@xxxxxxx>
>> ---
>> Candidate for backport to 4.19 (the next patch depends on this one)
>>
>> v5->v6:
>> * move printk() before ASSERT_UNREACHABLE()
>> * warn about PF removal with VFs still present
>> * clarify commit message
>>
>> v4->v5:
>> * new patch, split from ("x86/msi: fix locking for SR-IOV devices")
>> * move INIT_LIST_HEAD(&pdev->vf_list); earlier
>> * collapse struct list_head instances
>> * retain error code from pci_add_device()
>> * unlink (and mark broken) VFs instead of removing them
>> * const-ify VF->PF link
>> ---
>>  xen/drivers/passthrough/pci.c | 76 ++++++++++++++++++++++++++++-------
>>  xen/include/xen/pci.h         | 10 +++++
>>  2 files changed, 72 insertions(+), 14 deletions(-)
>>
>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>> index 74d3895e1ef6..fe31255b1207 100644
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -333,6 +333,8 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, 
>> u8 bus, u8 devfn)
>>      *((u8*) &pdev->devfn) = devfn;
>>      pdev->domain = NULL;
>>  
>> +    INIT_LIST_HEAD(&pdev->vf_list);
>> +
>>      arch_pci_init_pdev(pdev);
>>  
>>      rc = pdev_msi_init(pdev);
>> @@ -449,6 +451,10 @@ static void free_pdev(struct pci_seg *pseg, struct 
>> pci_dev *pdev)
>>  
>>      list_del(&pdev->alldevs_list);
>>      pdev_msi_deinit(pdev);
>> +
>> +    if ( pdev->info.is_virtfn && pdev->virtfn.pf_pdev )
> 
> Shouldn't having pdev->info.is_virtfn set already ensure that
> pdev->virtfn.pf_pdev != NULL?

In the current rev, the possibility exists, however unlikely, that a
*buggy* dom0 may remove the PF before removing the VFs. In this case a
VF would exist without a corresponding PF, and thus pdev->virtfn.pf_pdev
is NULL.

For the next rev, you're right, it'll be back to the situation where a
VF can only exist with an associated PF.

>> +        list_del(&pdev->vf_list);
>> +
>>      xfree(pdev);
>>  }
>>  
>> @@ -656,24 +662,11 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>      unsigned int slot = PCI_SLOT(devfn), func = PCI_FUNC(devfn);
>>      const char *type;
>>      int ret;
>> -    bool pf_is_extfn = false;
>>  
>>      if ( !info )
>>          type = "device";
>>      else if ( info->is_virtfn )
>> -    {
>> -        pcidevs_lock();
>> -        pdev = pci_get_pdev(NULL,
>> -                            PCI_SBDF(seg, info->physfn.bus,
>> -                                     info->physfn.devfn));
>> -        if ( pdev )
>> -            pf_is_extfn = pdev->info.is_extfn;
>> -        pcidevs_unlock();
>> -        if ( !pdev )
>> -            pci_add_device(seg, info->physfn.bus, info->physfn.devfn,
>> -                           NULL, node);
>>          type = "virtual function";
>> -    }
>>      else if ( info->is_extfn )
>>          type = "extended function";
>>      else
>> @@ -703,7 +696,44 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>           * extended function.
>>           */
>>          if ( pdev->info.is_virtfn )
>> -            pdev->info.is_extfn = pf_is_extfn;
>> +        {
>> +            struct pci_dev *pf_pdev;
> 
> This could be const?

No, as we are doing this below:
    list_add(&pdev->vf_list, &pf_pdev->vf_list);

>> +
>> +            pf_pdev = pci_get_pdev(NULL,
>> +                                   PCI_SBDF(seg, info->physfn.bus,
>> +                                            info->physfn.devfn));
> 
> You can probably initialize at declaration?

OK

>> +
>> +            if ( !pf_pdev )
> 
> Is this even feasible during correct operation?

No, I don't think so.

> IOW: shouldn't the PF
> always be added first, so that SR-IOV can be enabled and the VFs added
> afterwards?

Yes, I think you're right.

> I see previous code also catered for VFs being added without the PF
> being present, so I assume there was some need for this.

This is exactly the source of the confusion on my part. In the removal
path, the consensus seems to be that removing a PF with VFs still
present indicates an error. Then shouldn't the opposite also be true?
Adding a VF without a PF should also indicate an error.

I see the PF-adding logic was added in 942a6f1376d8 ("x86/PCI-MSI:
properly determine VF BAR values"). Searching the mailing list archives
didn't reveal much about it [0].

[0] 
https://lore.kernel.org/xen-devel/4E3FC6E102000078000501CA@xxxxxxxxxxxxxxxxxxxx/

The only time I've observed this path being taken is by manually
calling PHYSDEVOP_pci_device_add. I've resorted to calling
PHYSDEVOP_pci_device_{remove,add} from userspace in order to test this,
because the Linux kernel doesn't behave this way.

I can't think of a good rationale for catering to VFs being added
without a PF, so I'll turn it into an error for the next rev.

>> +            {
>> +                ret = pci_add_device(seg, info->physfn.bus, 
>> info->physfn.devfn,
>> +                                     NULL, node);
>> +                if ( ret )
>> +                {
>> +                    printk(XENLOG_WARNING "Failed to add SR-IOV device PF 
>> %pp for VF %pp\n",
> 
> Could you split this to make the line a bit shorter?
> 
>                        printk(XENLOG_WARNING
>                             "Failed to add SR-IOV device PF %pp for VF %pp\n",
> 
> Same below.
> 
>> +                           &PCI_SBDF(seg, info->physfn.bus, 
>> info->physfn.devfn),
>> +                           &pdev->sbdf);
>> +                    free_pdev(pseg, pdev);
>> +                    goto out;
>> +                }
>> +                pf_pdev = pci_get_pdev(NULL,
>> +                                       PCI_SBDF(seg, info->physfn.bus,
>> +                                                info->physfn.devfn));
>> +                if ( !pf_pdev )
>> +                {
>> +                    printk(XENLOG_ERR "Failed to find SR-IOV device PF %pp 
>> for VF %pp\n",
>> +                           &PCI_SBDF(seg, info->physfn.bus, 
>> info->physfn.devfn),
>> +                           &pdev->sbdf);
> 
> If you want to add an error message here, I think it should mention
> the fact this state is not expected:
> 
> "Inconsistent PCI state: failed to find newly added PF %pp for VF %pp\n"
> 
>> +                    ASSERT_UNREACHABLE();
>> +                    free_pdev(pseg, pdev);
>> +                    ret = -EILSEQ;
>> +                    goto out;
>> +                }
>> +            }
>> +
>> +            pdev->info.is_extfn = pf_pdev->info.is_extfn;
>> +            pdev->virtfn.pf_pdev = pf_pdev;
>> +            list_add(&pdev->vf_list, &pf_pdev->vf_list);
>> +        }
>>      }
>>  
>>      if ( !pdev->info.is_virtfn && !pdev->vf_rlen[0] )
>> @@ -821,6 +851,24 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>>      list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
>>          if ( pdev->bus == bus && pdev->devfn == devfn )
>>          {
>> +            if ( !pdev->info.is_virtfn )
> 
> Given we have no field to mark a device as a PF, we could check that
> pdev->vf_list is not empty, and by doing so the warn_stale_vfs
> variable could be dropped?
> 
> if ( !pdev->info.is_virtfn && !list_empty(&pdev->vf_list) )
> {
>     struct pci_dev *vf_pdev;
> 
>     while ( (vf_pdev = list_first_entry_or_null(&pdev->vf_list,
>                                                 struct pci_dev,
>                                               vf_list)) != NULL )
>     {
>         list_del(&vf_pdev->vf_list);
>         vf_pdev->virtfn.pf_pdev = NULL;
>         vf_pdev->broken = true;
>     }
> 
>     printk(XENLOG_WARNING "PCI SR-IOV PF %pp removed with VFs still 
> present\n",
>            &pdev->sbdf);
> }

Yeah. Given that the consensus is leaning toward keeping the PF and
returning an error, here's my suggestion:

    if ( !pdev->info.is_virtfn && !list_empty(&pdev->vf_list) )
    {
        struct pci_dev *vf_pdev;

        list_for_each_entry(vf_pdev, &pdev->vf_list, vf_list)
            vf_pdev->broken = true;

        pdev->broken = true;

        printk(XENLOG_WARNING
               "Attempted to remove PCI SR-IOV PF %pp with VFs still present\n",
               &pdev->sbdf);

        ret = -EBUSY;
        break;
    }

>> +            {
>> +                struct pci_dev *vf_pdev, *tmp;
>> +                bool warn_stale_vfs = false;
>> +
>> +                list_for_each_entry_safe(vf_pdev, tmp, &pdev->vf_list, 
>> vf_list)
>> +                {
>> +                    list_del(&vf_pdev->vf_list);
>> +                    vf_pdev->virtfn.pf_pdev = NULL;
>> +                    vf_pdev->broken = true;
>> +                    warn_stale_vfs = true;
>> +                }
>> +
>> +                if ( warn_stale_vfs )
>> +                    printk(XENLOG_WARNING "PCI SR-IOV PF %pp removed with 
>> VFs still present\n",
>> +                           &pdev->sbdf);
>> +            }
>> +
>>              if ( pdev->domain )
>>              {
>>                  write_lock(&pdev->domain->pci_lock);
>> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
>> index ef56e80651d6..2ea168d5f914 100644
>> --- a/xen/include/xen/pci.h
>> +++ b/xen/include/xen/pci.h
>> @@ -153,7 +153,17 @@ struct pci_dev {
>>          unsigned int count;
>>  #define PT_FAULT_THRESHOLD 10
>>      } fault;
>> +
>> +    /*
>> +     * List head if info.is_virtfn == false
>> +     * List entry if info.is_virtfn == true
>> +     */
>> +    struct list_head vf_list;
>>      u64 vf_rlen[6];
>> +    struct {
>> +        /* Only populated for VFs (info.is_virtfn == true) */
> 
> All comments here (specially the first ones) would better use PF and
> VF consistently, rather than referring to other fields in the struct.
> Specially because the fields can change names and the comments would
> then become stale.

OK

>> +        const struct pci_dev *pf_pdev;        /* Link from VF to PF */
>> +    } virtfn;
> 
> I'm unsure you need an outer virtfn struct, as it's only one field in
> this patch?  Maybe more fields gets added by further patches?

Right. There are no more fields to be added, so there's no need. It was
leftover from a previous rev when vf_list was split.

> 
> Thanks, Roger.




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.