
Re: [PATCH v3 10/11] vpci: Add initial support for virtual PCI bus topology


  • To: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Wed, 3 Nov 2021 09:41:17 +0100
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "julien@xxxxxxx" <julien@xxxxxxx>, "sstabellini@xxxxxxxxxx" <sstabellini@xxxxxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Rahul Singh <rahul.singh@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Wed, 03 Nov 2021 08:41:33 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 03.11.2021 07:34, Oleksandr Andrushchenko wrote:
> Hi, Roger
> 
> On 26.10.21 14:33, Roger Pau Monné wrote:
>> On Thu, Sep 30, 2021 at 10:52:22AM +0300, Oleksandr Andrushchenko wrote:
>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>>>
>>> Assign a virtual SBDF to PCI devices being passed through, using bus 0.
>>> In the resulting topology, PCIe devices reside on bus 0 of the root
>>> complex itself (as embedded endpoints).
>>> This implementation is limited to the 32 devices allowed on a single
>>> PCI bus.
>>>
>>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>>>
>>> ---
>>> Since v2:
>>>   - remove casts that are (a) malformed and (b) unnecessary
>>>   - add new line for better readability
>>>   - remove CONFIG_HAS_VPCI_GUEST_SUPPORT ifdef's as the relevant vPCI
>>>      functions are now completely gated with this config
>>>   - gate common code with CONFIG_HAS_VPCI_GUEST_SUPPORT
>>> New in v2
>>> ---
>>>   xen/common/domain.c           |  3 ++
>>>   xen/drivers/passthrough/pci.c | 60 +++++++++++++++++++++++++++++++++++
>>>   xen/drivers/vpci/vpci.c       | 14 +++++++-
>>>   xen/include/xen/pci.h         | 22 +++++++++++++
>>>   xen/include/xen/sched.h       |  8 +++++
>>>   5 files changed, 106 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>>> index 40d67ec34232..e0170087612d 100644
>>> --- a/xen/common/domain.c
>>> +++ b/xen/common/domain.c
>>> @@ -601,6 +601,9 @@ struct domain *domain_create(domid_t domid,
>>>   
>>>   #ifdef CONFIG_HAS_PCI
>>>       INIT_LIST_HEAD(&d->pdev_list);
>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>>> +    INIT_LIST_HEAD(&d->vdev_list);
>>> +#endif
>>>   #endif
>>>   
>>>       /* All error paths can depend on the above setup. */
>>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>>> index 805ab86ed555..5b963d75d1ba 100644
>>> --- a/xen/drivers/passthrough/pci.c
>>> +++ b/xen/drivers/passthrough/pci.c
>>> @@ -831,6 +831,66 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>>>       return ret;
>>>   }
>>>   
>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>>> +static struct vpci_dev *pci_find_virtual_device(const struct domain *d,
>>> +                                                const struct pci_dev *pdev)
>>> +{
>>> +    struct vpci_dev *vdev;
>>> +
>>> +    list_for_each_entry ( vdev, &d->vdev_list, list )
>>> +        if ( vdev->pdev == pdev )
>>> +            return vdev;
>>> +    return NULL;
>>> +}
>>> +
>>> +int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev)
>>> +{
>>> +    struct vpci_dev *vdev;
>>> +
>>> +    ASSERT(!pci_find_virtual_device(d, pdev));
>>> +
>>> +    /* Each PCI bus supports 32 devices/slots at max. */
>>> +    if ( d->vpci_dev_next > 31 )
>>> +        return -ENOSPC;
>>> +
>>> +    vdev = xzalloc(struct vpci_dev);
>>> +    if ( !vdev )
>>> +        return -ENOMEM;
>>> +
>>> +    /* We emulate a single host bridge for the guest, so segment is always 0. */
>>> +    vdev->seg = 0;
>>> +
>>> +    /*
>>> +     * The bus number is set to 0, so virtual devices are seen
>>> +     * as embedded endpoints behind the root complex.
>>> +     */
>>> +    vdev->bus = 0;
>>> +    vdev->devfn = PCI_DEVFN(d->vpci_dev_next++, 0);
>> This would likely be better as a bitmap where you set the bits of
>> in-use slots. Then you can use find_first_bit or similar to get a free
>> slot.
>>
>> Long term you might want to allow the caller to provide a pre-selected
>> slot, as it's possible for users to request the device to appear at a
>> specific slot on the emulated bus.
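>>
>> Something along these lines perhaps (an untested sketch; the
>> vpci_dev_assigned field and VPCI_MAX_VIRT_DEV are names made up for
>> illustration):
>>
>>     /* In struct domain: bitmap of slots in use on the emulated bus. */
>>     #define VPCI_MAX_VIRT_DEV 32
>>     DECLARE_BITMAP(vpci_dev_assigned, VPCI_MAX_VIRT_DEV);
>>
>>     /* In pci_add_virtual_device(), instead of vpci_dev_next: */
>>     unsigned int slot = find_first_zero_bit(d->vpci_dev_assigned,
>>                                             VPCI_MAX_VIRT_DEV);
>>
>>     if ( slot >= VPCI_MAX_VIRT_DEV )
>>         return -ENOSPC;
>>     __set_bit(slot, d->vpci_dev_assigned);
>>     vdev->devfn = PCI_DEVFN(slot, 0);
>>
>> Removal then becomes a __clear_bit(), and honoring a caller-provided
>> slot would just be a test_and_set_bit() instead of the search.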
>>
>>> +
>>> +    vdev->pdev = pdev;
>>> +    vdev->domain = d;
>>> +
>>> +    pcidevs_lock();
>>> +    list_add_tail(&vdev->list, &d->vdev_list);
>>> +    pcidevs_unlock();
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev)
>>> +{
>>> +    struct vpci_dev *vdev;
>>> +
>>> +    pcidevs_lock();
>>> +    vdev = pci_find_virtual_device(d, pdev);
>>> +    if ( vdev )
>>> +        list_del(&vdev->list);
>>> +    pcidevs_unlock();
>>> +    xfree(vdev);
>>> +    return 0;
>>> +}
>>> +#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>>> +
>>>   /* Caller should hold the pcidevs_lock */
>>>   static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
>>>                              uint8_t devfn)
>>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>>> index 702f7b5d5dda..d787f13e679e 100644
>>> --- a/xen/drivers/vpci/vpci.c
>>> +++ b/xen/drivers/vpci/vpci.c
>>> @@ -91,20 +91,32 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
>>>   /* Notify vPCI that device is assigned to guest. */
>>>   int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
>>>   {
>>> +    int rc;
>>> +
>>>       /* It only makes sense to assign for hwdom or guest domain. */
>>>       if ( is_system_domain(d) || !has_vpci(d) )
>>>           return 0;
>>>   
>>> -    return vpci_bar_add_handlers(d, dev);
>>> +    rc = vpci_bar_add_handlers(d, dev);
>>> +    if ( rc )
>>> +        return rc;
>>> +
>>> +    return pci_add_virtual_device(d, dev);
>>>   }
>>>   
>>>   /* Notify vPCI that device is de-assigned from guest. */
>>>   int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
>>>   {
>>> +    int rc;
>>> +
>>>       /* It only makes sense to de-assign from hwdom or guest domain. */
>>>       if ( is_system_domain(d) || !has_vpci(d) )
>>>           return 0;
>>>   
>>> +    rc = pci_remove_virtual_device(d, dev);
>>> +    if ( rc )
>>> +        return rc;
>>> +
>>>       return vpci_bar_remove_handlers(d, dev);
>>>   }
>>>   #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>>> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
>>> index 43b8a0817076..33033a3a8f8d 100644
>>> --- a/xen/include/xen/pci.h
>>> +++ b/xen/include/xen/pci.h
>>> @@ -137,6 +137,24 @@ struct pci_dev {
>>>       struct vpci *vpci;
>>>   };
>>>   
>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>>> +struct vpci_dev {
>>> +    struct list_head list;
>>> +    /* Physical PCI device this virtual device is connected to. */
>>> +    const struct pci_dev *pdev;
>>> +    /* Virtual SBDF of the device. */
>>> +    union {
>>> +        struct {
>>> +            uint8_t devfn;
>>> +            uint8_t bus;
>>> +            uint16_t seg;
>>> +        };
>>> +        pci_sbdf_t sbdf;
>>> +    };
>>> +    struct domain *domain;
>>> +};
>>> +#endif
>> I wonder whether this is strictly needed. Won't it be enough to store
>> the virtual (ie: guest) sbdf inside the existing vpci struct?
>>
>> It would avoid the overhead of the translation you do from pdev ->
>> vdev, and there doesn't seem to be anything relevant stored in
>> vpci_dev apart from the virtual sbdf.
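>>
>> Roughly (a sketch only; the guest_sbdf field name is made up here):
>>
>>     struct vpci {
>>         /* ... existing fields ... */
>>     #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>>         /* Guest's view of the device's SBDF. */
>>         pci_sbdf_t guest_sbdf;
>>     #endif
>>     };
>>
>> vpci_assign_device() would then only need to pick a free slot and fill
>> pdev->vpci->guest_sbdf, with no extra allocation or list traversal.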
> TL;DR: It seems it might be needed from a performance POV. If not
> implemented, then for every MMIO trap we take a global PCI lock via
> pcidevs_{lock|unlock}.
> Note: the pcidevs lock is a recursive lock.
> 
> There are two sources of access to virtual devices:
> 1. During initialization, when we add, assign or de-assign a PCI device
> 2. At run-time, when we trap a configuration space access and need to
> translate the virtual SBDF into the physical SBDF
> Note that at least de-assign can run concurrently with the MMIO handlers.
> 
> Now let's see which locks are in use while doing that.
> 
> 1. No struct vpci_dev is used.
> 1.1. We remove the structure and just add pdev->vpci->guest_sbdf as you
> suggest.
> 1.2. To protect virtual devices we use pcidevs_{lock|unlock}.
> 1.3. Locking happens at the system level.
> 
> 2. struct vpci_dev is used.
> 2.1. We have a per-domain lock vdev_lock.
> 2.2. Locking happens at the per-domain level; see the sketch below.
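> 
> To illustrate, the run-time translation under the per-domain lock could
> look roughly like this (a sketch only; vdev_lock as an rwlock and the
> helper name are assumptions):
> 
>     static struct vpci_dev *vpci_find_dev(struct domain *d,
>                                           pci_sbdf_t guest_sbdf)
>     {
>         struct vpci_dev *vdev, *found = NULL;
> 
>         /* Only accesses from the same domain contend on this lock. */
>         read_lock(&d->vdev_lock);
>         list_for_each_entry ( vdev, &d->vdev_list, list )
>             if ( vdev->sbdf.sbdf == guest_sbdf.sbdf )
>             {
>                 found = vdev;
>                 break;
>             }
>         read_unlock(&d->vdev_lock);
> 
>         /* Lifetime of the result still needs protecting by the caller. */
>         return found;
>     }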
> 
> To compare the two:
> 
> 1. Without vpci_dev
> pros: much simpler code
> pros/cons: global lock is used during MMIO handling, but it is a recursive 
> lock

Could you point out to me in which way the recursive nature of the lock
is relevant here? Afaict that aspect is of no interest when considering
the performance effects of using a global lock vs one of narrower scope.

Jan