[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 10/11] vpci: Add initial support for virtual PCI bus topology


  • To: Oleksandr Andrushchenko <andr2000@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 30 Sep 2021 10:51:41 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=anyY2T1rcTg5NsX7B+m+J3daanHfu0TPvhigms+KrTQ=; b=VQkR5GgWRhGaWjgrCJEdNteY6eAPaeLwF5el4/zlIrcjw/KEuUtKVgo/c6VI/Sy84hG18+d2ZswuVNLi7AUjSKuC3bNMOk/7PclGtxu7Im9nLU9NeO7Tg6Yp2lrU/LbGRzWmBhPjygf851PzjusRmBqqgA4pX588GAG4XdJfTOArsiBTk/o9rjQjmvK7dczQZJkY8jzxC6WYsleegHEv5SOS+WzbGcQ45zQ6TcKBKgZ+4fDX0PpwT6EKKSlAKptT1Ss+XFplwHtElWG+Mi1O7AsvlkN79wtEL2x/3fXzQApg2D700zMj7eyw6r/jcsvJe4dFJYCQ62FmC0XlLPCUSQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=U/BTrIay8XC1fXcRA+mFyrbuXu2wgGiDGkle4KOU9ZVQCE0xeJ2Fp+c2geCKsuNW4kny+XKa3flHOo+4A/ZmwA5v2NeSY1V9tyl+p0ZjRHdqwnfLvGdSy2vf25FlpCH79uq/77/i6OR1s2dShaJpd20GNaC5Vtyu5ImgUdJAATh+n7p8YSJdgNnRuxuRTOA1oLHTxSIS7/zfGa26l2REW6qewp3aU01wRdEwdU18utFm6FcioAGhL89a2LdmnkojVuWTrm4RB2J1hXQj623JmAxHWxkQwDhAE54A+p1KgimU6TPVj7Rzi7QShTzmXVQfzsLUqVybjJqk12NfkgDLzA==
  • Authentication-results: lists.xenproject.org; dkim=none (message not signed) header.d=none;lists.xenproject.org; dmarc=none action=none header.from=suse.com;
  • Cc: julien@xxxxxxx, sstabellini@xxxxxxxxxx, oleksandr_tyshchenko@xxxxxxxx, volodymyr_babchuk@xxxxxxxx, Artem_Mygaiev@xxxxxxxx, roger.pau@xxxxxxxxxx, bertrand.marquis@xxxxxxx, rahul.singh@xxxxxxx, Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Thu, 30 Sep 2021 08:51:55 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 30.09.2021 09:52, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
> 
> Assign SBDF to the PCI devices being passed through with bus 0.

This reads a little odd: If bus is already known (and I think you imply
segment to also be known), it's only DF which get assigned.

> The resulting topology is where PCIe devices reside on the bus 0 of the
> root complex itself (embedded endpoints).
> This implementation is limited to 32 devices which are allowed on
> a single PCI bus.

Or up to 256 when there are multi-function ones. Imo you at least want
to spell out how that case is intended to be handled (even if maybe
the code doesn't cover that case yet, in which case a respective code
comment would also want leaving).

> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -831,6 +831,66 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>      return ret;
>  }
>  
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT

May I ask why the code enclosed by this conditional has been put here
rather than under drivers/vpci/?

> +static struct vpci_dev *pci_find_virtual_device(const struct domain *d,
> +                                                const struct pci_dev *pdev)
> +{
> +    struct vpci_dev *vdev;
> +
> +    list_for_each_entry ( vdev, &d->vdev_list, list )
> +        if ( vdev->pdev == pdev )
> +            return vdev;
> +    return NULL;
> +}

No locking here or ...

> +int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev)
> +{
> +    struct vpci_dev *vdev;
> +
> +    ASSERT(!pci_find_virtual_device(d, pdev));

... in this first caller that I've managed to spot? See also below as
to up the call tree the pcidevs-lock being held (which at the very
least you would then want to ASSERT() for here).

> +    /* Each PCI bus supports 32 devices/slots at max. */
> +    if ( d->vpci_dev_next > 31 )
> +        return -ENOSPC;

Please avoid open-coding literals when they can be suitably expressed.

> +    vdev = xzalloc(struct vpci_dev);
> +    if ( !vdev )
> +        return -ENOMEM;
> +
> +    /* We emulate a single host bridge for the guest, so segment is always 
> 0. */
> +    vdev->seg = 0;
> +
> +    /*
> +     * The bus number is set to 0, so virtual devices are seen
> +     * as embedded endpoints behind the root complex.
> +     */
> +    vdev->bus = 0;

Strictly speaking both of these assignments are redundant with you
using xzalloc(). I'd prefer if there was just a comment, as the compiler
has no way recognizing this in order to eliminate these stores.

> +    vdev->devfn = PCI_DEVFN(d->vpci_dev_next++, 0);
> +
> +    vdev->pdev = pdev;
> +    vdev->domain = d;
> +
> +    pcidevs_lock();
> +    list_add_tail(&vdev->list, &d->vdev_list);
> +    pcidevs_unlock();

I don't support a global lock getting (ab)used for per-domain list
management.

Apart from that I don't understand why you acquire the lock here. Does
that mean the functions further were truly left without any locking,
by you not having noticed that this lock is already being held by the
sole caller?

> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -91,20 +91,32 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
>  /* Notify vPCI that device is assigned to guest. */
>  int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
>  {
> +    int rc;
> +
>      /* It only makes sense to assign for hwdom or guest domain. */
>      if ( is_system_domain(d) || !has_vpci(d) )
>          return 0;
>  
> -    return vpci_bar_add_handlers(d, dev);
> +    rc = vpci_bar_add_handlers(d, dev);
> +    if ( rc )
> +        return rc;
> +
> +    return pci_add_virtual_device(d, dev);
>  }

I've peeked at the earlier patch, and both there and here I'm struggling to
see how undoing partially completed steps or fully completed earlier steps
is intended to work. I'm not convinced it is legitimate to leave handlers
in place until the tool stack manages to roll back the failed device
assignment.

>  /* Notify vPCI that device is de-assigned from guest. */
>  int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
>  {
> +    int rc;
> +
>      /* It only makes sense to de-assign from hwdom or guest domain. */
>      if ( is_system_domain(d) || !has_vpci(d) )
>          return 0;
>  
> +    rc = pci_remove_virtual_device(d, dev);
> +    if ( rc )
> +        return rc;
> +
>      return vpci_bar_remove_handlers(d, dev);
>  }

So what's the ultimate effect of a partially de-assigned device, where
one of the later steps failed? In a number of places we do best-effort
full cleanup, by recording errors but nevertheless continuing with
subsequent cleanup steps. I wonder whether that's a model to use here
as well.

> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -137,6 +137,24 @@ struct pci_dev {
>      struct vpci *vpci;
>  };
>  
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +struct vpci_dev {
> +    struct list_head list;
> +    /* Physical PCI device this virtual device is connected to. */
> +    const struct pci_dev *pdev;
> +    /* Virtual SBDF of the device. */
> +    union {
> +        struct {
> +            uint8_t devfn;
> +            uint8_t bus;
> +            uint16_t seg;
> +        };
> +        pci_sbdf_t sbdf;

Could you explain to me why pci_sbdf_t (a typedef of a union) isn't
providing all you need? By putting it in a union with a custom
struct you set yourself up for things going out of sync if anyone
chose to alter pci_sbdf_t's layout.

> @@ -167,6 +185,10 @@ const unsigned long *pci_get_ro_map(u16 seg);
>  int pci_add_device(u16 seg, u8 bus, u8 devfn,
>                     const struct pci_dev_info *, nodeid_t node);
>  int pci_remove_device(u16 seg, u8 bus, u8 devfn);
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev);
> +int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev);
> +#endif

Like for their definitions I question the placement of these
declarations.

> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -444,6 +444,14 @@ struct domain
>  
>  #ifdef CONFIG_HAS_PCI
>      struct list_head pdev_list;
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +    struct list_head vdev_list;
> +    /*
> +     * Current device number used by the virtual PCI bus topology
> +     * to assign a unique SBDF to a passed through virtual PCI device.
> +     */
> +    int vpci_dev_next;

In how far can the number stored here be negative? If it can't be,
please use unsigned int.

As to the comment - "current" is ambiguous: Is it the number that
was used last, or the next one to be used?

Jan




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.