[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] Re: [PATCH 6 of 6] xl: implement multifunction device PCI pass-through



On Wed, 2010-08-04 at 16:47 +0100, Stefano Stabellini wrote:
> On Wed, 4 Aug 2010, Gianni Tedesco (3P) wrote:
> > On Wed, 2010-08-04 at 15:52 +0100, Stefano Stabellini wrote:
> > > On Mon, 2 Aug 2010, Gianni Tedesco (3P) wrote:
> > > >  tools/libxl/libxl.h     |    1 +
> > > >  tools/libxl/libxl_pci.c |  118 
> > > > ++++++++++++++++++++++++++++++++++++++++++++---
> > > >  2 files changed, 111 insertions(+), 8 deletions(-)
> > > > 
> > > > 
> > > > # HG changeset patch
> > > > # User Gianni Tedesco <gianni.tedesco@xxxxxxxxxx>
> > > > # Date 1280760698 -3600
> > > > # Branch pci-patches-v3
> > > > # Node ID 6379343447e58cc7b83e27beba9c35e62c232d9d
> > > > # Parent  eb8ee481bc063b18b39feaa76d9b757331ed3a9f
> > > > xl: implement multifunction device PCI pass-through
> > > > 
> > > > Implement PCI pass-through for multi-function devices. The supported BDF
> > > > notation is: BB:DD.* - therefore passing-through a subset of functions 
> > > > or
> > > > remapping the function numbers is not supported. Largely because I 
> > > > don't see
> > > > how that can ever be safe or sane (perhaps for SR-IOV cards?)
> > > > 
> > > > Letting qemu automatically select the virtual slot is not supported 
> > > > since I
> > > > don't believe that qemu-dm can actually handle that case.
> > > > 
> > > > Signed-off-by: Gianni Tedesco <gianni.tedesco@xxxxxxxxxx>
> > > > 
> > > > diff -r eb8ee481bc06 -r 6379343447e5 tools/libxl/libxl.h
> > > > --- a/tools/libxl/libxl.h       Mon Aug 02 15:47:38 2010 +0100
> > > > +++ b/tools/libxl/libxl.h       Mon Aug 02 15:51:38 2010 +0100
> > > > @@ -308,6 +308,7 @@ typedef struct {
> > > >      unsigned int vdevfn;
> > > >      bool msitranslate;
> > > >      bool power_mgmt;
> > > > +    bool multifunction;
> > > >  } libxl_device_pci;
> > > >  
> > > >  enum {
> > > > diff -r eb8ee481bc06 -r 6379343447e5 tools/libxl/libxl_pci.c
> > > > --- a/tools/libxl/libxl_pci.c   Mon Aug 02 15:47:38 2010 +0100
> > > > +++ b/tools/libxl/libxl_pci.c   Mon Aug 02 15:51:38 2010 +0100
> > > > @@ -136,8 +136,13 @@ int libxl_device_pci_parse_bdf(libxl_ctx
> > > >                      break;
> > > >                  }
> > > >                  *ptr = '\0';
> > > > -                if ( hex_convert(tok, &func, 0x7) )
> > > > -                    goto parse_error;
> > > > +                if ( !strcmp(tok, "*") ) {
> > > > +                    pcidev->multifunction = 1;
> > > > +                }else{
> > > > +                    if ( hex_convert(tok, &func, 0x7) )
> > > > +                        goto parse_error;
> > > > +                    pcidev->multifunction = 0;
> > > > +                }
> > > >                  tok = ptr + 1;
> > > >              }
> > > >              break;
> > > 
> > > Couldn't you add the func_mask somewhere in libxl_device_pci, instead of
> > > calling pci_multifunction_check multilple times?
> > > Would be possible to make the normal passthrough case just a subset of
> > > the general multifunction passthrough case to simplify the code?
> > 
> > Not sure what you mean by this, the parsing parts don't (and oughtn't)
> > frob about with sysfs nodes etc. The pci_multifunction_check function is
> > called once per device add and once per device remove iff the parser had
> > seen XX:YY.* notation indicating multifunction devices.
> > 
> 
> I mean we could have a mask instead of an interger in libxl_device_pci
> to specify which functions should be assigned and in the single function
> case the mask will have only one bit set.
> At this point single function passthrough become just a case of
> multifunction passthrough.
> I am not entirely sure that this approach would produce better code
> though.

I see what you mean. I could set it to #define PCI_FUNCTION_EVERYTHING
~0 at parse-time then in multifunction_check unset all the non-existent
functions.

> 
> > > 
> > > > @@ -531,6 +536,50 @@ int libxl_device_pci_list_assignable(lib
> > > >      return 0;
> > > >  }
> > > >  
> > > > +static int pci_multifunction_check(libxl_ctx *ctx, libxl_device_pci 
> > > > *pcidev, unsigned int *func_mask)
> > > > +{
> > > 
> > > a comment on top of this function to explain what it is actually checking 
> > > would be nice 
> > 
> > This function checks that all functions of a device are bound to pciback
> > driver. It also initialises a bit-mask of which function numbers are
> > present on that device.
> > 
> > I shall add the comment in next rev.
> > 
> > > > +    struct dirent *de;
> > > > +    DIR *dir;
> > > > +
> > > > +    *func_mask = 0;
> > > > +
> > > > +    dir = opendir(SYSFS_PCI_DEV);
> > > > +    if ( NULL == dir ) {
> > > > +        XL_LOG_ERRNO(ctx, XL_LOG_ERROR, "Couldn't open %s", 
> > > > SYSFS_PCI_DEV);
> > > > +        return -1;
> > > > +    }
> > > > +
> > > > +    while( (de = readdir(dir)) ) {
> > > > +        unsigned dom, bus, dev, func;
> > > > +        struct stat st;
> > > > +        char *path;
> > > > +
> > > > +        if ( sscanf(de->d_name, PCI_BDF, &dom, &bus, &dev, &func) != 4 
> > > > )
> > > > +            continue;
> > > > +        if ( pcidev->domain != dom )
> > > > +            continue;
> > > > +        if ( pcidev->bus != bus )
> > > > +            continue;
> > > > +        if ( pcidev->dev != dev )
> > > > +            continue;
> > > > +
> > > > +        path = libxl_sprintf(ctx, "%s/" PCI_BDF, SYSFS_PCIBACK_DRIVER, 
> > > > dom, bus, dev, func);
> > > > +        if ( lstat(path, &st) ) {
> > > > +            if ( errno == ENOENT )
> > > > +                XL_LOG(ctx, XL_LOG_ERROR, PCI_BDF " is not assigned to 
> > > > pciback driver",
> > > > +                       dom, bus, dev, func);
> > > > +            else
> > > > +                XL_LOG_ERRNO(ctx, XL_LOG_ERROR, "Couldn't lstat %s", 
> > > > path);
> > > > +            closedir(dir);
> > > > +            return -1;
> > > > +        }
> > > > +        (*func_mask) |= (1 << func);
> > > > +    }
> > > > +
> > > > +    closedir(dir);
> > > > +    return 0;
> > > > +}
> > > > +
> > > >  static int pci_ins_check(libxl_ctx *ctx, uint32_t domid, const char 
> > > > *state, void *priv)
> > > >  {
> > > >      char *orig_state = priv;
> > > > @@ -679,10 +728,32 @@ int libxl_device_pci_add(libxl_ctx *ctx,
> > > >              return rc;
> > > >      }
> > > >  
> > > > +    if ( pcidev->multifunction ) {
> > > > +        unsigned int i, orig_vdev, func_mask, rc = 0;
> > > > +        orig_vdev = pcidev->vdevfn & ~7U;
> > > > +
> > > > +        if ( pci_multifunction_check(ctx, pcidev, &func_mask) )
> > > > +            return ERROR_FAIL;
> > > > +        if ( !(pcidev->vdevfn >> 3) ) {
> > > > +            XL_LOG(ctx, XL_LOG_ERROR, "Must specify a v-slot for 
> > > > multi-function devices");
> > > > +            return ERROR_FAIL;
> > > > +        }
> > > > +
> > > > +        for(i = 7; i < 8; --i) {
> > > 
> > > shouldn't this be for(i = 7; i >= 0; --i)?
> > 
> > That will generate a compiler warning since unsigned int's are always >=
> > 0. The code relies on an underflow condition. However, I could change it
> > to int if you think that makes it clearer.
> > 
> 
> yeah please do that

OK

> > > > +            if ( (1 << i) & func_mask ) {
> > > > +                pcidev->func = i;
> > > > +                pcidev->vdevfn = orig_vdev | i;
> > > > +                if ( do_pci_add(ctx, domid, pcidev) )
> > > > +                    rc = ERROR_FAIL;
> > > > +            }
> > > > +        }
> > > > +        return rc;
> > > > +    }
> > > > +
> > > >      return do_pci_add(ctx, domid, pcidev);
> > > >  }
> > > >  
> > > > -int libxl_device_pci_remove(libxl_ctx *ctx, uint32_t domid, 
> > > > libxl_device_pci *pcidev)
> > > > +static int do_pci_remove(libxl_ctx *ctx, uint32_t domid, 
> > > > libxl_device_pci *pcidev)
> > > >  {
> > > >      libxl_device_pci *assigned;
> > > >      char *path;
> > > > @@ -711,10 +782,15 @@ int libxl_device_pci_remove(libxl_ctx *c
> > > >          libxl_xs_write(ctx, XBT_NULL, path, PCI_BDF, pcidev->domain,
> > > >                         pcidev->bus, pcidev->dev, pcidev->func);
> > > >          path = libxl_sprintf(ctx, 
> > > > "/local/domain/0/device-model/%d/command", domid);
> > > > -        xs_write(ctx->xsh, XBT_NULL, path, "pci-rem", 
> > > > strlen("pci-rem"));
> > > > -        if (libxl_wait_for_device_model(ctx, domid, "pci-removed", 
> > > > NULL, NULL) < 0) {
> > > > -            XL_LOG(ctx, XL_LOG_ERROR, "Device Model didn't respond in 
> > > > time");
> > > > -            return ERROR_FAIL;
> > > > +
> > > > +        /* Remove all functions at once atomically by only signalling
> > > > +         * device-model for function 0 */
> > > > +        if ( !pcidev->multifunction || pcidev->func == 0 ) {
> > > > +            xs_write(ctx->xsh, XBT_NULL, path, "pci-rem", 
> > > > strlen("pci-rem"));
> > > > +            if (libxl_wait_for_device_model(ctx, domid, "pci-removed", 
> > > > NULL, NULL) < 0) {
> > > > +                XL_LOG(ctx, XL_LOG_ERROR, "Device Model didn't respond 
> > > > in time");
> > > > +                return ERROR_FAIL;
> > > > +            }
> > > >          }
> > > >          path = libxl_sprintf(ctx, 
> > > > "/local/domain/0/device-model/%d/state", domid);
> > > >          xs_write(ctx->xsh, XBT_NULL, path, state, strlen(state));
> > > > @@ -769,7 +845,10 @@ skip1:
> > > >          fclose(f);
> > > >      }
> > > >  out:
> > > > -    libxl_device_pci_reset(ctx, pcidev->domain, pcidev->bus, 
> > > > pcidev->dev, pcidev->func);
> > > > +    /* don't do multiple resets while some functions are still passed 
> > > > through */
> > > > +    if ( !pcidev->multifunction || pcidev->func == 0 ) {
> > > > +        libxl_device_pci_reset(ctx, pcidev->domain, pcidev->bus, 
> > > > pcidev->dev, pcidev->func);
> > > > +    }
> > > >  
> > > >      if (!libxl_is_stubdom(ctx, domid, NULL)) {
> > > >          rc = xc_deassign_device(ctx->xch, domid, pcidev->value);
> > > > @@ -786,6 +865,29 @@ out:
> > > >      return 0;
> > > >  }
> > > >  
> > > > +int libxl_device_pci_remove(libxl_ctx *ctx, uint32_t domid, 
> > > > libxl_device_pci *pcidev)
> > > > +{
> > > > +    if ( pcidev->multifunction ) {
> > > > +        unsigned int i, orig_vdev, func_mask, rc;
> > > > +        orig_vdev = pcidev->vdevfn & ~7U;
> > > > +
> > > > +        if ( pci_multifunction_check(ctx, pcidev, &func_mask) )
> > > > +            return ERROR_FAIL;
> > > > +
> > > > +        for(i = 7; i < 8; --i) {
> > > 
> > > same here
> > > 
> > > > +            if ( (1 << i) & func_mask ) {
> > > > +                pcidev->func = i;
> > > > +                pcidev->vdevfn = orig_vdev | i;
> > > > +                if ( do_pci_remove(ctx, domid, pcidev) )
> > > > +                    rc = ERROR_FAIL;
> > > > +            }
> > > > +        }
> > > > +        return rc;
> > > > +    }
> > > > +
> > > > +    return do_pci_remove(ctx, domid, pcidev);
> > > > +}
> > > > +
> > > >  int libxl_device_pci_list_assigned(libxl_ctx *ctx, libxl_device_pci 
> > > > **list, uint32_t domid, int *num)
> > > >  {
> > > >      char *be_path, *num_devs, *xsdev, *xsvdevfn, *xsopts;
> > > > 
> > > 
> > > 
> > > applied all the other patches in the series
> > 
> > OK thanks.
> > 
> > I will also need to handle cleanup in the error path of multifunction
> > device addition as discussed. This can actually happen quite easily if
> > function 0 alone is assigned to another domain because
> > pci_multifunction_check() doesn't look in xenstore and also because of
> > missing safety checks. Specifically, when adding a non-multifunction
> > device we need a check to make sure that the physical device really has
> > only one function.
> > 
> > I am thinking that we probably ought to scrap the XX:YY.* notation and
> > enforce that function is ALWAYS set to 0. Then when adding/removing the
> > device we should check for additional functions and handle them
> > transparently to the user. As I said in the commit message I am unaware
> > of any valid uses for separating functions but I need to look at how
> > SR-IOV works.
> > 
> > 
> 
> With SR-IOV devices you can easily have valid BDFs like 0000:05:1c.2, so
> enforcing that function is always 0 is not a good idea.

I will have to look in to that and try come up something that is safe in
all such cases. For now I will just re-work the patch along the lines
you mentioned.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.