WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

[Xen-devel] Re: [PATCH 6 of 6] xl: implement multifunction device PCI pa

On Wed, 2010-08-04 at 15:52 +0100, Stefano Stabellini wrote:
> On Mon, 2 Aug 2010, Gianni Tedesco (3P) wrote:
> >  tools/libxl/libxl.h     |    1 +
> >  tools/libxl/libxl_pci.c |  118 
> > ++++++++++++++++++++++++++++++++++++++++++++---
> >  2 files changed, 111 insertions(+), 8 deletions(-)
> > 
> > 
> > # HG changeset patch
> > # User Gianni Tedesco <gianni.tedesco@xxxxxxxxxx>
> > # Date 1280760698 -3600
> > # Branch pci-patches-v3
> > # Node ID 6379343447e58cc7b83e27beba9c35e62c232d9d
> > # Parent  eb8ee481bc063b18b39feaa76d9b757331ed3a9f
> > xl: implement multifunction device PCI pass-through
> > 
> > Implement PCI pass-through for multi-function devices. The supported BDF
> > notation is: BB:DD.* - therefore passing-through a subset of functions or
> > remapping the function numbers is not supported. Largely because I don't see
> > how that can ever be safe or sane (perhaps for SR-IOV cards?)
> > 
> > Letting qemu automatically select the virtual slot is not supported since I
> > don't believe that qemu-dm can actually handle that case.
> > 
> > Signed-off-by: Gianni Tedesco <gianni.tedesco@xxxxxxxxxx>
> > 
> > diff -r eb8ee481bc06 -r 6379343447e5 tools/libxl/libxl.h
> > --- a/tools/libxl/libxl.h   Mon Aug 02 15:47:38 2010 +0100
> > +++ b/tools/libxl/libxl.h   Mon Aug 02 15:51:38 2010 +0100
> > @@ -308,6 +308,7 @@ typedef struct {
> >      unsigned int vdevfn;
> >      bool msitranslate;
> >      bool power_mgmt;
> > +    bool multifunction;
> >  } libxl_device_pci;
> >  
> >  enum {
> > diff -r eb8ee481bc06 -r 6379343447e5 tools/libxl/libxl_pci.c
> > --- a/tools/libxl/libxl_pci.c       Mon Aug 02 15:47:38 2010 +0100
> > +++ b/tools/libxl/libxl_pci.c       Mon Aug 02 15:51:38 2010 +0100
> > @@ -136,8 +136,13 @@ int libxl_device_pci_parse_bdf(libxl_ctx
> >                      break;
> >                  }
> >                  *ptr = '\0';
> > -                if ( hex_convert(tok, &func, 0x7) )
> > -                    goto parse_error;
> > +                if ( !strcmp(tok, "*") ) {
> > +                    pcidev->multifunction = 1;
> > +                }else{
> > +                    if ( hex_convert(tok, &func, 0x7) )
> > +                        goto parse_error;
> > +                    pcidev->multifunction = 0;
> > +                }
> >                  tok = ptr + 1;
> >              }
> >              break;
> 
> Couldn't you add the func_mask somewhere in libxl_device_pci, instead of
> calling pci_multifunction_check multilple times?
> Would be possible to make the normal passthrough case just a subset of
> the general multifunction passthrough case to simplify the code?

Not sure what you mean by this, the parsing parts don't (and oughtn't)
frob about with sysfs nodes etc. The pci_multifunction_check function is
called once per device add and once per device remove iff the parser had
seen XX:YY.* notation indicating multifunction devices.

> 
> > @@ -531,6 +536,50 @@ int libxl_device_pci_list_assignable(lib
> >      return 0;
> >  }
> >  
> > +static int pci_multifunction_check(libxl_ctx *ctx, libxl_device_pci 
> > *pcidev, unsigned int *func_mask)
> > +{
> 
> a comment on top of this function to explain what it is actually checking 
> would be nice 

This function checks that all functions of a device are bound to pciback
driver. It also initialises a bit-mask of which function numbers are
present on that device.

I shall add the comment in next rev.

> > +    struct dirent *de;
> > +    DIR *dir;
> > +
> > +    *func_mask = 0;
> > +
> > +    dir = opendir(SYSFS_PCI_DEV);
> > +    if ( NULL == dir ) {
> > +        XL_LOG_ERRNO(ctx, XL_LOG_ERROR, "Couldn't open %s", SYSFS_PCI_DEV);
> > +        return -1;
> > +    }
> > +
> > +    while( (de = readdir(dir)) ) {
> > +        unsigned dom, bus, dev, func;
> > +        struct stat st;
> > +        char *path;
> > +
> > +        if ( sscanf(de->d_name, PCI_BDF, &dom, &bus, &dev, &func) != 4 )
> > +            continue;
> > +        if ( pcidev->domain != dom )
> > +            continue;
> > +        if ( pcidev->bus != bus )
> > +            continue;
> > +        if ( pcidev->dev != dev )
> > +            continue;
> > +
> > +        path = libxl_sprintf(ctx, "%s/" PCI_BDF, SYSFS_PCIBACK_DRIVER, 
> > dom, bus, dev, func);
> > +        if ( lstat(path, &st) ) {
> > +            if ( errno == ENOENT )
> > +                XL_LOG(ctx, XL_LOG_ERROR, PCI_BDF " is not assigned to 
> > pciback driver",
> > +                       dom, bus, dev, func);
> > +            else
> > +                XL_LOG_ERRNO(ctx, XL_LOG_ERROR, "Couldn't lstat %s", path);
> > +            closedir(dir);
> > +            return -1;
> > +        }
> > +        (*func_mask) |= (1 << func);
> > +    }
> > +
> > +    closedir(dir);
> > +    return 0;
> > +}
> > +
> >  static int pci_ins_check(libxl_ctx *ctx, uint32_t domid, const char 
> > *state, void *priv)
> >  {
> >      char *orig_state = priv;
> > @@ -679,10 +728,32 @@ int libxl_device_pci_add(libxl_ctx *ctx,
> >              return rc;
> >      }
> >  
> > +    if ( pcidev->multifunction ) {
> > +        unsigned int i, orig_vdev, func_mask, rc = 0;
> > +        orig_vdev = pcidev->vdevfn & ~7U;
> > +
> > +        if ( pci_multifunction_check(ctx, pcidev, &func_mask) )
> > +            return ERROR_FAIL;
> > +        if ( !(pcidev->vdevfn >> 3) ) {
> > +            XL_LOG(ctx, XL_LOG_ERROR, "Must specify a v-slot for 
> > multi-function devices");
> > +            return ERROR_FAIL;
> > +        }
> > +
> > +        for(i = 7; i < 8; --i) {
> 
> shouldn't this be for(i = 7; i >= 0; --i)?

That will generate a compiler warning since unsigned int's are always >=
0. The code relies on an underflow condition. However, I could change it
to int if you think that makes it clearer.

> > +            if ( (1 << i) & func_mask ) {
> > +                pcidev->func = i;
> > +                pcidev->vdevfn = orig_vdev | i;
> > +                if ( do_pci_add(ctx, domid, pcidev) )
> > +                    rc = ERROR_FAIL;
> > +            }
> > +        }
> > +        return rc;
> > +    }
> > +
> >      return do_pci_add(ctx, domid, pcidev);
> >  }
> >  
> > -int libxl_device_pci_remove(libxl_ctx *ctx, uint32_t domid, 
> > libxl_device_pci *pcidev)
> > +static int do_pci_remove(libxl_ctx *ctx, uint32_t domid, libxl_device_pci 
> > *pcidev)
> >  {
> >      libxl_device_pci *assigned;
> >      char *path;
> > @@ -711,10 +782,15 @@ int libxl_device_pci_remove(libxl_ctx *c
> >          libxl_xs_write(ctx, XBT_NULL, path, PCI_BDF, pcidev->domain,
> >                         pcidev->bus, pcidev->dev, pcidev->func);
> >          path = libxl_sprintf(ctx, 
> > "/local/domain/0/device-model/%d/command", domid);
> > -        xs_write(ctx->xsh, XBT_NULL, path, "pci-rem", strlen("pci-rem"));
> > -        if (libxl_wait_for_device_model(ctx, domid, "pci-removed", NULL, 
> > NULL) < 0) {
> > -            XL_LOG(ctx, XL_LOG_ERROR, "Device Model didn't respond in 
> > time");
> > -            return ERROR_FAIL;
> > +
> > +        /* Remove all functions at once atomically by only signalling
> > +         * device-model for function 0 */
> > +        if ( !pcidev->multifunction || pcidev->func == 0 ) {
> > +            xs_write(ctx->xsh, XBT_NULL, path, "pci-rem", 
> > strlen("pci-rem"));
> > +            if (libxl_wait_for_device_model(ctx, domid, "pci-removed", 
> > NULL, NULL) < 0) {
> > +                XL_LOG(ctx, XL_LOG_ERROR, "Device Model didn't respond in 
> > time");
> > +                return ERROR_FAIL;
> > +            }
> >          }
> >          path = libxl_sprintf(ctx, "/local/domain/0/device-model/%d/state", 
> > domid);
> >          xs_write(ctx->xsh, XBT_NULL, path, state, strlen(state));
> > @@ -769,7 +845,10 @@ skip1:
> >          fclose(f);
> >      }
> >  out:
> > -    libxl_device_pci_reset(ctx, pcidev->domain, pcidev->bus, pcidev->dev, 
> > pcidev->func);
> > +    /* don't do multiple resets while some functions are still passed 
> > through */
> > +    if ( !pcidev->multifunction || pcidev->func == 0 ) {
> > +        libxl_device_pci_reset(ctx, pcidev->domain, pcidev->bus, 
> > pcidev->dev, pcidev->func);
> > +    }
> >  
> >      if (!libxl_is_stubdom(ctx, domid, NULL)) {
> >          rc = xc_deassign_device(ctx->xch, domid, pcidev->value);
> > @@ -786,6 +865,29 @@ out:
> >      return 0;
> >  }
> >  
> > +int libxl_device_pci_remove(libxl_ctx *ctx, uint32_t domid, 
> > libxl_device_pci *pcidev)
> > +{
> > +    if ( pcidev->multifunction ) {
> > +        unsigned int i, orig_vdev, func_mask, rc;
> > +        orig_vdev = pcidev->vdevfn & ~7U;
> > +
> > +        if ( pci_multifunction_check(ctx, pcidev, &func_mask) )
> > +            return ERROR_FAIL;
> > +
> > +        for(i = 7; i < 8; --i) {
> 
> same here
> 
> > +            if ( (1 << i) & func_mask ) {
> > +                pcidev->func = i;
> > +                pcidev->vdevfn = orig_vdev | i;
> > +                if ( do_pci_remove(ctx, domid, pcidev) )
> > +                    rc = ERROR_FAIL;
> > +            }
> > +        }
> > +        return rc;
> > +    }
> > +
> > +    return do_pci_remove(ctx, domid, pcidev);
> > +}
> > +
> >  int libxl_device_pci_list_assigned(libxl_ctx *ctx, libxl_device_pci 
> > **list, uint32_t domid, int *num)
> >  {
> >      char *be_path, *num_devs, *xsdev, *xsvdevfn, *xsopts;
> > 
> 
> 
> applied all the other patches in the series

OK thanks.

I will also need to handle cleanup in the error path of multifunction
device addition as discussed. This can actually happen quite easily if
function 0 alone is assigned to another domain because
pci_multifunction_check() doesn't look in xenstore and also because of
missing safety checks. Specifically, when adding a non-multifunction
device we need a check to make sure that the physical device really has
only one function.

I am thinking that we probably ought to scrap the XX:YY.* notation and
enforce that function is ALWAYS set to 0. Then when adding/removing the
device we should check for additional functions and handle them
transparently to the user. As I said in the commit message I am unaware
of any valid uses for separating functions but I need to look at how
SR-IOV works.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel