
Re: [Xen-devel] Re: [PATCH 3 of 4] libxl: Add support for passing in the machine's E820 for PCI passthrough in libxl_device_pci_parse_bdf



On Tue, 2011-04-12 at 18:53 +0100, Konrad Rzeszutek Wilk wrote:
> On Tue, Apr 12, 2011 at 01:19:43PM +0100, Ian Campbell wrote:
> > On Mon, 2011-04-11 at 22:35 +0100, Konrad Rzeszutek Wilk wrote:
> > > 
> > > 
> > > The code in libxl_device_pci_parse_bdf calls libxl__e820_alloc once it
> > > has parsed a BDF. Subsequent calls to libxl_device_pci_parse_bdf will
> > > not trigger another call to libxl__e820_alloc (unless the first call
> > > to libxl__e820_alloc failed).
> > 
> > That sounds like a very odd non-intuitive location for that allocation.
> > Why not do it in libxl_domain_create or somewhere like that?
> > 
> > I think the e820 map added to the idl should become a simple boolean
> > flag and this should all be taken care of internally based on that.
> 
> Like this?

Why no_machine_e820 instead of just machine_e820? You can set a non-zero
default in the appropriate libxl_foo_init function if need be.
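
Something like this, untested and just to illustrate (the positive-sense
field name is a placeholder):

    ("machine_e820", bool),

in the idl, and (only if a non-zero default is actually wanted) in the
appropriate libxl_foo_init():

    b_info->u.pv.machine_e820 = 1;

with do_domain_create then checking that flag rather than the negated one.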

Ian.

> # HG changeset patch
> # User Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> # Date 1302617081 14400
> # Node ID 945a34bcc2bd4bb341296f3ce84be6c42f24a4c9
> # Parent  4527044d3d1cebdc393e596072bc02d947a8de51
> libxl: Add support for passing in the machine's E820 for PCI passthrough
> 
> The code that populates the E820 is triggered when the guest
> configuration has "pci=['<BDF>,..']", the guest is PV, and
> b_info->u.pv.no_machine_e820 is not set.
> 
> do_domain_create calls libxl__e820_alloc when it notices that the guest
> is PV, has at least one PCI device, and does not have the
> no_machine_e820 flag set.
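>
> For example, a PV guest configuration fragment along these lines (the
> kernel path and BDF are placeholders) will cause the machine E820 to be
> fetched and installed for the guest:
>
>     kernel = "/boot/vmlinuz"
>     memory = 2048
>     maxmem = 4096
>     pci    = [ '01:00.0' ]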
> 
> libxl__e820_alloc calls xc_get_machine_memory_map to retrieve the
> system's E820. The E820 is then sanitized to weed out entries below 1MB
> and to remove any E820_RAM or E820_UNUSABLE regions, as the guest does
> not need to know about them. The guest only needs the E820_ACPI,
> E820_NVS and E820_RESERVED entries to get an idea of where the PCI I/O
> space is. Mostly: the Linux kernel assumes that any gap in the E820 is
> PCI I/O space, which means that if we give the guest 2GB and the
> E820_ACPI and its friends start at 3GB, the gap between 2GB and 3GB
> will be considered PCI I/O space. To guard against that we also create
> an E820_UNUSABLE entry covering the region from the end of the guest's
> RAM ('target_kb', called ram_end in the code) up to the first
> E820_[ACPI,NVS,RESERVED] region. Lastly, xc_domain_set_memory_map is
> called to install the new E820.
> 
> When tested with another PV guest (NetBSD 5.1) the modified E820 gave
> it no trouble. The code has also been tested with older "classic" Xen Linux
> and with the newer "pvops" with success (SLES11, RHEL5, Ubuntu Lucid,
> Debian Squeeze, 2.6.37, 2.6.38, 2.6.39).
> 
> Memory that is slack or reserved for ballooning ('maxmem' in the guest
> configuration) is put behind the machine E820, which in most cases
> means above 4GB.
> 
> The reason for fetching the E820 via hypercall in the toolstack
> (instead of having the guest do it) is that a guest issuing the
> 'XENMEM_machine_memory_map' hypercall retrieves an E820 with I/O range
> caps added in, meaning the region from 4GB up to the end of possible
> memory would be marked as unusable and the kernel would have no space
> in which to allocate a balloon region.
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> 
> diff -r 4527044d3d1c -r 945a34bcc2bd tools/libxl/libxl.idl
> --- a/tools/libxl/libxl.idl   Tue Apr 12 09:59:59 2011 -0400
> +++ b/tools/libxl/libxl.idl   Tue Apr 12 10:04:41 2011 -0400
> @@ -117,6 +117,7 @@ libxl_domain_build_info = Struct("domain
>                                          ("cmdline", string),
>                                          ("ramdisk", libxl_file_reference),
>                                          ("features", string, True),
> +                                        ("no_machine_e820", bool),
>                                          ])),
>                   ])),
>      ],
> diff -r 4527044d3d1c -r 945a34bcc2bd tools/libxl/libxl_create.c
> --- a/tools/libxl/libxl_create.c      Tue Apr 12 09:59:59 2011 -0400
> +++ b/tools/libxl/libxl_create.c      Tue Apr 12 10:04:41 2011 -0400
> @@ -536,6 +536,14 @@ static int do_domain_create(libxl__gc *g
>      for (i = 0; i < d_config->num_pcidevs; i++)
>          libxl__device_pci_add(gc, domid, &d_config->pcidevs[i], 1);
>  
> +    if (!d_config->c_info.hvm && !d_config->b_info.u.pv.no_machine_e820) {
> +        int rc = 0;
> +        rc = libxl__e820_alloc(ctx, domid, d_config);
> +        if (rc)
> +            LIBXL__LOG(ctx, LIBXL__LOG_WARNING,
> +                      "Failed while collecting E820 with: %d (errno:%d)\n",
> +                      rc, errno);
> +    }
>      if ( cb && (d_config->c_info.hvm || d_config->b_info.u.pv.bootloader )) {
>          if ( (*cb)(ctx, domid, priv) )
>              goto error_out;
> diff -r 4527044d3d1c -r 945a34bcc2bd tools/libxl/libxl_internal.h
> --- a/tools/libxl/libxl_internal.h    Tue Apr 12 09:59:59 2011 -0400
> +++ b/tools/libxl/libxl_internal.h    Tue Apr 12 10:04:41 2011 -0400
> @@ -335,4 +335,5 @@ _hidden int libxl__error_set(libxl__gc *
>  _hidden int libxl__file_reference_map(libxl_file_reference *f);
>  _hidden int libxl__file_reference_unmap(libxl_file_reference *f);
>  
> +_hidden int libxl__e820_alloc(libxl_ctx *ctx, uint32_t domid, libxl_domain_config *d_config);
>  #endif
> diff -r 4527044d3d1c -r 945a34bcc2bd tools/libxl/libxl_pci.c
> --- a/tools/libxl/libxl_pci.c Tue Apr 12 09:59:59 2011 -0400
> +++ b/tools/libxl/libxl_pci.c Tue Apr 12 10:04:41 2011 -0400
> @@ -1047,3 +1047,157 @@ int libxl_device_pci_shutdown(libxl_ctx 
>      free(pcidevs);
>      return 0;
>  }
> +
> +static int e820_sanitize(libxl_ctx *ctx, struct e820entry src[],
> +                         uint32_t *nr_entries,
> +                         unsigned long map_limitkb,
> +                         unsigned long balloon_kb)
> +{
> +    uint64_t delta_kb = 0, start = 0, start_kb = 0, last = 0, ram_end;
> +    uint32_t i, idx = 0, nr;
> +    struct e820entry e820[E820MAX];
> +
> +    if (!src || !map_limitkb || !balloon_kb || !nr_entries)
> +        return ERROR_INVAL;
> +
> +    nr = *nr_entries;
> +    if (!nr)
> +        return ERROR_INVAL;
> +
> +    if (nr > E820MAX)
> +        return ERROR_NOMEM;
> +
> +    /* Weed out anything under 1MB */
> +    for (i = 0; i < nr; i++) {
> +        if (src[i].addr > 0x100000)
> +            continue;
> +
> +        src[i].type = 0;
> +        src[i].size = 0;
> +        src[i].addr = -1ULL;
> +    }
> +
> +    /* Find the lowest and highest entry in E820, skipping over
> +     * undesired entries. */
> +    start = -1ULL;
> +    last = 0;
> +    for (i = 0; i < nr; i++) {
> +        if ((src[i].type == E820_RAM) ||
> +            (src[i].type == E820_UNUSABLE) ||
> +            (src[i].type == 0))
> +            continue;
> +
> +        start = src[i].addr < start ? src[i].addr : start;
> +        last = src[i].addr + src[i].size > last ?
> +               src[i].addr + src[i].size : last;
> +    }
> +    if (start > 1024)
> +      start_kb = start >> 10;
> +
> +    /* Add the memory RAM region for the guest */
> +    e820[idx].addr = 0;
> +    e820[idx].size = (uint64_t)map_limitkb << 10;
> +    e820[idx].type = E820_RAM;
> +
> +    /* .. and trim if necessary */
> +    if (start_kb && map_limitkb > start_kb) {
> +        delta_kb = map_limitkb - start_kb;
> +        if (delta_kb)
> +          e820[idx].size -= (uint64_t)(delta_kb << 10);
> +    }
> +    /* Note: We don't touch balloon_kb here. Will add it at the end. */
> +    ram_end = e820[idx].addr + e820[idx].size;
> +    idx ++;
> +
> +    LIBXL__LOG(ctx, LIBXL__LOG_DEBUG, "Memory: %ldkB End of RAM: 0x%lx (PFN) 
> " \
> +               "Delta: %ldkB, PCI start: %ldkB (0x%lx PFN), Balloon %ldkB\n",
> +             map_limitkb, ram_end >> 12, delta_kb, start_kb ,start >> 12,
> +                balloon_kb);
> +
> +    /* Check if there is a region between ram_end and start. */
> +    if (start > ram_end) {
> +      /* .. and if not present, add it in. This is to guard against
> +       the Linux guest assuming that the gap between the end of
> +       RAM region and the start of the E820_[ACPI,NVS,RESERVED]
> +       is PCI I/O space. Which it certainly is _not_. */
> +      e820[idx].type = E820_UNUSABLE;
> +      e820[idx].addr = ram_end;
> +      e820[idx].size = start - ram_end;
> +      idx++;
> +    } 
> +    /* Almost done: copy them over, ignoring the undesirable ones */
> +    for (i = 0; i < nr; i++) {
> +        if ((src[i].type == E820_RAM) ||
> +            (src[i].type == E820_UNUSABLE) ||
> +            (src[i].type == 0))
> +            continue;
> +        e820[idx].type = src[i].type;
> +        e820[idx].addr = src[i].addr;
> +        e820[idx].size = src[i].size;
> +        idx++;
> +    }
> +
> +    /* At this point we have the mapped RAM + E820 entries from src. */
> +    if (balloon_kb) {
> +        /* and if we truncated the RAM region, then add it to the end. */
> +        e820[idx].type = E820_RAM;
> +        e820[idx].addr = (uint64_t)(1ULL << 32) > last ? (uint64_t)(1ULL << 32) : last;
> +        /* also add the balloon memory to the end. */
> +        e820[idx].size = (uint64_t)(delta_kb << 10) + (uint64_t)(balloon_kb << 10);
> +        idx++;
> +
> +    }
> +    nr = idx;
> +
> +    for (i = 0; i < nr; i++) {
> +        LIBXL__LOG(ctx, LIBXL__LOG_DEBUG, ":%s\t[%lx -> %lx]",
> +                   e820[i].type == E820_RAM ? "RAM " :
> +                   (e820[i].type == E820_RESERVED ? "RSV " :
> +                    e820[i].type == E820_ACPI ? "ACPI" :
> +                    (e820[i].type == E820_NVS ? "NVS " :
> +                     (e820[i].type == E820_UNUSABLE ? "UNU " : "----"))),
> +                   e820[i].addr >> 12,
> +                   (e820[i].addr + e820[i].size) >> 12);
> +    }
> +
> +    /* Done: copy the sanitized version. */
> +    *nr_entries = nr;
> +    memcpy(src, e820, nr * sizeof(struct e820entry));
> +    return 0;
> +}
> +
> +
> +int libxl__e820_alloc(libxl_ctx *ctx, uint32_t domid, libxl_domain_config *d_config)
> +{
> +    int rc;
> +    uint32_t nr;
> +    struct e820entry map[E820MAX];
> +    libxl_domain_build_info *b_info;
> +
> +    if (d_config == NULL || d_config->c_info.hvm)
> +      return ERROR_INVAL;
> +
> +    b_info = &d_config->b_info;
> +    if (b_info->u.pv.no_machine_e820)
> +      return ERROR_INVAL;
> +
> +    rc = xc_get_machine_memory_map(ctx->xch, map, E820MAX);
> +    if (rc < 0) {
> +        errno = rc;
> +        return ERROR_FAIL;
> +    }
> +    nr = rc;
> +    rc = e820_sanitize(ctx, map, &nr, b_info->target_memkb,
> +                       (b_info->max_memkb - b_info->target_memkb) +
> +                       b_info->u.pv.slack_memkb);
> +    if (rc)
> +        return ERROR_FAIL;
> +
> +    rc = xc_domain_set_memory_map(ctx->xch, domid, map, nr);
> +
> +    if (rc < 0) {
> +        errno  = rc;
> +        return ERROR_FAIL;
> +    }
> +    return 0;
> +}
> diff -r 4527044d3d1c -r 945a34bcc2bd tools/libxl/xl_cmdimpl.c
> --- a/tools/libxl/xl_cmdimpl.c        Tue Apr 12 09:59:59 2011 -0400
> +++ b/tools/libxl/xl_cmdimpl.c        Tue Apr 12 10:04:41 2011 -0400
> @@ -371,6 +371,7 @@ static void printf_info(int domid,
>          printf("\t\t\t(kernel %s)\n", b_info->kernel.path);
>          printf("\t\t\t(cmdline %s)\n", b_info->u.pv.cmdline);
>          printf("\t\t\t(ramdisk %s)\n", b_info->u.pv.ramdisk.path);
> +        printf("\t\t\t(no_machine_e820 %d)\n", b_info->u.pv.no_machine_e820);
>          printf("\t\t)\n");
>      }
>      printf("\t)\n");
> @@ -1025,7 +1026,9 @@ skip_vfb:
>              if (!libxl_device_pci_parse_bdf(&ctx, pcidev, buf))
>                  d_config->num_pcidevs++;
>          }
> -    }
> +    } else
> +      if (!c_info->hvm)
> +        b_info->u.pv.no_machine_e820 = 1;
>  
>      switch (xlu_cfg_get_list(config, "cpuid", &cpuids, 0, 1)) {
>      case 0:


