WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

Re: [Xen-devel] Re: [PATCH 3 of 4] libxl: Add support for passing in the

To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Subject: Re: [Xen-devel] Re: [PATCH 3 of 4] libxl: Add support for passing in the machine's E820 for PCI passthrough in libxl_device_pci_parse_bdf
From: Ian Campbell <Ian.Campbell@xxxxxxxxxxxxx>
Date: Tue, 12 Apr 2011 19:44:48 +0100
Cc: Tim Deegan <Tim.Deegan@xxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>, "keir.fraser@xxxxxxxxxxxxx" <keir.fraser@xxxxxxxxxxxxx>
Delivery-date: Tue, 12 Apr 2011 11:45:42 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20110412175315.GA19066@xxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Organization: Citrix Systems, Inc.
References: <patchbomb.1302557730@xxxxxxxxxxxxxxxxxxxxxxx> <75e24fb720fa9a1c529b.1302557733@xxxxxxxxxxxxxxxxxxxxxxx> <1302610783.27835.146.camel@xxxxxxxxxxxxxxxxxxxxxx> <20110412175315.GA19066@xxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
On Tue, 2011-04-12 at 18:53 +0100, Konrad Rzeszutek Wilk wrote:
> On Tue, Apr 12, 2011 at 01:19:43PM +0100, Ian Campbell wrote:
> > On Mon, 2011-04-11 at 22:35 +0100, Konrad Rzeszutek Wilk wrote:
> > > 
> > > 
> > > The code libxl_device_pci_parse_bdf calls the libxl__e820_alloc when
> > > it has parsed one BDF. Subsequent calls to libxl_device_pci_parse_bdf
> > > will
> > > not trigger libxl__e820_alloc being called (unless the first call to
> > > libxl__e820_alloc failed).
> > 
> > That sounds like a very odd non-intuitive location for that allocation.
> > Why not do it in libxl_domain_create or somewhere like that?
> > 
> > I think the e820 map added to the idl should become a simple boolean
> > flag and this should all be taken care of internally based on that.
> 
> Like this?

Why no_machine_e820 instead of just machine_e820? You can set a non-zero
default in the appropriate libxl_foo_init function if need be.

Ian.

> # HG changeset patch
> # User Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> # Date 1302617081 14400
> # Node ID 945a34bcc2bd4bb341296f3ce84be6c42f24a4c9
> # Parent  4527044d3d1cebdc393e596072bc02d947a8de51
> libxl: Add support for passing in the machine's E820 for PCI passthrough
> 
> The code that populates E820 is unconditionally triggered by the guest
> configuration having "pci=['<BDF>,..']", being a PV guest, and if
> b_info->u.pv.no_machine_e820 is not set.
> 
> The code do_domain_create calls the libxl__e820_alloc when
> it notices that the guest is PV, has at least one PCI devices, and does not
> have the no_machine_e820 flag set..
> 
> libxl__e820_alloc calls the xc_get_machine_memory_map to retrieve the systems
> E820. Then the E820 is sanitized to weed out E820 entries below 16MB, and as
> well remove any E820_RAM or E820_UNUSED regions as the guest does not need to
> know about them. The guest only needs the E820_ACPI, E820_NVS, E820_RESERVED 
> to
> get an idea of where the PCI I/O space is. Mostly.. The Linux kernel assumes 
> that any
> gap in the E820 is considered PCI I/O space which means that if we pass
> in the guest 2GB, and the E820_ACPI, and its friend start at 3GB, the
> gap between 2GB and 3GB will be considered as PCI I/O space. To guard against
> that we also create an E820_UNUSABLE between the region of 'target_kb'
> (called ram_end in the code) up to the first E820_[ACPI,NVS,RESERVED] region.
> Lastly, the xc_domain_set_memory_map is called to install the new E820.
> 
> When tested with another PV guest (NetBSD 5.1) the modified E820 gave
> it no trouble. The code has also been tested with older "classic" Xen Linux
> and with the newer "pvops" with success (SLES11, RHEL5, Ubuntu Lucid,
> Debian Squeeze, 2.6.37, 2.6.38, 2.6.39).
> 
> Memory that is slack or for balloon (so 'maxmem' in guest configuration)
> is put behind the machine E820. Which in most cases is after the 4GB.
> 
> The reason for doing the fetching of the E820 using the hypercall in
> the toolstack (instead of the guest doing it) is that when a guest
> would do a hypercall to 'XENMEM_machine_memory_map' it would
> retrieve an E820 with I/O range caps added in. Meaning that the
> region after 4GB up to end of possible memory would be marked as unusable
> and the kernel would not have any space to allocate a balloon
> region.
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> 
> diff -r 4527044d3d1c -r 945a34bcc2bd tools/libxl/libxl.idl
> --- a/tools/libxl/libxl.idl   Tue Apr 12 09:59:59 2011 -0400
> +++ b/tools/libxl/libxl.idl   Tue Apr 12 10:04:41 2011 -0400
> @@ -117,6 +117,7 @@ libxl_domain_build_info = Struct("domain
>                                          ("cmdline", string),
>                                          ("ramdisk", libxl_file_reference),
>                                          ("features", string, True),
> +                                        ("no_machine_e820", bool),
>                                          ])),
>                   ])),
>      ],
> diff -r 4527044d3d1c -r 945a34bcc2bd tools/libxl/libxl_create.c
> --- a/tools/libxl/libxl_create.c      Tue Apr 12 09:59:59 2011 -0400
> +++ b/tools/libxl/libxl_create.c      Tue Apr 12 10:04:41 2011 -0400
> @@ -536,6 +536,14 @@ static int do_domain_create(libxl__gc *g
>      for (i = 0; i < d_config->num_pcidevs; i++)
>          libxl__device_pci_add(gc, domid, &d_config->pcidevs[i], 1);
>  
> +    if (!d_config->c_info.hvm && !d_config->b_info.u.pv.no_machine_e820) {
> +        int rc = 0;
> +        rc = libxl__e820_alloc(ctx, domid, d_config);
> +        if (rc)
> +            LIBXL__LOG(ctx, LIBXL__LOG_WARNING,
> +                      "Failed while collecting E820 with: %d (errno:%d)\n",
> +                      rc, errno);
> +    }
>      if ( cb && (d_config->c_info.hvm || d_config->b_info.u.pv.bootloader )) {
>          if ( (*cb)(ctx, domid, priv) )
>              goto error_out;
> diff -r 4527044d3d1c -r 945a34bcc2bd tools/libxl/libxl_internal.h
> --- a/tools/libxl/libxl_internal.h    Tue Apr 12 09:59:59 2011 -0400
> +++ b/tools/libxl/libxl_internal.h    Tue Apr 12 10:04:41 2011 -0400
> @@ -335,4 +335,5 @@ _hidden int libxl__error_set(libxl__gc *
>  _hidden int libxl__file_reference_map(libxl_file_reference *f);
>  _hidden int libxl__file_reference_unmap(libxl_file_reference *f);
>  
> +_hidden int libxl__e820_alloc(libxl_ctx *ctx, uint32_t domid, 
> libxl_domain_config *d_config);
>  #endif
> diff -r 4527044d3d1c -r 945a34bcc2bd tools/libxl/libxl_pci.c
> --- a/tools/libxl/libxl_pci.c Tue Apr 12 09:59:59 2011 -0400
> +++ b/tools/libxl/libxl_pci.c Tue Apr 12 10:04:41 2011 -0400
> @@ -1047,3 +1047,157 @@ int libxl_device_pci_shutdown(libxl_ctx 
>      free(pcidevs);
>      return 0;
>  }
> +
> +static int e820_sanitize(libxl_ctx *ctx, struct e820entry src[],
> +                         uint32_t *nr_entries,
> +                         unsigned long map_limitkb,
> +                         unsigned long balloon_kb)
> +{
> +    uint64_t delta_kb = 0, start = 0, start_kb = 0, last = 0, ram_end;
> +    uint32_t i, idx = 0, nr;
> +    struct e820entry e820[E820MAX];
> +
> +    if (!src || !map_limitkb || !balloon_kb || !nr_entries)
> +        return ERROR_INVAL;
> +
> +    nr = *nr_entries;
> +    if (!nr)
> +        return ERROR_INVAL;
> +
> +    if (nr > E820MAX)
> +        return ERROR_NOMEM;
> +
> +    /* Weed out anything under 16MB */
> +    for (i = 0; i < nr; i++) {
> +      if (src[i].addr > 0x100000)
> +        continue;
> +
> +      src[i].type = 0;
> +      src[i].size = 0;
> +      src[i].addr = -1ULL;
> +    }
> +
> +    /* Find the lowest and highest entry in E820, skipping over
> +     * undersired entries. */
> +    start = -1ULL;
> +    last = 0;
> +    for (i = 0; i < nr; i++) {
> +        if ((src[i].type == E820_RAM) ||
> +            (src[i].type == E820_UNUSABLE) ||
> +            (src[i].type == 0)) 
> +             continue;
> +
> +            start = src[i].addr < start ? src[i].addr : start;
> +            last = src[i].addr + src[i].size > last ?
> +                    src[i].addr + src[i].size > last : last;
> +    }
> +    if (start > 1024)
> +      start_kb = start >> 10;
> +
> +    /* Add the memory RAM region for the guest */
> +    e820[idx].addr = 0;
> +    e820[idx].size = (uint64_t)map_limitkb << 10;
> +    e820[idx].type = E820_RAM;
> +
> +    /* .. and trim if neccessary */
> +    if (start_kb && map_limitkb > start_kb) {
> +        delta_kb = map_limitkb - start_kb;
> +        if (delta_kb)
> +          e820[idx].size -= (uint64_t)(delta_kb << 10);
> +    }
> +    /* Note: We don't touch balloon_kb here. Will add it at the end. */
> +    ram_end = e820[idx].addr + e820[idx].size;
> +    idx ++;
> +
> +    LIBXL__LOG(ctx, LIBXL__LOG_DEBUG, "Memory: %ldkB End of RAM: 0x%lx (PFN) 
> " \
> +               "Delta: %ldkB, PCI start: %ldkB (0x%lx PFN), Balloon %ldkB\n",
> +             map_limitkb, ram_end >> 12, delta_kb, start_kb ,start >> 12,
> +                balloon_kb);
> +
> +    /* Check if there is a region between ram_end and start. */
> +    if (start > ram_end) {
> +      /* .. and if not present, add it in. This is to guard against
> +       the Linux guest assuming that the gap between the end of
> +       RAM region and the start of the E820_[ACPI,NVS,RESERVED]
> +       is PCI I/O space. Which it certainly is _not_. */
> +      e820[idx].type = E820_UNUSABLE;
> +      e820[idx].addr = ram_end;
> +      e820[idx].size = start - ram_end;
> +      idx++;
> +    } 
> +     /* Almost done: copy them over, ignoring the undesireable ones */
> +     for (i = 0; i < nr; i++) {
> +            if ((src[i].type == E820_RAM) ||
> +             (src[i].type == E820_UNUSABLE) ||
> +              (src[i].type == 0))
> +             continue;
> +            e820[idx].type = src[i].type;
> +            e820[idx].addr = src[i].addr;
> +            e820[idx].size = src[i].size;
> +            idx++;
> +     }
> +
> +     /* At this point we have the mapped RAM + E820 entries from src. */
> +     if (balloon_kb) {
> +        /* and if we truncated the RAM region, then add it to the end. */
> +        e820[idx].type = E820_RAM;
> +        e820[idx].addr = (uint64_t)(1ULL << 32) > last ? (uint64_t)(1ULL << 
> 32) : last;
> +        /* also add the balloon memory to the end. */
> +        e820[idx].size = (uint64_t)(delta_kb << 10) + (uint64_t)(balloon_kb 
> << 10);
> +        idx++;
> +
> +    }
> +    nr = idx;
> +
> +    for (i = 0; i < nr; i++) {
> +      LIBXL__LOG(ctx, LIBXL__LOG_DEBUG, ":%s\t[%lx -> %lx]",
> +     e820[i].type == E820_RAM ? "RAM " :
> +     (e820[i].type == E820_RESERVED ? "RSV " :
> +      e820[i].type == E820_ACPI ? "ACPI" :
> +         (e820[i].type == E820_NVS ? "NVS " :
> +          (e820[i].type == E820_UNUSABLE ? "UNU " : "----"))),
> +          e820[i].addr >> 12,
> +         (e820[i].addr + e820[i].size) >> 12);
> +    }
> +
> +    /* Done: copy the sanitized version. */
> +    *nr_entries = nr;
> +    memcpy(src, e820, nr * sizeof(struct e820entry));
> +    return 0;
> +}
> +      
> +
> +int libxl__e820_alloc(libxl_ctx *ctx, uint32_t domid, libxl_domain_config 
> *d_config)
> +{
> +    int rc;
> +    uint32_t nr;
> +    struct e820entry map[E820MAX];
> +    libxl_domain_build_info *b_info;
> +
> +    if (d_config == NULL || d_config->c_info.hvm)
> +      return ERROR_INVAL;
> +
> +    b_info = &d_config->b_info;
> +    if (b_info->u.pv.no_machine_e820)
> +      return ERROR_INVAL;
> +
> +    rc = xc_get_machine_memory_map(ctx->xch, map, E820MAX);
> +    if (rc < 0) {
> +        errno = rc;
> +        return ERROR_FAIL;
> +    }
> +    nr = rc;
> +    rc = e820_sanitize(ctx, map, &nr, b_info->target_memkb,
> +                       (b_info->max_memkb - b_info->target_memkb) +
> +                       b_info->u.pv.slack_memkb);
> +    if (rc)
> +        return ERROR_FAIL;
> +
> +    rc = xc_domain_set_memory_map(ctx->xch, domid, map, nr);
> +
> +    if (rc < 0) {
> +        errno  = rc;
> +        return ERROR_FAIL;
> +    }
> +    return 0;
> +}
> diff -r 4527044d3d1c -r 945a34bcc2bd tools/libxl/xl_cmdimpl.c
> --- a/tools/libxl/xl_cmdimpl.c        Tue Apr 12 09:59:59 2011 -0400
> +++ b/tools/libxl/xl_cmdimpl.c        Tue Apr 12 10:04:41 2011 -0400
> @@ -371,6 +371,7 @@ static void printf_info(int domid,
>          printf("\t\t\t(kernel %s)\n", b_info->kernel.path);
>          printf("\t\t\t(cmdline %s)\n", b_info->u.pv.cmdline);
>          printf("\t\t\t(ramdisk %s)\n", b_info->u.pv.ramdisk.path);
> +        printf("\t\t\t(no_machine_e820 %d)\n", b_info->u.pv.no_machine_e820);
>          printf("\t\t)\n");
>      }
>      printf("\t)\n");
> @@ -1025,7 +1026,9 @@ skip_vfb:
>              if (!libxl_device_pci_parse_bdf(&ctx, pcidev, buf))
>                  d_config->num_pcidevs++;
>          }
> -    }
> +    } else
> +      if (!c_info->hvm)
> +        b_info->u.pv.no_machine_e820 = 1;
>  
>      switch (xlu_cfg_get_list(config, "cpuid", &cpuids, 0, 1)) {
>      case 0:



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

<Prev in Thread] Current Thread [Next in Thread>