[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH for 4.5 v6 1/1] Add mmio_hole_size



On Fri, Oct 03, 2014 at 09:48:38AM -0400, Don Slutz wrote:
> If you add enough PCI devices then all mmio may not fit below 4G
> which may not be the layout the user wanted. This allows you to
> increase the below 4G address space that PCI devices can use and
> therefore in more cases not have any mmio that is above 4G.
> 
> There are real PCI cards that do not support mmio over 4G, so if you
> want to emulate them precisely, you may also need to increase the
> space below 4G for them. There are drivers for these cards that also
> do not work if they have their mmio space mapped above 4G.
> 
> This allows growing the MMIO hole to the size needed.
> 
> This may help with using pci passthru and HVM.

Ha! It definitly helps in some cases with GPUs.
> 
> Signed-off-by: Don Slutz <dslutz@xxxxxxxxxxx>
> ---
> v6:
>     I think we should get rid of xc_hvm_build_with_hole() entirely.
>       Done.
>     Add text describing restrictions on hole size
>       Done.
>     "<=", to be consistent with the check in
>         libxl__domain_create_info_setdefault ()
>       Done
> 
> v5:
>   Add LIBXL_HAVE_BUILDINFO_HVM_MMIO_HOLE_SIZE.
> 
> v4:
>     s/pci_hole_min_size/mmio_hole_size/
>     s/HVM_BELOW_4G_RAM_END/HVM_BELOW_4G_MMIO_START/
>     Add ignore to hvmloader message
>     Adjust commit message    
>     Dropped min; stopped mmio_hole_size changing in hvmloader
> 
> 
>  docs/man/xl.cfg.pod.5             | 16 +++++++++
>  tools/firmware/hvmloader/config.h |  1 -
>  tools/firmware/hvmloader/pci.c    | 71 
> +++++++++++++++++++++++++++------------
>  tools/libxl/libxl.h               |  5 +++
>  tools/libxl/libxl_create.c        | 29 ++++++++++++----
>  tools/libxl/libxl_dm.c            | 24 +++++++++++--
>  tools/libxl/libxl_dom.c           |  8 +++++
>  tools/libxl/libxl_types.idl       |  1 +
>  tools/libxl/xl_cmdimpl.c          | 12 +++++++
>  9 files changed, 136 insertions(+), 31 deletions(-)
> 
> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> index 8bba21c..0a870d2 100644
> --- a/docs/man/xl.cfg.pod.5
> +++ b/docs/man/xl.cfg.pod.5
> @@ -1111,6 +1111,22 @@ wallclock (i.e., real) time.
>  
>  =back
>  
> +=head3 Memory layout
> +
> +=over 4
> +
> +=item B<mmio_hole_size=BYTES>
> +
> +Specifies the size the MMIO hole below 4GiB will be.  Only valid for
> +device_model_version = "qemu-xen".  Note: this requires QEMU 2.1.0
> +or later.

This patch depends on 'max-ram-below-4g' being in QEMU upstream.
Has that been codified/Acked by the maintainers there?
(It would be rather cumbersome if this changed).

Is there a link to the latest patch-set?

Naturally to use this with Xen 4.5 one would have to replace
the qemu-xen that we ship with it, with an upstream one.

I believe Fedora is an distro that does that (as in
it builds using the same QEMU that KVM and Xen are using).
Are there any other ones?


I am conflicted about this as:
 - Internally (Oracel) we have an usage for this and at some
   point developed something quite similar, so I am partially
   biased to this.

 - This by itself won't work with the version of QEMU-xen that
   is going to be shipped in Xen 4.5. Which means it is a bit
   of an dead code - but then there are some patches that we
   put in Xen 4.5 that were cleanups to help with further work.
   And there are also pieces of code in the hypervisor
   which don't have corresponding code in the toolstack.

 - The 'max-ram-below-4g' not being codified makes me a bit
   uneasy - as we would have to alter this when your patches
   make it in QEMU v2.1 (or later).

On the patch itself:
 - It is isolated. It won't be run by the majority of users - hence
   any bugs that it might cause are in the common code. There
   does not seem much..

 - It needs an Ack or Reviewed by the toolstack maintainers
   and also from George (who touched the hvmloader last time).


> +
> +Cannot be smaller than 256MiB. Cannot be larger than 4GiB.
> +
> +Known good large value is 3GiB (3221225472).
> +
> +=back
> +
>  =head3 Support for Paravirtualisation of HVM Guests
>  
>  The following options allow Paravirtualised features (such as devices)
> diff --git a/tools/firmware/hvmloader/config.h 
> b/tools/firmware/hvmloader/config.h
> index 80bea46..b838cf9 100644
> --- a/tools/firmware/hvmloader/config.h
> +++ b/tools/firmware/hvmloader/config.h
> @@ -53,7 +53,6 @@ extern struct bios_config ovmf_config;
>  #define PCI_ISA_IRQ_MASK    0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
>  
>  /* MMIO hole: Hardcoded defaults, which can be dynamically expanded. */
> -#define PCI_MEM_START       0xf0000000
>  #define PCI_MEM_END         0xfc000000
>  
>  extern unsigned long pci_mem_start, pci_mem_end;
> diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
> index 3712988..4e8d803 100644
> --- a/tools/firmware/hvmloader/pci.c
> +++ b/tools/firmware/hvmloader/pci.c
> @@ -28,9 +28,10 @@
>  #include <xen/memory.h>
>  #include <xen/hvm/ioreq.h>
>  #include <xen/hvm/hvm_xs_strings.h>
> +#include <xen/hvm/e820.h>
>  #include <stdbool.h>
>  
> -unsigned long pci_mem_start = PCI_MEM_START;
> +unsigned long pci_mem_start = HVM_BELOW_4G_MMIO_START;
>  unsigned long pci_mem_end = PCI_MEM_END;
>  uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
>  
> @@ -59,6 +60,7 @@ void pci_setup(void)
>          uint64_t bar_sz;
>      } *bars = (struct bars *)scratch_start;
>      unsigned int i, nr_bars = 0;
> +    uint64_t mmio_hole_size = 0;
>  
>      const char *s;
>      /*
> @@ -86,6 +88,10 @@ void pci_setup(void)
>      printf("Relocating guest memory for lowmem MMIO space %s\n",
>             allow_memory_relocate?"enabled":"disabled");
>  
> +    s = xenstore_read("platform/mmio_hole_size", NULL);
> +    if ( s )
> +        mmio_hole_size = strtoll(s, NULL, 0);
> +
>      /* Program PCI-ISA bridge with appropriate link routes. */
>      isa_irq = 0;
>      for ( link = 0; link < 4; link++ )
> @@ -237,26 +243,49 @@ void pci_setup(void)
>          pci_writew(devfn, PCI_COMMAND, cmd);
>      }
>  
> -    /*
> -     * At the moment qemu-xen can't deal with relocated memory regions.
> -     * It's too close to the release to make a proper fix; for now,
> -     * only allow the MMIO hole to grow large enough to move guest memory
> -     * if we're running qemu-traditional.  Items that don't fit will be
> -     * relocated into the 64-bit address space.
> -     *
> -     * This loop now does the following:
> -     * - If allow_memory_relocate, increase the MMIO hole until it's
> -     *   big enough, or until it's 2GiB
> -     * - If !allow_memory_relocate, increase the MMIO hole until it's
> -     *   big enough, or until it's 2GiB, or until it overlaps guest
> -     *   memory
> -     */
> -    while ( (mmio_total > (pci_mem_end - pci_mem_start)) 
> -            && ((pci_mem_start << 1) != 0)
> -            && (allow_memory_relocate
> -                || (((pci_mem_start << 1) >> PAGE_SHIFT)
> -                    >= hvm_info->low_mem_pgend)) )
> -        pci_mem_start <<= 1;
> +    if ( mmio_hole_size )
> +    {
> +        uint64_t max_ram_below_4g = (1ULL << 32) - mmio_hole_size;
> +
> +        if ( max_ram_below_4g > HVM_BELOW_4G_MMIO_START )
> +        {
> +            printf("max_ram_below_4g=0x"PRIllx
> +                   " too big for mmio_hole_size=0x"PRIllx
> +                   " has been ignored.\n",
> +                   PRIllx_arg(max_ram_below_4g),
> +                   PRIllx_arg(mmio_hole_size));
> +        }
> +        else
> +        {
> +            pci_mem_start = max_ram_below_4g;
> +            printf("pci_mem_start=0x%lx (was 0x%x) for mmio_hole_size=%lu\n",
> +                   pci_mem_start, HVM_BELOW_4G_MMIO_START,
> +                   (long)mmio_hole_size);
> +        }
> +    }
> +    else
> +    {
> +        /*
> +         * At the moment qemu-xen can't deal with relocated memory regions.
> +         * It's too close to the release to make a proper fix; for now,
> +         * only allow the MMIO hole to grow large enough to move guest memory
> +         * if we're running qemu-traditional.  Items that don't fit will be
> +         * relocated into the 64-bit address space.
> +         *
> +         * This loop now does the following:
> +         * - If allow_memory_relocate, increase the MMIO hole until it's
> +         *   big enough, or until it's 2GiB
> +         * - If !allow_memory_relocate, increase the MMIO hole until it's
> +         *   big enough, or until it's 2GiB, or until it overlaps guest
> +         *   memory
> +         */
> +        while ( (mmio_total > (pci_mem_end - pci_mem_start))
> +                && ((pci_mem_start << 1) != 0)
> +                && (allow_memory_relocate
> +                    || (((pci_mem_start << 1) >> PAGE_SHIFT)
> +                        >= hvm_info->low_mem_pgend)) )
> +            pci_mem_start <<= 1;
> +    }
>  
>      if ( mmio_total > (pci_mem_end - pci_mem_start) )
>      {
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index 2700cc1..eb29037 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -158,6 +158,11 @@
>  #define LIBXL_BUILDINFO_HVM_VIRIDIAN_ENABLE_DISABLE_WIDTH 64
>  
>  /*
> + * libxl_domain_build_info has the u.hvm.mmio_hole_size field.
> + */
> +#define LIBXL_HAVE_BUILDINFO_HVM_MMIO_HOLE_SIZE 1
> +
> +/*
>   * libxl ABI compatibility
>   *
>   * The only guarantee which libxl makes regarding ABI compatibility
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index 8b82584..3737148 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -23,6 +23,7 @@
>  #include <xc_dom.h>
>  #include <xenguest.h>
>  #include <xen/hvm/hvm_info_table.h>
> +#include <xen/hvm/e820.h>
>  
>  int libxl__domain_create_info_setdefault(libxl__gc *gc,
>                                           libxl_domain_create_info *c_info)
> @@ -428,13 +429,26 @@ int libxl__domain_build(libxl__gc *gc,
>          vments[4] = "start_time";
>          vments[5] = libxl__sprintf(gc, "%lu.%02d", 
> start_time.tv_sec,(int)start_time.tv_usec/10000);
>  
> -        localents = libxl__calloc(gc, 7, sizeof(char *));
> -        localents[0] = "platform/acpi";
> -        localents[1] = libxl_defbool_val(info->u.hvm.acpi) ? "1" : "0";
> -        localents[2] = "platform/acpi_s3";
> -        localents[3] = libxl_defbool_val(info->u.hvm.acpi_s3) ? "1" : "0";
> -        localents[4] = "platform/acpi_s4";
> -        localents[5] = libxl_defbool_val(info->u.hvm.acpi_s4) ? "1" : "0";
> +        localents = libxl__calloc(gc, 9, sizeof(char *));
> +        i = 0;
> +        localents[i++] = "platform/acpi";
> +        localents[i++] = libxl_defbool_val(info->u.hvm.acpi) ? "1" : "0";
> +        localents[i++] = "platform/acpi_s3";
> +        localents[i++] = libxl_defbool_val(info->u.hvm.acpi_s3) ? "1" : "0";
> +        localents[i++] = "platform/acpi_s4";
> +        localents[i++] = libxl_defbool_val(info->u.hvm.acpi_s4) ? "1" : "0";
> +        if (info->u.hvm.mmio_hole_size) {
> +            uint64_t max_ram_below_4g =
> +                (1ULL << 32) - info->u.hvm.mmio_hole_size;
> +
> +            if (max_ram_below_4g <= HVM_BELOW_4G_MMIO_START) {
> +                localents[i++] = "platform/mmio_hole_size";
> +                localents[i++] =
> +                    libxl__sprintf(gc, "%"PRIu64,
> +                                   info->u.hvm.mmio_hole_size);
> +            }
> +        }
> +        assert(i < 9);

Why? Please include an comment explaining why?
>  
>          break;
>      case LIBXL_DOMAIN_TYPE_PV:
> @@ -460,6 +474,7 @@ int libxl__domain_build(libxl__gc *gc,
>              vments[i++] = "image/cmdline";
>              vments[i++] = (char *) state->pv_cmdline;
>          }
> +        assert(i < 11);

Why? Please include an comment explaining why.
>  
>          break;
>      default:
> diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
> index 0018113..be22a0c 100644
> --- a/tools/libxl/libxl_dm.c
> +++ b/tools/libxl/libxl_dm.c
> @@ -18,6 +18,7 @@
>  #include "libxl_osdeps.h" /* must come before any other headers */
>  
>  #include "libxl_internal.h"
> +#include <xen/hvm/e820.h>
>  
>  static const char *libxl_tapif_script(libxl__gc *gc)
>  {
> @@ -415,6 +416,7 @@ static char ** 
> libxl__build_device_model_args_new(libxl__gc *gc,
>      const libxl_vnc_info *vnc = libxl__dm_vnc(guest_config);
>      const libxl_sdl_info *sdl = dm_sdl(guest_config);
>      const char *keymap = dm_keymap(guest_config);
> +    char *machinearg;
>      flexarray_t *dm_args;
>      int i;
>      uint64_t ram_size;
> @@ -696,10 +698,28 @@ static char ** 
> libxl__build_device_model_args_new(libxl__gc *gc,
>              /* Switching here to the machine "pc" which does not add
>               * the xen-platform device instead of the default "xenfv" 
> machine.
>               */
> -            flexarray_append(dm_args, "pc,accel=xen");
> +            machinearg = libxl__sprintf(gc, "pc,accel=xen");
>          } else {
> -            flexarray_append(dm_args, "xenfv");
> +            machinearg = libxl__sprintf(gc, "xenfv");
>          }
> +        if (b_info->u.hvm.mmio_hole_size) {
> +            uint64_t max_ram_below_4g = (1ULL << 32) -
> +                b_info->u.hvm.mmio_hole_size;
> +
> +            if (max_ram_below_4g > HVM_BELOW_4G_MMIO_START) {
> +                LIBXL__LOG(ctx, LIBXL__LOG_WARNING,
> +                           "max-ram-below-4g=%"PRIu64
> +                           " (mmio_hole_size=%"PRIu64
> +                           ") too big > %u ignored.\n",
> +                           max_ram_below_4g,
> +                           b_info->u.hvm.mmio_hole_size,
> +                           HVM_BELOW_4G_MMIO_START);
> +            } else {
> +                machinearg = libxl__sprintf(gc, 
> "%s,max-ram-below-4g=%"PRIu64,
> +                                            machinearg, max_ram_below_4g);
> +            }
> +        }
> +        flexarray_append(dm_args, machinearg);
>          for (i = 0; b_info->extra_hvm && b_info->extra_hvm[i] != NULL; i++)
>              flexarray_append(dm_args, b_info->extra_hvm[i]);
>          break;
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index d63ae1b..09dd008 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -23,6 +23,7 @@
>  #include <xc_dom.h>
>  #include <xen/hvm/hvm_info_table.h>
>  #include <xen/hvm/hvm_xs_strings.h>
> +#include <xen/hvm/e820.h>
>  
>  libxl_domain_type libxl__domain_type(libxl__gc *gc, uint32_t domid)
>  {
> @@ -800,6 +801,13 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
>      args.mem_size = (uint64_t)(info->max_memkb - info->video_memkb) << 10;
>      args.mem_target = (uint64_t)(info->target_memkb - info->video_memkb) << 
> 10;
>      args.claim_enabled = libxl_defbool_val(info->claim_mode);
> +    if (info->u.hvm.mmio_hole_size) {
> +        uint64_t max_ram_below_4g = (1ULL << 32) -
> +            info->u.hvm.mmio_hole_size;
> +
> +        if (max_ram_below_4g < HVM_BELOW_4G_MMIO_START)
> +            args.mmio_size = info->u.hvm.mmio_hole_size;
> +    }
>      if (libxl__domain_firmware(gc, info, &args)) {
>          LOG(ERROR, "initializing domain firmware failed");
>          goto out;
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index bbb03e2..2a71e9a 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -391,6 +391,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
>                                         ("timeoffset",       string),
>                                         ("hpet",             libxl_defbool),
>                                         ("vpt_align",        libxl_defbool),
> +                                       ("mmio_hole_size",   uint64),
>                                         ("timer_mode",       
> libxl_timer_mode),
>                                         ("nested_hvm",       libxl_defbool),
>                                         ("smbios_firmware",  string),
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index c734f79..0a70da6 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -34,6 +34,7 @@
>  #include <ctype.h>
>  #include <inttypes.h>
>  #include <limits.h>
> +#include <xen/hvm/e820.h>
>  
>  #include "libxl.h"
>  #include "libxl_utils.h"
> @@ -1111,6 +1112,17 @@ static void parse_config_data(const char 
> *config_source,
>              exit(-ERROR_FAIL);
>          }
>  
> +        if (!xlu_cfg_get_long(config, "mmio_hole_size", &l, 0)) {
> +            b_info->u.hvm.mmio_hole_size = (uint64_t) l;
> +            if (b_info->u.hvm.mmio_hole_size < HVM_BELOW_4G_MMIO_LENGTH ||

Oh boy, this check for 256MB is surely obfuscated by the macros.

Could you include a comment saying that HVM_BELOW_4G_MMIO_LENGTH resolves
to 256MB please?

> +                b_info->u.hvm.mmio_hole_size > HVM_BELOW_4G_MMIO_START) {
> +                fprintf(stderr, "ERROR: invalid value 0x%"PRIx64" (%"PRIu64
> +                        ") for \"mmio_hole_size\"\n",
> +                        b_info->u.hvm.mmio_hole_size,
> +                        b_info->u.hvm.mmio_hole_size);
> +                exit (1);
> +            }
> +        }
>          if (!xlu_cfg_get_long(config, "timer_mode", &l, 1)) {
>              const char *s = libxl_timer_mode_to_string(l);
>              fprintf(stderr, "WARNING: specifying \"timer_mode\" as an 
> integer is deprecated. "
> -- 
> 1.8.4
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.