[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v5 2/3] pc & q35: Add new machine opt max-ram-below-4g



On 06/09/14 10:38, Michael S. Tsirkin wrote:
On Mon, Jun 09, 2014 at 10:20:57AM -0400, Don Slutz wrote:
On 06/08/14 11:40, Michael S. Tsirkin wrote:
On Fri, Jun 06, 2014 at 01:52:05PM -0400, Don Slutz wrote:
This is a pc & q35 only machine opt.  One use is to allow for more
ram in a 32bit guest for example:

-machine pc,max-ram-below-4g=3.75G

If you add enough PCI devices then all mmio for them will not fit
below 4G which may not be the layout the user wanted. This allows
you to increase the below 4G address space that PCI devices can use
(aka decrease ram below 4G) and therefore in more cases not have any
mmio that is above 4G.

For example using "-machine pc,max-ram-below-4g=2G" on the command
line will limit the amount of ram that is below 4G to 2G.

Signed-off-by: Don Slutz <dslutz@xxxxxxxxxxx>
---
v5:
   Re-work based on:

   https://github.com/imammedo/qemu/commits/memory-hotplug-v11


  hw/i386/pc.c         | 38 ++++++++++++++++++++++++++++++++++++++
  hw/i386/pc_piix.c    | 15 ++++++++++++---
  hw/i386/pc_q35.c     | 15 ++++++++++++---
  include/hw/i386/pc.h |  3 +++
  vl.c                 |  4 ++++
  5 files changed, 69 insertions(+), 6 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 7cdba10..bccb746 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1644,11 +1644,49 @@ pc_machine_get_hotplug_memory_region_size(Object *obj, 
Visitor *v, void *opaque,
      visit_type_int(v, &value, name, errp);
  }
+static void pc_machine_get_max_ram_below_4g(Object *obj, Visitor *v,
+                                         void *opaque, const char *name,
+                                         Error **errp)
+{
+    PCMachineState *pcms = PC_MACHINE(obj);
+    uint64_t value = pcms->max_ram_below_4g;
+
+    visit_type_size(v, &value, name, errp);
+}
+
+static void pc_machine_set_max_ram_below_4g(Object *obj, Visitor *v,
+                                         void *opaque, const char *name,
+                                         Error **errp)
+{
+    PCMachineState *pcms = PC_MACHINE(obj);
+    Error *error = NULL;
+    uint64_t value;
+
+    visit_type_size(v, &value, name, &error);
+    if (error) {
+        error_propagate(errp, error);
+        return;
+    }
+    if (value > (1ULL << 32)) {
+        error_set(&error, ERROR_CLASS_GENERIC_ERROR,
+                  "Machine option 'max-ram-below-4g=%"PRIu64
+                  "' expects size less then or equal to 4G", value);
less than
But the test is greater then.  So "not greater then" is "less then or equal".
Or did you want the test changed?
No, just correcting English: less than, not less then :)


Sigh, will fix.


+        error_propagate(errp, error);
+        return;
+    }
+
+    pcms->max_ram_below_4g = value;
+}
+
  static void pc_machine_initfn(Object *obj)
  {
      object_property_add(obj, PC_MACHINE_MEMHP_REGION_SIZE, "int",
                          pc_machine_get_hotplug_memory_region_size,
                          NULL, NULL, NULL, NULL);
+    object_property_add(obj, PC_MACHINE_MAX_RAM_BELOW_4G,  "size",
+                        pc_machine_get_max_ram_below_4g,
+                        pc_machine_set_max_ram_below_4g,
+                        NULL, NULL, NULL);
  }
  static void pc_machine_class_init(ObjectClass *oc, void *data)
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 40f6eaf..25f4727 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -98,6 +98,13 @@ static void pc_init1(MachineState *machine,
      DeviceState *icc_bridge;
      FWCfgState *fw_cfg = NULL;
      PcGuestInfo *guest_info;
+    Object *mo = qdev_get_machine();
+    PCMachineState *pcms = PC_MACHINE(mo);
+    ram_addr_t lowmem = 0xe0000000;
+
+    if (pcms && pcms->max_ram_below_4g) {
Is pcms ever NULL? If yes why?
Not that I know of.  I would be happy to convert this to an assert(pcms).
In fact, PC_MACHINE already includes an assert doesn't it?
So no need to check it everywhere.

You are right.  And I also found:

static void pc_init1(MachineState *machine,
                     int pci_enabled,
                     int kvmclock_enabled)
{
    PCMachineState *pc_machine = PC_MACHINE(machine);
...
    PCMachineState *pcms = PC_MACHINE(mo);
...

and so I will drop the add of pcms.



+        lowmem = pcms->max_ram_below_4g;
+    }
      /* Check whether RAM fits below 4G (leaving 1/2 GByte for IO memory).
       * If it doesn't, we need to split it in chunks below and above 4G.
@@ -106,8 +113,10 @@ static void pc_init1(MachineState *machine,
       * For old machine types, use whatever split we used historically to avoid
       * breaking migration.
       */
-    if (machine->ram_size >= 0xe0000000) {
-        ram_addr_t lowmem = gigabyte_align ? 0xc0000000 : 0xe0000000;
+    if (machine->ram_size >= lowmem) {
+        if (!(pcms && pcms->max_ram_below_4g) && gigabyte_align) {
+            lowmem = 0xc0000000;
+        }
          above_4g_mem_size = machine->ram_size - lowmem;
          below_4g_mem_size = lowmem;
      } else {
So why do we need gigabyte_align anymore?
Because of qemu 2.0 and the user is not required to specify this option.

Can't we set property to 0xc0000000 by default, and
override for old machine types?
There is a strange compatibility part here.  Since this code includes ram_size 
(see:

http://lists.gnu.org/archive/html/qemu-devel/2014-02/msg05146.html

) and xen has a different default.

So instead of default 0, it would be preferable to set the default to the
actual value, and let user override it.

Or if that's too hard, set max_ram_below_4g instead of setting
gigabyte_align. gigabyte_align switches everywhere is messy
enough, adding max_ram_below_4g into mix is just too messy.


I do not see a way to encode the default since for QEMU 2.0 it depends on the 
specified ram size:

This is intentional.

If we can fit all ram into low memory, because it is less than 3.5G,
we'll do that (pc machine type, q35 numbers are different but logic is
the same).  This way 32bit (+non-PAE) guests can continue to have up to
3.5G memory.

If we can't fit all ram into low memory (thus the guest should be able
to access ram above 4G anyway), then we'll cut off at a gigabyte
boundary (3G for pc machine type).  This way our ram is nicely
gigabyte-aligned and we can get best performance benefits from huge
pages.

The size of the pci hole changing in the second case is only a side
effect, it's not the main reason for the change.

cheers,
  Gerd

So migration of a QEMU 2.0 pc to QEMU 2.1 without gigabyte_align would require 
the user
to specify the correct value of max-ram-below-4g.

When you add xen into the mix I do not see a way to get right of gigabyte_align.

You have 3 cases:

1) old xen (without max-ram-below-4g), QEMU 2.1 or later.
    Will expect that QEMU acts as if max-ram-below-4g=3.75G was specified.
    I.E. gigabyte_align is ignored. Note: xen 4.4 asks for "pc,accel=xen" in 
some cases.


2) new xen (with max-ram-below-4g), QEMU 2.0 or earlier.
    Will expect that QEMU acts as if max-ram-below-4g=3.75G was specified.
    If any other value requested, the error from QEMU is ok.

3) new xen (with max-ram-below-4g), QEMU 2.1 or later.
     Expects that max-ram-below-4g works.


In summary, the migration of QEMU pc-i440fx-2.0 and pc-q35-2.0 guests
for accel=tcg, accel=kvm, accel=xen needs the gigabyte_align mess.


   -Don Slutz



Also, a value that isn't a multiple of 1G will lead to bad
performance for large machines which do have above_4g_mem_size.
Let's detect and print a warning.
Will Do.

    -Don Slutz


@@ -122,7 +131,7 @@ static void pc_init1(MachineState *machine,
      }
      icc_bridge = qdev_create(NULL, TYPE_ICC_BRIDGE);
-    object_property_add_child(qdev_get_machine(), "icc-bridge",
+    object_property_add_child(mo, "icc-bridge",
                                OBJECT(icc_bridge), NULL);
      pc_cpus_init(machine->cpu_model, icc_bridge);
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index e28ce40..155cdf1 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -85,6 +85,13 @@ static void pc_q35_init(MachineState *machine)
      PCIDevice *ahci;
      DeviceState *icc_bridge;
      PcGuestInfo *guest_info;
+    Object *mo = qdev_get_machine();
+    PCMachineState *pcms = PC_MACHINE(mo);
+    ram_addr_t lowmem = 0xb0000000;
+
+    if (pcms && pcms->max_ram_below_4g) {
+        lowmem = pcms->max_ram_below_4g;
+    }
      /* Check whether RAM fits below 4G (leaving 1/2 GByte for IO memory
       * and 256 Mbytes for PCI Express Enhanced Configuration Access Mapping
@@ -95,8 +102,10 @@ static void pc_q35_init(MachineState *machine)
       * For old machine types, use whatever split we used historically to avoid
       * breaking migration.
       */
-    if (machine->ram_size >= 0xb0000000) {
-        ram_addr_t lowmem = gigabyte_align ? 0x80000000 : 0xb0000000;
+    if (machine->ram_size >= lowmem) {
+        if (!(pcms && pcms->max_ram_below_4g) && gigabyte_align) {
+            lowmem = 0x800000000;
+        }
          above_4g_mem_size = machine->ram_size - lowmem;
          below_4g_mem_size = lowmem;
      } else {
@@ -111,7 +120,7 @@ static void pc_q35_init(MachineState *machine)
      }
      icc_bridge = qdev_create(NULL, TYPE_ICC_BRIDGE);
-    object_property_add_child(qdev_get_machine(), "icc-bridge",
+    object_property_add_child(mo, "icc-bridge",
                                OBJECT(icc_bridge), NULL);
      pc_cpus_init(machine->cpu_model, icc_bridge);
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 19530bd..2d8b562 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -32,10 +32,13 @@ struct PCMachineState {
      MemoryRegion hotplug_memory;
      HotplugHandler *acpi_dev;
+
+    uint64_t max_ram_below_4g;
  };
  #define PC_MACHINE_ACPI_DEVICE_PROP "acpi-device"
  #define PC_MACHINE_MEMHP_REGION_SIZE "hotplug-memory-region-size"
+#define PC_MACHINE_MAX_RAM_BELOW_4G "max-ram-below-4g"
  /**
   * PCMachineClass:
diff --git a/vl.c b/vl.c
index 5e77a27..cffb9c5 100644
--- a/vl.c
+++ b/vl.c
@@ -382,6 +382,10 @@ static QemuOptsList qemu_machine_opts = {
              .name = "kvm-type",
              .type = QEMU_OPT_STRING,
              .help = "Specifies the KVM virtualization mode (HV, PR)",
+        },{
+            .name = PC_MACHINE_MAX_RAM_BELOW_4G,
+            .type = QEMU_OPT_SIZE,
+            .help = "maximum ram below the 4G boundary (32bit boundary)",
          },
          { /* End of list */ }
      },
--
1.8.4


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.