WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

[Xen-devel] [PATCH 3 of 4] libxl: Convert E820_UNUSABLE and E820_RAM to

To: xen-devel@xxxxxxxxxxxxxxxxxxx, Ian.Jackson@xxxxxxxxxxxxx, stefano.stabellini@xxxxxxxxxxxxx, Ian.Campbell@xxxxxxxxxx
Subject: [Xen-devel] [PATCH 3 of 4] libxl: Convert E820_UNUSABLE and E820_RAM to E820_UNUSABLE as appropriate
From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Date: Tue, 24 May 2011 11:24:55 -0400
Cc: konrad.wilk@xxxxxxxxxx
Delivery-date: Tue, 24 May 2011 08:38:52 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <patchbomb.1306250692@xxxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <patchbomb.1306250692@xxxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mercurial-patchbomb/1.8.1
# HG changeset patch
# User Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
# Date 1306250649 14400
# Node ID 5a781e6c9d4c96bd097c46b0b200df10a16fe32c
# Parent  d3fa063a07f070a0ba9a86dea9f11a40ff974dcc
libxl: Convert E820_UNUSABLE and E820_RAM to E820_UNUSABLE as appropriate.

Most machines after the RAM regions in the e802 have a couple of
E820_RESERVED, with E820_ACPI and E820_NVS. On some Intel machines, the
E820 looks like swiss cheese:

(XEN) Initial Xen-e820 RAM map:
(XEN)  0000000000000000 - 000000000009d000 (usable)
(XEN)  000000000009d000 - 00000000000a0000 (reserved)
(XEN)  00000000000e0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 000000009cf66000 (usable)
(XEN)  000000009cf66000 - 000000009d102000 (ACPI NVS)
(XEN)  000000009d102000 - 000000009f6bd000 (usable)  <--
(XEN)  000000009f6bd000 - 000000009f6bf000 (reserved)
(XEN)  000000009f6bf000 - 000000009f714000 (usable)  <--
(XEN)  000000009f714000 - 000000009f7bf000 (ACPI NVS)
(XEN)  000000009f7bf000 - 000000009f7e0000 (usable)  <--
(XEN)  000000009f7e0000 - 000000009f7ff000 (ACPI data)
(XEN)  000000009f7ff000 - 000000009f800000 (usable)  <--
(XEN)  000000009f800000 - 00000000a0000000 (reserved)
(XEN)  00000000a0000000 - 00000000b0000000 (reserved)
(XEN)  00000000fc000000 - 00000000fd000000 (reserved)
(XEN)  00000000ffe00000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 0000000160000000 (usable)

Which means we have to pay attention to the E820_RAM that are
between the E820_[ACPI,NVS,RESERVED]. If we remove those
E820_RAM (b/c the amount of memory passed to the guest
is less that where those E820 regions reside) from the E820, the
Linux kernel interprets those "gaps" as PCI I/O space.
This is what we are currently doing.

This can be disastrous if we pass in an Intel IGD card which tries
to use the first available PCI I/O space - and ends up
using the MFNs which are actually RAM instead of being the
PCI I/O space.

To make this work, we convert all E820_RAM that are above
the 'target_kb' (those that overlap the 'target_kb'
are truncated appropriately) to be E820_UNUSABLE. We also limit this
alternation up to 4GB. This means that an E820 for a guest
from this (target_kb=1024, maxmem=2048):

[    0.000000] Set 405658 page(s) to 1-1 mapping.
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  Xen: 0000000000000000 - 00000000000a0000 (usable)
[    0.000000]  Xen: 00000000000a0000 - 0000000000100000 (reserved)
[    0.000000]  Xen: 0000000000100000 - 0000000040000000 (usable)
[    0.000000]  Xen: 0000000040000000 - 000000009cf66000 (unusable)
[    0.000000]  Xen: 000000009cf66000 - 000000009d102000 (ACPI NVS)
[    0.000000]  Xen: 000000009f6bd000 - 000000009f6bf000 (reserved)
[    0.000000]  Xen: 000000009f714000 - 000000009f7bf000 (ACPI NVS)
[    0.000000]  Xen: 000000009f7e0000 - 000000009f7ff000 (ACPI data)
[    0.000000]  Xen: 000000009f800000 - 00000000b0000000 (reserved)
[    0.000000]  Xen: 00000000fc000000 - 00000000fd000000 (reserved)
[    0.000000]  Xen: 00000000fec00000 - 00000000fec01000 (reserved)
[    0.000000]  Xen: 00000000fee00000 - 00000000fee01000 (reserved)
[    0.000000]  Xen: 00000000ffe00000 - 0000000100000000 (reserved)
[    0.000000]  Xen: 0000000100000000 - 0000000140800000 (usable)


Will look as so:

[    0.000000] Set 395880 page(s) to 1-1 mapping.
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  Xen: 0000000000000000 - 00000000000a0000 (usable)
[    0.000000]  Xen: 00000000000a0000 - 0000000000100000 (reserved)
[    0.000000]  Xen: 0000000000100000 - 0000000040000000 (usable)
[    0.000000]  Xen: 0000000040000000 - 000000009cf66000 (unusable)
[    0.000000]  Xen: 000000009cf66000 - 000000009d102000 (ACPI NVS)
[    0.000000]  Xen: 000000009d102000 - 000000009f6bd000 (unusable)
[    0.000000]  Xen: 000000009f6bd000 - 000000009f6bf000 (reserved)
[    0.000000]  Xen: 000000009f6bf000 - 000000009f714000 (unusable)
[    0.000000]  Xen: 000000009f714000 - 000000009f7bf000 (ACPI NVS)
[    0.000000]  Xen: 000000009f7bf000 - 000000009f7e0000 (unusable)
[    0.000000]  Xen: 000000009f7e0000 - 000000009f7ff000 (ACPI data)
[    0.000000]  Xen: 000000009f7ff000 - 000000009f800000 (unusable)
[    0.000000]  Xen: 000000009f800000 - 00000000b0000000 (reserved)
[    0.000000]  Xen: 00000000fc000000 - 00000000fd000000 (reserved)
[    0.000000]  Xen: 00000000fec00000 - 00000000fec01000 (reserved)
[    0.000000]  Xen: 00000000fee00000 - 00000000fee01000 (reserved)
[    0.000000]  Xen: 00000000ffe00000 - 0000000100000000 (reserved)
[    0.000000]  Xen: 0000000100000000 - 0000000140800000 (usable)

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>

diff -r d3fa063a07f0 -r 5a781e6c9d4c tools/libxl/libxl_pci.c
--- a/tools/libxl/libxl_pci.c   Tue May 24 11:24:07 2011 -0400
+++ b/tools/libxl/libxl_pci.c   Tue May 24 11:24:09 2011 -0400
@@ -1127,21 +1127,98 @@ static int e820_sanitize(libxl_ctx *ctx,
                map_limitkb, ram_end >> 12, delta_kb, start_kb ,start >> 12,
                balloon_kb);
 
+
+    /* This whole code below is to guard against if the Intel IGD is passed 
into
+     * the guest. If we don't pass in IGD, this whole code can be ignored.
+     *
+     * The reason for this code is that Intel boxes fill their E820 with
+     * E820_RAM amongst E820_RESERVED and we can't just ditch those E820_RAM.
+     * That is b/c any "gaps" in the E820 is considered PCI I/O space by
+     * Linux and it would be utilized by the Intel IGD as I/O space while
+     * in reality it was an RAM region.
+     *
+     * What this means is that we have to walk the E820 and for any region
+     * that is RAM and below 4GB and above ram_end, needs to change its type
+     * to E820_UNUSED. We also need to move some of the E820_RAM regions if
+     * the overlap with ram_end. */
+    for (i = 0; i < nr; i++) {
+       uint64_t end = src[i].addr + src[i].size;
+
+       /* We don't care about E820_UNUSABLE, but we need to
+         * change the type to zero b/c the loop after this
+         * sticks E820_UNUSABLE on the guest's E820 but ignores
+         * the ones with type zero. */
+        if ((src[i].type == E820_UNUSABLE) ||
+           /* Any region that is within the "RAM region" can
+             * be safely ditched. */
+            (end < ram_end)) {
+                src[i].type = 0;
+                continue;
+        }
+
+        /* Look only at RAM regions. */
+       if (src[i].type != E820_RAM)
+            continue;
+
+        /* We only care about RAM regions below 4GB. */
+        if (src[i].addr >= (1ULL<<32))
+            continue;
+
+       /* E820_RAM overlaps with our RAM region. Move it */
+       if (src[i].addr < ram_end) {
+            uint64_t delta;
+
+            src[i].type = E820_UNUSABLE;
+            delta = ram_end - src[i].addr;
+            /* The end < ram_end should weed this out */
+            if (src[i].size - delta < 0)
+                src[i].type = 0;
+            else {
+                src[i].size -= delta;
+                src[i].addr = ram_end;
+            }
+            if (src[i].addr + src[i].size != end) {
+                /* We messed up somewhere */
+                src[i].type = 0;
+                LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "Computed E820 
wrongly. Continuing on.");
+            }
+        }
+        /* Lastly, convert the RAM to UNSUABLE. Look in the Linux kernel
+           at git commit 2f14ddc3a7146ea4cd5a3d1ecd993f85f2e4f948
+            "xen/setup: Inhibit resource API from using System RAM E820
+           gaps as PCI mem gaps" for full explanation. */
+        if (end > ram_end)
+            src[i].type = E820_UNUSABLE;
+    }
+
     /* Check if there is a region between ram_end and start. */
     if (start > ram_end) {
+        int add_unusable = 1;
+        for (i = 0; i < nr && add_unusable; i++) {
+            if (src[i].type != E820_UNUSABLE)
+                continue;
+            if (ram_end != src[i].addr)
+                continue;
+            if (start != src[i].addr + src[i].size) {
+                /* there is one, adjust it */
+                src[i].size = start - src[i].addr;
+            }
+            add_unusable = 0;
+        }
         /* .. and if not present, add it in. This is to guard against
-          the Linux guest assuming that the gap between the end of
-          RAM region and the start of the E820_[ACPI,NVS,RESERVED]
-          is PCI I/O space. Which it certainly is _not_. */
-        e820[idx].type = E820_UNUSABLE;
-        e820[idx].addr = ram_end;
-        e820[idx].size = start - ram_end;
-        idx++;
+           the Linux guest assuming that the gap between the end of
+           RAM region and the start of the E820_[ACPI,NVS,RESERVED]
+           is PCI I/O space. Which it certainly is _not_. */
+        if (add_unusable) {
+            e820[idx].type = E820_UNUSABLE;
+            e820[idx].addr = ram_end;
+            e820[idx].size = start - ram_end;
+            idx++;
+        }
     }
     /* Almost done: copy them over, ignoring the undesireable ones */
     for (i = 0; i < nr; i++) {
         if ((src[i].type == E820_RAM) ||
-           (src[i].type == E820_UNUSABLE) ||
            (src[i].type == 0))
            continue;
 



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

<Prev in Thread] Current Thread [Next in Thread>