
Re: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse device tree memory node



Hi Wei,

On 28/08/2021 04:56, Wei Chen wrote:
-----Original Message-----
From: Stefano Stabellini <sstabellini@xxxxxxxxxx>
Sent: 28 August 2021 09:06
To: Wei Chen <Wei.Chen@xxxxxxx>
Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; sstabellini@xxxxxxxxxx; julien@xxxxxxx;
jbeulich@xxxxxxxx; Bertrand Marquis <Bertrand.Marquis@xxxxxxx>
Subject: Re: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse
device tree memory node

On Wed, 11 Aug 2021, Wei Chen wrote:
Memory blocks' NUMA ID information is stored in the device tree's
memory nodes as "numa-node-id". We need a new helper to parse
and verify this ID from memory nodes.

To support memory affinity later, the valid memory ranges and
NUMA IDs will be saved to tables.
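
(For reference, the property lookup itself is a simple libfdt operation.
A minimal sketch of such a parse, with a hypothetical function name and
error paths trimmed, could look like:

    /*
     * Hypothetical sketch: read "numa-node-id" from a device tree node
     * via libfdt (assumes <xen/libfdt/libfdt.h> and Xen's errno codes).
     */
    static int __init fdt_parse_numa_node_id(const void *fdt, int node_off,
                                             nodeid_t *nid)
    {
        const struct fdt_property *prop;
        int len;

        prop = fdt_get_property(fdt, node_off, "numa-node-id", &len);
        if ( !prop || len < (int)sizeof(fdt32_t) )
            return -EINVAL; /* Property missing or malformed. */

        *nid = (nodeid_t)fdt32_to_cpu(*(const fdt32_t *)prop->data);
        return 0;
    }

)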

Signed-off-by: Wei Chen <wei.chen@xxxxxxx>
---
  xen/arch/arm/numa_device_tree.c | 130 ++++++++++++++++++++++++++++++++
  1 file changed, 130 insertions(+)

diff --git a/xen/arch/arm/numa_device_tree.c b/xen/arch/arm/numa_device_tree.c
index 37cc56acf3..bbe081dcd1 100644
--- a/xen/arch/arm/numa_device_tree.c
+++ b/xen/arch/arm/numa_device_tree.c
@@ -20,11 +20,13 @@
  #include <xen/init.h>
  #include <xen/nodemask.h>
  #include <xen/numa.h>
+#include <xen/libfdt/libfdt.h>
  #include <xen/device_tree.h>
  #include <asm/setup.h>

  s8 device_tree_numa = 0;
  static nodemask_t processor_nodes_parsed __initdata;
+static nodemask_t memory_nodes_parsed __initdata;

  static int srat_disabled(void)
  {
@@ -55,6 +57,79 @@ static int __init dtb_numa_processor_affinity_init(nodeid_t node)
      return 0;
  }

+/* Callback for parsing of the memory regions affinity */
+static int __init dtb_numa_memory_affinity_init(nodeid_t node,
+                                paddr_t start, paddr_t size)
+{
+    struct node *nd;
+    paddr_t end;
+    int i;
+
+    if ( srat_disabled() )
+        return -EINVAL;
+
+    end = start + size;
+    if ( num_node_memblks >= NR_NODE_MEMBLKS )
+    {
+        dprintk(XENLOG_WARNING,
+                "Too many numa entries, try a bigger NR_NODE_MEMBLKS\n");
+        bad_srat();
+        return -EINVAL;
+    }
+
+    /* It is fine to add this area to the nodes data; it will be used later */
+    i = conflicting_memblks(start, end);
+    /* No conflicting memory block, we can save it for later usage */
+    if ( i < 0 )
+        goto save_memblk;
+
+    if ( memblk_nodeid[i] == node ) {
+        /*
+         * Overlaps with another memblk in the same node; warn here.
+         * This memblk will be merged with the conflicting memblk later.
+         */
+        printk(XENLOG_WARNING
+               "DT: NUMA NODE %u (%"PRIx64
+               "-%"PRIx64") overlaps with itself (%"PRIx64"-%"PRIx64")\n",
+               node, start, end,
+               node_memblk_range[i].start, node_memblk_range[i].end);
+    } else {
+        /*
+         * Conflicts with a memblk in another node; this is an error.
+         * The NUMA information is invalid, so NUMA will be turned off.
+         */
+        printk(XENLOG_ERR
+               "DT: NUMA NODE %u (%"PRIx64"-%"
+               PRIx64") overlaps with NODE %u (%"PRIx64"-%"PRIx64")\n",
+               node, start, end, memblk_nodeid[i],
+               node_memblk_range[i].start, node_memblk_range[i].end);
+        bad_srat();
+        return -EINVAL;
+    }
+
+save_memblk:
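+    /*
+     * Note: nodes[node] holds a single [start, end) range per node, so
+     * two disjoint memblks on the same node are merged below into one
+     * range covering the hole between them (see the discussion that
+     * follows the patch).
+     */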
+    nd = &nodes[node];
+    if ( !node_test_and_set(node, memory_nodes_parsed) ) {
+        nd->start = start;
+        nd->end = end;
+    } else {
+        if ( start < nd->start )
+            nd->start = start;
+        if ( nd->end < end )
+            nd->end = end;
+    }
+
+    printk(XENLOG_INFO "DT: NUMA node %u %"PRIx64"-%"PRIx64"\n",
+           node, start, end);
+
+    node_memblk_range[num_node_memblks].start = start;
+    node_memblk_range[num_node_memblks].end = end;
+    memblk_nodeid[num_node_memblks] = node;
+    num_node_memblks++;


Is it possible to have non-contiguous ranges of memory for a single NUMA node?

Looking at the DT bindings and Linux implementation, it seems possible.
Here, it seems that node_memblk_range/memblk_nodeid could handle it,
but the single-range nodes[] array couldn't.
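
For example, the bindings allow the same numa-node-id to appear on several
memory nodes, which gives one node non-contiguous memory (addresses below
are illustrative):

    memory@0 {
        device_type = "memory";
        reg = <0x0 0x0 0x0 0x40000000>;    /* 1 GiB at 0 */
        numa-node-id = <0>;
    };

    memory@100000000 {
        device_type = "memory";
        reg = <0x1 0x0 0x0 0x40000000>;    /* 1 GiB at 4 GiB */
        numa-node-id = <0>;                /* same node, hole in between */
    };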

Yes, you're right. I copied this code from x86 ACPI NUMA. Does ACPI allow
non-contiguous ranges of memory for a single NUMA node too?

I couldn't find any restriction in ACPI, although I only briefly looked at the spec.

If yes, I think this will affect x86 ACPI NUMA too. In the next version, we
plan to merge dtb_numa_memory_affinity_init and acpi_numa_memory_affinity_init
into a neutral function, so we can fix them both at the same time.

If not, maybe we have to keep separate implementations for DT and ACPI here.

I am not entirely sure what you mean. Are you saying that if ACPI doesn't allow non-contiguous ranges of memory, then we should keep the implementations separate?

If so, then I disagree. It is fine to have code that supports more than what a firmware table supports. The main benefit is less code and therefore less long-term maintenance (with the current solution we would need to check both the ACPI and DT implementations if there is a bug in one).
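
As a rough sketch of that direction (hypothetical name, not the actual
patch): the DT and ACPI parsers would each reduce their firmware table to
(node, start, size) tuples and feed a common routine that does the memblk
bookkeeping shown in the patch above, e.g.:

    /* Hypothetical firmware-neutral helper shared by DT and ACPI paths. */
    static int __init numa_update_node_memblk(nodeid_t node,
                                              paddr_t start, paddr_t size)
    {
        if ( num_node_memblks >= NR_NODE_MEMBLKS )
            return -EINVAL;

        /* Record the range; overlap checks and node merging omitted here. */
        node_memblk_range[num_node_memblks].start = start;
        node_memblk_range[num_node_memblks].end = start + size;
        memblk_nodeid[num_node_memblks] = node;
        num_node_memblks++;

        return 0;
    }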

Cheers,

--
Julien Grall



 

