RE: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse device tree memory node
Hi Stefano,
> -----Original Message-----
> From: Stefano Stabellini <sstabellini@xxxxxxxxxx>
> Sent: 9 September 2021 6:32
> To: Wei Chen <Wei.Chen@xxxxxxx>
> Cc: Julien Grall <julien@xxxxxxx>; Stefano Stabellini
> <sstabellini@xxxxxxxxxx>; xen-devel@xxxxxxxxxxxxxxxxxxxx; Bertrand Marquis
> <Bertrand.Marquis@xxxxxxx>; Jan Beulich <jbeulich@xxxxxxxx>
> Subject: RE: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse
> device tree memory node
>
> On Wed, 8 Sep 2021, Wei Chen wrote:
> > > > >>> @@ -55,6 +57,79 @@ static int __init dtb_numa_processor_affinity_init(nodeid_t node)
> > > > >>>     return 0;
> > > > >>> }
> > > > >>>
> > > > >>> +/* Callback for parsing of the memory regions affinity */
> > > > >>> +static int __init dtb_numa_memory_affinity_init(nodeid_t node,
> > > > >>> +                paddr_t start, paddr_t size)
> > > > >>> +{
> > > > >>> +    struct node *nd;
> > > > >>> +    paddr_t end;
> > > > >>> +    int i;
> > > > >>> +
> > > > >>> +    if ( srat_disabled() )
> > > > >>> +        return -EINVAL;
> > > > >>> +
> > > > >>> +    end = start + size;
> > > > >>> +    if ( num_node_memblks >= NR_NODE_MEMBLKS )
> > > > >>> +    {
> > > > >>> +        dprintk(XENLOG_WARNING,
> > > > >>> +                "Too many numa entry, try bigger NR_NODE_MEMBLKS \n");
> > > > >>> +        bad_srat();
> > > > >>> +        return -EINVAL;
> > > > >>> +    }
> > > > >>> +
> > > > >>> +    /* It is fine to add this area to the nodes data it will be used later */
> > > > >>> +    i = conflicting_memblks(start, end);
> > > > >>> +    /* No conflicting memory block, we can save it for later usage */;
> > > > >>> +    if ( i < 0 )
> > > > >>> +        goto save_memblk;
> > > > >>> +
> > > > >>> +    if ( memblk_nodeid[i] == node ) {
> > > > >>> +        /*
> > > > >>> +         * Overlaps with other memblk in the same node, warning here.
> > > > >>> +         * This memblk will be merged with conflicted memblk later.
> > > > >>> +         */
> > > > >>> +        printk(XENLOG_WARNING
> > > > >>> +               "DT: NUMA NODE %u (%"PRIx64
> > > > >>> +               "-%"PRIx64") overlaps with itself (%"PRIx64"-%"PRIx64")\n",
> > > > >>> +               node, start, end,
> > > > >>> +               node_memblk_range[i].start, node_memblk_range[i].end);
> > > > >>> +    } else {
> > > > >>> +        /*
> > > > >>> +         * Conflict with memblk in other node, this is an error.
> > > > >>> +         * The NUMA information is invalid, NUMA will be turn off.
> > > > >>> +         */
> > > > >>> +        printk(XENLOG_ERR
> > > > >>> +               "DT: NUMA NODE %u (%"PRIx64"-%"
> > > > >>> +               PRIx64") overlaps with NODE %u (%"PRIx64"-%"PRIx64")\n",
> > > > >>> +               node, start, end, memblk_nodeid[i],
> > > > >>> +               node_memblk_range[i].start, node_memblk_range[i].end);
> > > > >>> +        bad_srat();
> > > > >>> +        return -EINVAL;
> > > > >>> +    }
> > > > >>> +
> > > > >>> +save_memblk:
> > > > >>> +    nd = &nodes[node];
> > > > >>> +    if ( !node_test_and_set(node, memory_nodes_parsed) ) {
> > > > >>> +        nd->start = start;
> > > > >>> +        nd->end = end;
> > > > >>> +    } else {
> > > > >>> +        if ( start < nd->start )
> > > > >>> +            nd->start = start;
> > > > >>> +        if ( nd->end < end )
> > > > >>> +            nd->end = end;
> > > > >>> +    }
> > > > >>> +
> > > > >>> +    printk(XENLOG_INFO "DT: NUMA node %u %"PRIx64"-%"PRIx64"\n",
> > > > >>> +           node, start, end);
> > > > >>> +
> > > > >>> +    node_memblk_range[num_node_memblks].start = start;
> > > > >>> +    node_memblk_range[num_node_memblks].end = end;
> > > > >>> +    memblk_nodeid[num_node_memblks] = node;
> > > > >>> +    num_node_memblks++;
> > > > >>
> > > > >>
> > > > >> Is it possible to have non-contiguous ranges of memory for a single
> > > > >> NUMA node?
> > > > >>
> > > > >> Looking at the DT bindings and Linux implementation, it seems
> > > > >> possible. Here, it seems that node_memblk_range/memblk_nodeid could
> > > > >> handle it, but nodes couldn't.
> > > > >
> > > > > Yes, you're right. I copied this code from x86 ACPI NUMA. Does ACPI
> > > > > allow non-contiguous ranges of memory for a single NUMA node too?
> > > >
> > > > I couldn't find any restriction for ACPI, although I only briefly
> > > > looked at the spec.
> > > >
> > > > > If yes, I think this will affect x86 ACPI NUMA too. In the next
> > > > > version, we plan to merge dtb_numa_memory_affinity_init and
> > > > > acpi_numa_memory_affinity_init into a neutral function, so we can
> > > > > fix them at the same time.
> > > > >
> > > > > If not, maybe we have to keep the diversity for DT and ACPI here.
> > > >
> > > > I am not entirely sure what you mean. Are you saying if ACPI doesn't
> > > > allow non-contiguous ranges of memory, then we should keep the
> > > > implementations separate?
> > > >
> > > > If so, then I disagree with that. It is fine to have code that supports
> > > > more than what a firmware table supports. The main benefit is less code
> > > > and therefore less long-term maintenance (with the current solution we
> > > > would need to check both the ACPI and DT implementations if there is a
> > > > bug in one).
> > > >
> > >
> > > Yes, I agree.
> > >
> >
> > I am looking for a way to address this comment. The current "nodes"
> > representation does not consider the situation where memory addresses of
> > different NUMA nodes can be interleaved.
> >
> > This code exists in the x86 NUMA implementation. I think it may be based
> > on an early version of the Linux x86 NUMA implementation. In recent Linux
> > code, neither ACPI/numa/srat.c[1] nor the x86 NUMA code[2] uses "nodes"
> > to record NUMA memory address boundaries, and they don't depend on
> > "nodes" to do sanity checks.
> >
> > To fix it, we'd better upgrade the x86 NUMA driver. That would have a
> > significant impact on Xen-x86, and I think it might be out of this
> > series' scope. Can we create another thread to discuss it?
> >
> > Or could you suggest some simple way to fix it?
>
> It looks like we would have to replace all the node->start /
> node->end checks with node_memblk_range checks. There are a few of them
> in valid_numa_range, conflicting_memblks, cutoff_node,
> nodes_cover_memory. It wouldn't be trivial.
>
> Although I do think that non-contiguous memory for NUMA nodes is
> important to support, the patch series is already 40 patches. I don't
> think it is a good idea to add other significant changes to it. I
> wouldn't upgrade the x86 NUMA driver now. If we can't find a better way,
> we can proceed as you are doing in this version, with the known gap that
> we can't deal with non-contiguous memory for NUMA nodes, and fix it with
> a follow-up series later. In that case we would want to have an explicit
> check for non-contiguous memory for NUMA nodes in
> dtb_numa_memory_affinity_init and error out if found.
>
Yes, I think this is a more appropriate approach at present.
I will add some code to do an explicit check and give a warning/error.
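Something like the following untested sketch (based on the hunk above; the
exact condition and message wording are still to be decided, and note it
would conservatively reject ranges of a node that arrive out of order in
the DT, even if a later range would fill the gap):

save_memblk:
    nd = &nodes[node];
    if ( !node_test_and_set(node, memory_nodes_parsed) ) {
        nd->start = start;
        nd->end = end;
    } else {
        /*
         * Sketch: "nodes" can only describe one contiguous span per node,
         * so error out if the new range would leave a hole instead of
         * silently extending nd->start/nd->end across the gap.
         */
        if ( start > nd->end || end < nd->start )
        {
            printk(XENLOG_ERR
                   "DT: NUMA NODE %u: non-contiguous memory is not supported\n",
                   node);
            bad_srat();
            return -EINVAL;
        }
        if ( start < nd->start )
            nd->start = start;
        if ( nd->end < end )
            nd->end = end;
    }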
>
> > Also, on Linux, the x86 NUMA implementation is different from the Arm64
> > and RISC-V implementation.[3]
> >
> > [1] https://github.com/torvalds/linux/blob/master/drivers/acpi/numa/srat.c
> > [2] https://github.com/torvalds/linux/blob/master/arch/x86/mm/numa.c
> > [3] https://github.com/torvalds/linux/blob/master/drivers/base/arch_numa.c
>
>
> In general, I like the idea of sharing code as much as possible between
> architectures (x86, ARM, etc.) and between DT/ACPI because it makes the
> code maintenance easier and one might even gain certain features for
> free.
>
> However, the exercise of sharing code shouldn't take significant
> additional effort. In fact, it should decrease the overall effort:
> instead of writing new code one just takes existing code and moves it to
> common.
>
> In this instance, I think it would be good to be able to share the NUMA
> initialization code between x86/ARM and ACPI/DT if it doesn't cause
> extra effort.
>
> Here the extra effort that my previous comment might cause doesn't
> derive from x86/ARM or DT/ACPI code sharing. It derives from the fact
> that our existing code doesn't deal with non-contiguous memory for NUMA
> nodes unfortunately. That is something we need to find a way to cope
> with anyway.
Yes. By posting the links above I didn't mean to suggest keeping Arm/x86
diverged. I'm not sure exactly why Linux does this, but I think for Xen we
can still try to share code between Arm/x86 and DT/ACPI.
Cheers,
Wei Chen