
RE: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse device tree memory node


  • To: Stefano Stabellini <sstabellini@xxxxxxxxxx>
  • From: Wei Chen <Wei.Chen@xxxxxxx>
  • Date: Thu, 9 Sep 2021 03:54:08 +0000
  • Cc: Julien Grall <julien@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Bertrand Marquis <Bertrand.Marquis@xxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>
  • Delivery-date: Thu, 09 Sep 2021 03:54:37 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse device tree memory node

Hi Stefano,

> -----Original Message-----
> From: Stefano Stabellini <sstabellini@xxxxxxxxxx>
> Sent: 9 September 2021 6:32
> To: Wei Chen <Wei.Chen@xxxxxxx>
> Cc: Julien Grall <julien@xxxxxxx>; Stefano Stabellini
> <sstabellini@xxxxxxxxxx>; xen-devel@xxxxxxxxxxxxxxxxxxxx; Bertrand Marquis
> <Bertrand.Marquis@xxxxxxx>; Jan Beulich <jbeulich@xxxxxxxx>
> Subject: RE: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse
> device tree memory node
> 
> On Wed, 8 Sep 2021, Wei Chen wrote:
> > > > >>> @@ -55,6 +57,79 @@ static int __init dtb_numa_processor_affinity_init(nodeid_t node)
> > > > >>>       return 0;
> > > > >>>   }
> > > > >>>
> > > > >>> +/* Callback for parsing of the memory regions affinity */
> > > > >>> +static int __init dtb_numa_memory_affinity_init(nodeid_t node,
> > > > >>> +                                paddr_t start, paddr_t size)
> > > > >>> +{
> > > > >>> +    struct node *nd;
> > > > >>> +    paddr_t end;
> > > > >>> +    int i;
> > > > >>> +
> > > > >>> +    if ( srat_disabled() )
> > > > >>> +        return -EINVAL;
> > > > >>> +
> > > > >>> +    end = start + size;
> > > > >>> +    if ( num_node_memblks >= NR_NODE_MEMBLKS )
> > > > >>> +    {
> > > > >>> +        dprintk(XENLOG_WARNING,
> > > > >>> +                "Too many numa entry, try bigger NR_NODE_MEMBLKS \n");
> > > > >>> +        bad_srat();
> > > > >>> +        return -EINVAL;
> > > > >>> +    }
> > > > >>> +
> > > > >>> +    /* It is fine to add this area to the nodes data it will be used later */
> > > > >>> +    i = conflicting_memblks(start, end);
> > > > >>> +    /* No conflicting memory block, we can save it for later usage */;
> > > > >>> +    if ( i < 0 )
> > > > >>> +        goto save_memblk;
> > > > >>> +
> > > > >>> +    if ( memblk_nodeid[i] == node ) {
> > > > >>> +        /*
> > > > >>> +         * Overlaps with other memblk in the same node, warning here.
> > > > >>> +         * This memblk will be merged with conflicted memblk later.
> > > > >>> +         */
> > > > >>> +        printk(XENLOG_WARNING
> > > > >>> +               "DT: NUMA NODE %u (%"PRIx64
> > > > >>> +               "-%"PRIx64") overlaps with itself (%"PRIx64"-%"PRIx64")\n",
> > > > >>> +               node, start, end,
> > > > >>> +               node_memblk_range[i].start, node_memblk_range[i].end);
> > > > >>> +    } else {
> > > > >>> +        /*
> > > > >>> +         * Conflict with memblk in other node, this is an error.
> > > > >>> +         * The NUMA information is invalid, NUMA will be turn off.
> > > > >>> +         */
> > > > >>> +        printk(XENLOG_ERR
> > > > >>> +               "DT: NUMA NODE %u (%"PRIx64"-%"
> > > > >>> +               PRIx64") overlaps with NODE %u (%"PRIx64"-%"PRIx64")\n",
> > > > >>> +               node, start, end, memblk_nodeid[i],
> > > > >>> +               node_memblk_range[i].start, node_memblk_range[i].end);
> > > > >>> +        bad_srat();
> > > > >>> +        return -EINVAL;
> > > > >>> +    }
> > > > >>> +
> > > > >>> +save_memblk:
> > > > >>> +    nd = &nodes[node];
> > > > >>> +    if ( !node_test_and_set(node, memory_nodes_parsed) ) {
> > > > >>> +        nd->start = start;
> > > > >>> +        nd->end = end;
> > > > >>> +    } else {
> > > > >>> +        if ( start < nd->start )
> > > > >>> +            nd->start = start;
> > > > >>> +        if ( nd->end < end )
> > > > >>> +            nd->end = end;
> > > > >>> +    }
> > > > >>> +
> > > > >>> +    printk(XENLOG_INFO "DT: NUMA node %u %"PRIx64"-%"PRIx64"\n",
> > > > >>> +           node, start, end);
> > > > >>> +
> > > > >>> +    node_memblk_range[num_node_memblks].start = start;
> > > > >>> +    node_memblk_range[num_node_memblks].end = end;
> > > > >>> +    memblk_nodeid[num_node_memblks] = node;
> > > > >>> +    num_node_memblks++;
> > > > >>
> > > > >> Is it possible to have non-contiguous ranges of memory for a single
> > > > >> NUMA node?
> > > > >>
> > > > >> Looking at the DT bindings and Linux implementation, it seems
> > > > >> possible. Here, it seems that node_memblk_range/memblk_nodeid could
> > > > >> handle it, but nodes couldn't.
> > > > >
> > > > > Yes, you're right. I copied this code from the x86 ACPI NUMA
> > > > > implementation. Does ACPI allow non-contiguous ranges of memory for
> > > > > a single NUMA node too?
> > > >
> > > > I couldn't find any restriction for ACPI, although I only briefly
> > > > looked at the spec.
> > > >
> > > > > If yes, I think this will affect x86 ACPI NUMA too. In the next
> > > > > version, we plan to merge dtb_numa_memory_affinity_init and
> > > > > acpi_numa_memory_affinity_init into a neutral function, so we can
> > > > > fix them at the same time.
> > > > >
> > > > > If not, maybe we have to keep the diversity for dtb and ACPI here.
> > > >
> > > > I am not entirely sure what you mean. Are you saying that if ACPI
> > > > doesn't allow non-contiguous ranges of memory, then we should keep the
> > > > implementations separate?
> > > >
> > > > If so, then I disagree with that. It is fine to have code that
> > > > supports more than what a firmware table supports. The main benefit is
> > > > less code and therefore less long-term maintenance (with the current
> > > > solution we would need to check both the ACPI and the DT
> > > > implementation if there is a bug in one).
> > > >
> > >
> > > Yes, I agree.
> > >
> >
> > I am looking for a way to address this comment. The current "nodes"
> > implementation does not consider that the memory addresses of different
> > NUMA nodes can be interleaved.
> >
> > This code exists in the x86 NUMA implementation, which I think is based
> > on an early version of the Linux x86 NUMA implementation. Recent Linux
> > code, in both ACPI/numa/srat.c [1] and the x86 NUMA code [2], no longer
> > uses "nodes" to record NUMA memory address boundaries and does not
> > depend on "nodes" for sanity checks.
> >
> > To fix this properly, we would have to upgrade the x86 NUMA driver. That
> > would have a significant impact on Xen on x86, and I think it is out of
> > the scope of this series. Can we start another thread to discuss it?
> >
> > Or could you suggest a simpler way to fix it?
> 
> It looks like we would have to replace all the node->start / node->end
> checks with node_memblk_range checks. There are a few of them, in
> valid_numa_range, conflicting_memblks, cutoff_node, and
> nodes_cover_memory. It wouldn't be trivial.
>
> Although I do think that non-contiguous memory for NUMA nodes is
> important to support, the patch series is already 40 patches. I don't
> think it is a good idea to add other significant changes to it. I
> wouldn't upgrade the x86 NUMA driver now. If we can't find a better way,
> we can proceed as you are doing in this version, with the known gap that
> we can't deal with non-contiguous memory for NUMA nodes, and fix it with
> a follow-up series later. In that case we would want to have an explicit
> check for non-contiguous memory for NUMA nodes in
> dtb_numa_memory_affinity_init and error out if found.
> 

Yes, I think this is the most appropriate approach for now. I will add
code to do an explicit check and emit a warning/error.
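
Something along these lines, as an untested sketch (the helper name is
mine; it would be called from dtb_numa_memory_affinity_init() after
nodes[node] has been extended to cover the new memblk):

/*
 * Untested sketch: explicitly detect interleaved nodes. Once
 * nodes[node] covers the merged range, make sure no other node's
 * memblk falls inside that span; if one does, this node's memory is
 * non-contiguous, which the nodes[] bookkeeping cannot represent yet,
 * so turn NUMA off via bad_srat().
 */
static int __init numa_check_node_contiguous(nodeid_t node)
{
    const struct node *nd = &nodes[node];
    int i;

    for ( i = 0; i < num_node_memblks; i++ )
    {
        if ( memblk_nodeid[i] == node )
            continue;

        /* Classic interval-overlap test against the merged span. */
        if ( node_memblk_range[i].start < nd->end &&
             nd->start < node_memblk_range[i].end )
        {
            printk(XENLOG_ERR
                   "DT: NUMA node %u is non-contiguous: node %u (%"PRIx64
                   "-%"PRIx64") lies inside %"PRIx64"-%"PRIx64"\n",
                   node, memblk_nodeid[i],
                   node_memblk_range[i].start, node_memblk_range[i].end,
                   nd->start, nd->end);
            bad_srat();
            return -EINVAL;
        }
    }

    return 0;
}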
 
> 
> > Also, in Linux, the NUMA implementation for x86 is different from the
> > Arm64 and RISC-V implementations. [3]
> >
> > [1] https://github.com/torvalds/linux/blob/master/drivers/acpi/numa/srat.c
> > [2] https://github.com/torvalds/linux/blob/master/arch/x86/mm/numa.c
> > [3] https://github.com/torvalds/linux/blob/master/drivers/base/arch_numa.c
> 
> 
> In general, I like the idea of sharing code as much as possible between
> architectures (x86, ARM, etc.) and between DT/ACPI, because it makes the
> code maintenance easier and one might even gain certain features for
> free.
>
> However, the exercise of sharing code shouldn't take significant
> additional effort. In fact, it should decrease the overall effort:
> instead of writing new code, one just takes existing code and moves it
> to common.
>
> In this instance, I think it would be good to be able to share the NUMA
> initialization code between x86/ARM and ACPI/DT if it doesn't cause
> extra effort.
>
> Here the extra effort that my previous comment might cause doesn't
> derive from x86/ARM or DT/ACPI code sharing. It derives from the fact
> that our existing code unfortunately doesn't deal with non-contiguous
> memory for NUMA nodes. That is something we need to find a way to cope
> with anyway.

Yes. I didn't post the links above to argue for keeping Arm and x86
separate. I'm not sure exactly why Linux does this, but I think that for
Xen we can still try to share code between Arm/x86 and DT/ACPI.
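
For example, both front ends could funnel into one firmware-neutral
helper. This is just a sketch to illustrate the direction (the helper
name is invented; the real body would be the common logic already shown
above):

/*
 * Sketch only: a firmware-neutral memory affinity handler. Both the
 * ACPI SRAT parser and the DT memory-node parser would call into it,
 * so the overlap handling and the contiguity check live in one place.
 */
static int __init numa_update_memory_affinity(nodeid_t node,
                                              paddr_t start, paddr_t size)
{
    /*
     * The common logic currently duplicated between
     * acpi_numa_memory_affinity_init() and
     * dtb_numa_memory_affinity_init() would go here.
     */
    return 0;
}

/* The DT front end then becomes a thin wrapper. */
static int __init dtb_numa_memory_affinity_init(nodeid_t node,
                                                paddr_t start, paddr_t size)
{
    return numa_update_memory_affinity(node, start, size);
}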

Cheers,
Wei Chen


 

