[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse device tree memory node


  • To: Wei Chen <Wei.Chen@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>
  • From: Wei Chen <Wei.Chen@xxxxxxx>
  • Date: Wed, 8 Sep 2021 07:34:04 +0000
  • Accept-language: en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=3Z+hSZ9cAhQo5fyXkwP2E3uRS7GHreh4U+7Gx87EBjs=; b=HKsK2fVoX30nujNdv6rfMgMAZpRd7mwaoClPT1cXVVms3N1B9CE5jRjZ/fuVVdWwlyGjaMu9S4wjOB78Uzu6M/PLA825nOFGtPf7jxVa51HfbE/N8Nlq7f906lqmmMOBBEFOYWEto1jH4KkoHhbq4L1J+dwO10qO1sHdA25knqo77WR/+4pBahzfcsvN8/cfUqQDz74JC9jRcSkf1JoKaU+xqL++h51nMFZdPRBHCe6Bjam9OGqu2ftxGIWtWkD3UYAAhF9P67Vh867xUzjwm9ImAjnRs1eIlLAtCeckMYXGxcRq7g5F9OxbK5DG/dZSCCPQ5NZtKvIV1fEXW1m0OQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=KSmxRL6x+q0cZxShWhZZ3qnp+XpEh6YPSINKOBd+6Szccvfs3HWHffP+h/z0QV1D1HBUHhM7P50a1SJOhQ3UdOxqaYk9urOZu3pqH9WSkWcb/bUgTnyoQtNQ/IBEyCiq8F/9NwUwIuf3zZ+EwoJJ+7u3SY2JEydo5YsElYUNhxSdZio2XGMjRTYbIYEsPZiI6TGRXmulxeMjgiiOwY9FUHwFmxEExaT7dpXSRlmYu6fqpnJIliqYeaRDer+A/Q5HWTSlWSRo1jpIpjMjk41BrO0CEeuy+kZtCGBTMvaQeBjGU2yuPqOGnLs8LWcHrQBEJ3dKLl8yR8XjjCRLuJNVdw==
  • Authentication-results-original: arm.com; dkim=none (message not signed) header.d=none;arm.com; dmarc=none action=none header.from=arm.com;
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Bertrand Marquis <Bertrand.Marquis@xxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>
  • Delivery-date: Wed, 08 Sep 2021 07:34:45 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Nodisclaimer: true
  • Original-authentication-results: arm.com; dkim=none (message not signed) header.d=none;arm.com; dmarc=none action=none header.from=arm.com;
  • Thread-index: AQHXjps6uVXfLZKD7EKSvvhEptOzS6uINJ4AgAAs9PCAAHGGgIAAOFuQgBDWjrA=
  • Thread-topic: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse device tree memory node

Hi Julien, Stefano, Jan

> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@xxxxxxxxxxxxxxxxxxxx> On Behalf Of Wei
> Chen
> Sent: 2021年8月28日 21:58
> To: Julien Grall <julien@xxxxxxx>; Stefano Stabellini
> <sstabellini@xxxxxxxxxx>
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; Bertrand Marquis
> <Bertrand.Marquis@xxxxxxx>; Jan Beulich <jbeulich@xxxxxxxx>
> Subject: RE: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse
> device tree memory node
> 
> Hi Julien,
> 
> > -----Original Message-----
> > From: Julien Grall <julien@xxxxxxx>
> > Sent: 2021年8月28日 18:34
> > To: Wei Chen <Wei.Chen@xxxxxxx>; Stefano Stabellini
> > <sstabellini@xxxxxxxxxx>
> > Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; Bertrand Marquis
> > <Bertrand.Marquis@xxxxxxx>; Jan Beulich <jbeulich@xxxxxxxx>
> > Subject: Re: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse
> > device tree memory node
> >
> > Hi Wei,
> >
> > On 28/08/2021 04:56, Wei Chen wrote:
> > >> -----Original Message-----
> > >> From: Stefano Stabellini <sstabellini@xxxxxxxxxx>
> > >> Sent: 2021��8��28�� 9:06
> > >> To: Wei Chen <Wei.Chen@xxxxxxx>
> > >> Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; sstabellini@xxxxxxxxxx;
> > julien@xxxxxxx;
> > >> jbeulich@xxxxxxxx; Bertrand Marquis <Bertrand.Marquis@xxxxxxx>
> > >> Subject: Re: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to
> parse
> > >> device tree memory node
> > >>
> > >> On Wed, 11 Aug 2021, Wei Chen wrote:
> > >>> Memory blocks' NUMA ID information is stored in device tree's
> > >>> memory nodes as "numa-node-id". We need a new helper to parse
> > >>> and verify this ID from memory nodes.
> > >>>
> > >>> In order to support memory affinity in later use, the valid
> > >>> memory ranges and NUMA ID will be saved to tables.
> > >>>
> > >>> Signed-off-by: Wei Chen <wei.chen@xxxxxxx>
> > >>> ---
> > >>>   xen/arch/arm/numa_device_tree.c | 130
> > ++++++++++++++++++++++++++++++++
> > >>>   1 file changed, 130 insertions(+)
> > >>>
> > >>> diff --git a/xen/arch/arm/numa_device_tree.c
> > >> b/xen/arch/arm/numa_device_tree.c
> > >>> index 37cc56acf3..bbe081dcd1 100644
> > >>> --- a/xen/arch/arm/numa_device_tree.c
> > >>> +++ b/xen/arch/arm/numa_device_tree.c
> > >>> @@ -20,11 +20,13 @@
> > >>>   #include <xen/init.h>
> > >>>   #include <xen/nodemask.h>
> > >>>   #include <xen/numa.h>
> > >>> +#include <xen/libfdt/libfdt.h>
> > >>>   #include <xen/device_tree.h>
> > >>>   #include <asm/setup.h>
> > >>>
> > >>>   s8 device_tree_numa = 0;
> > >>>   static nodemask_t processor_nodes_parsed __initdata;
> > >>> +static nodemask_t memory_nodes_parsed __initdata;
> > >>>
> > >>>   static int srat_disabled(void)
> > >>>   {
> > >>> @@ -55,6 +57,79 @@ static int __init
> > >> dtb_numa_processor_affinity_init(nodeid_t node)
> > >>>       return 0;
> > >>>   }
> > >>>
> > >>> +/* Callback for parsing of the memory regions affinity */
> > >>> +static int __init dtb_numa_memory_affinity_init(nodeid_t node,
> > >>> +                                paddr_t start, paddr_t size)
> > >>> +{
> > >>> +    struct node *nd;
> > >>> +    paddr_t end;
> > >>> +    int i;
> > >>> +
> > >>> +    if ( srat_disabled() )
> > >>> +        return -EINVAL;
> > >>> +
> > >>> +    end = start + size;
> > >>> +    if ( num_node_memblks >= NR_NODE_MEMBLKS )
> > >>> +    {
> > >>> +        dprintk(XENLOG_WARNING,
> > >>> +                "Too many numa entry, try bigger NR_NODE_MEMBLKS
> \n");
> > >>> +        bad_srat();
> > >>> +        return -EINVAL;
> > >>> +    }
> > >>> +
> > >>> +    /* It is fine to add this area to the nodes data it will be
> used
> > >> later */
> > >>> +    i = conflicting_memblks(start, end);
> > >>> +    /* No conflicting memory block, we can save it for later usage
> */;
> > >>> +    if ( i < 0 )
> > >>> +        goto save_memblk;
> > >>> +
> > >>> +    if ( memblk_nodeid[i] == node ) {
> > >>> +        /*
> > >>> +         * Overlaps with other memblk in the same node, warning
> here.
> > >>> +         * This memblk will be merged with conflicted memblk later.
> > >>> +         */
> > >>> +        printk(XENLOG_WARNING
> > >>> +               "DT: NUMA NODE %u (%"PRIx64
> > >>> +               "-%"PRIx64") overlaps with itself (%"PRIx64"-
> > >> %"PRIx64")\n",
> > >>> +               node, start, end,
> > >>> +               node_memblk_range[i].start,
> node_memblk_range[i].end);
> > >>> +    } else {
> > >>> +        /*
> > >>> +         * Conflict with memblk in other node, this is an error.
> > >>> +         * The NUMA information is invalid, NUMA will be turn off.
> > >>> +         */
> > >>> +        printk(XENLOG_ERR
> > >>> +               "DT: NUMA NODE %u (%"PRIx64"-%"
> > >>> +               PRIx64") overlaps with NODE %u (%"PRIx64"-
> > %"PRIx64")\n",
> > >>> +               node, start, end, memblk_nodeid[i],
> > >>> +               node_memblk_range[i].start,
> node_memblk_range[i].end);
> > >>> +        bad_srat();
> > >>> +        return -EINVAL;
> > >>> +    }
> > >>> +
> > >>> +save_memblk:
> > >>> +    nd = &nodes[node];
> > >>> +    if ( !node_test_and_set(node, memory_nodes_parsed) ) {
> > >>> +        nd->start = start;
> > >>> +        nd->end = end;
> > >>> +    } else {
> > >>> +        if ( start < nd->start )
> > >>> +            nd->start = start;
> > >>> +        if ( nd->end < end )
> > >>> +            nd->end = end;
> > >>> +    }
> > >>> +
> > >>> +    printk(XENLOG_INFO "DT: NUMA node %u %"PRIx64"-%"PRIx64"\n",
> > >>> +           node, start, end);
> > >>> +
> > >>> +    node_memblk_range[num_node_memblks].start = start;
> > >>> +    node_memblk_range[num_node_memblks].end = end;
> > >>> +    memblk_nodeid[num_node_memblks] = node;
> > >>> +    num_node_memblks++;
> > >>
> > >>
> > >> Is it possible to have non-contigous ranges of memory for a single
> NUMA
> > >> node?
> > >>
> > >> Looking at the DT bindings and Linux implementation, it seems
> possible.
> > >> Here, it seems that node_memblk_range/memblk_nodeid could handle it,
> > >> but nodes couldn't.
> > >
> > > Yes, you're right. I copied this code for x86 ACPI NUMA. Does ACPI
> allow
> > > non-contiguous ranges of memory for a single NUMA node too?
> >
> > I couldn't find any restriction for ACPI. Although, I only briefly
> > looked at the spec.
> >
> > > If yes, I think
> > > this will affect x86 ACPI NUMA too. In next version, we plan to merge
> > > dtb_numa_memory_affinity_init and acpi_numa_memory_affinity_init into
> a
> > > neutral function. So we can fix them at the same time.
> > >
> > > If not, maybe we have to keep the diversity for dtb and ACPI here.
> >
> > I am not entirely sure what you mean. Are you saying if ACPI doesn't
> > allow non-contiguous ranges of memory, then we should keep the
> > implementation separated?
> >
> > If so, then I disagree with that. It is fine to have code that supports
> > more than what a firmware table supports. The main benefit is less code
> > and therefore less long term maintenance (with the current solution we
> > would need to check both the ACPI and DT implementation if there is a
> > bug in one).
> >
> 
> Yes, I agree.
> 

I am looking for some methods to address this comment. Current "nodes"
has not considered the situation of memory addresses of different NUMA
nodes can be interleaved.

This code exists in x86 NUMA implementation. I think it may be based on
one early version of Linux x86 NUMA implementation. In recent Linux
code, both ACPI/numa/srat.c[1] and x86 NUMA code[2] are not using
"nodes" to record NUMA memory address boundary. They don't depend
on "nodes" to do sanity check.

To fix it, we'd better to upgrade the x86 NUMA driver. It will make
a great affect for Xen-x86. And I think it might out of this series
scope. Can we create another thread to discuss about it?

Or could you give me suggestions that we can use some simple ways
to fix it?

Also, on Linux, NUMA implementations for x86 are different from Arm64
and RISC-V implementations.[3]

[1] https://github.com/torvalds/linux/blob/master/drivers/acpi/numa/srat.c
[2] https://github.com/torvalds/linux/blob/master/arch/x86/mm/numa.c
[3] https://github.com/torvalds/linux/blob/master/drivers/base/arch_numa.c

> > Cheers,
> >
> > --
> > Julien Grall

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.