Re: [PATCH] x86/NUMA: improve memnode_shift calculation for multi node system
On 30.09.2022 10:25, Roger Pau Monné wrote:
> On Tue, Sep 27, 2022 at 06:20:35PM +0200, Jan Beulich wrote:
>> SRAT may describe individual nodes using multiple ranges. When they're
>> adjacent (with or without a gap in between), only the start of the first
>> such range actually needs accounting for. Furthermore the very first
>> range doesn't need considering of its start address at all, as it's fine
>> to associate all lower addresses (with no memory) with that same node.
>> For this to work, the array of ranges needs to be sorted by address -
>> adjust logic accordingly in acpi_numa_memory_affinity_init().
>
> Speaking for myself (due to my lack of knowledge of the NUMA stuff), I
> would benefit from a bit of context on why and how memnode_shift is
> used.
It's used solely to shift PDXes right before indexing memnodemap[].
Hence a larger shift allows for a smaller array (less memory and,
perhaps more importantly, a smaller cache footprint). Personally I
don't think this needs mentioning in the patch description, but I
know others think differently (George, for example, seems to believe
that the present situation should always be described). In the case
here a simple grep for memnode_shift would tell you, and even if I
described it here I don't think it would really help review: you'd
still want to verify that what I claim actually matches reality.
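
Roughly, the lookup amounts to this (a sketch from memory, not
necessarily the literal code in the tree):

    /* Sketch: map a physical address to its node via memnodemap[]. */
    static inline nodeid_t phys_to_nid(paddr_t addr)
    {
        /*
         * One memnodemap[] entry covers 2^memnode_shift PDXes, so a
         * larger shift means a smaller (and more cache friendly) array.
         */
        return memnodemap[paddr_to_pdx(addr) >> memnode_shift];
    }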
>> @@ -413,14 +414,37 @@ acpi_numa_memory_affinity_init(const str
>> node, pxm, start, end - 1,
>> ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE ? " (hotplug)" : "");
>>
>> - node_memblk_range[num_node_memblks].start = start;
>> - node_memblk_range[num_node_memblks].end = end;
>> - memblk_nodeid[num_node_memblks] = node;
>> + /* Keep node_memblk_range[] sorted by address. */
>> + for (i = 0; i < num_node_memblks; ++i)
>> + if (node_memblk_range[i].start > start ||
>> + (node_memblk_range[i].start == start &&
>
> Maybe I'm confused, but won't .start == start mean we have
> overlapping ranges?
Yes, except when a range is empty: an empty range (start == end) can
share its start address with the following range without actually
overlapping it.
>> + node_memblk_range[i].end > end))
>> + break;
>> +
>> + memmove(&node_memblk_range[i + 1], &node_memblk_range[i],
>> + (num_node_memblks - i) * sizeof(*node_memblk_range));
>> + node_memblk_range[i].start = start;
>> + node_memblk_range[i].end = end;
>> +
>> + memmove(&memblk_nodeid[i + 1], &memblk_nodeid[i],
>> + (num_node_memblks - i) * sizeof(*memblk_nodeid));
>> + memblk_nodeid[i] = node;
>> +
>> if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
>> - __set_bit(num_node_memblks, memblk_hotplug);
>> + next = true;
>> if (end > mem_hotplug)
>> mem_hotplug = end;
>> }
>> + for (; i <= num_node_memblks; ++i) {
>> + bool prev = next;
>> +
>> + next = test_bit(i, memblk_hotplug);
>> + if (prev)
>> + __set_bit(i, memblk_hotplug);
>> + else
>> + __clear_bit(i, memblk_hotplug);
>
> Nit: I think you could avoid doing the clear for the last bit, ie:
> else if (i != num_node_memblks) __clear_bit(...);
>
> But I'm not sure it's worth adding the logic, just makes it more
> complicated to follow.
Indeed. I did consider both this and using a single __change_bit()
wrapped in a suitable if(). Both looked to me like they would hamper
proving the code is doing what it ought to do.
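
For completeness, the __change_bit() variant would have looked
something like this (sketch only):

    /* Sketch of the rejected alternative: flip only when values differ. */
    for ( ; i <= num_node_memblks; ++i )
    {
        bool prev = next;

        next = test_bit(i, memblk_hotplug);
        if ( prev != next )
            __change_bit(i, memblk_hotplug);
    }

It's equivalent, but the extra conditional makes the invariant less
obvious at a glance.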
Jan