[Xen-devel] Re: [PATCH] numa: fix problems with memory-less nodes

To: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Subject: [Xen-devel] Re: [PATCH] numa: fix problems with memory-less nodes
From: Andre Przywara <andre.przywara@xxxxxxx>
Date: Wed, 13 Jan 2010 10:42:26 +0100
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Wed, 13 Jan 2010 01:43:23 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <C773342E.632A%keir.fraser@xxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <C773342E.632A%keir.fraser@xxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Thunderbird 2.0.0.21 (X11/20090329)
Keir Fraser wrote:
> On 12/01/2010 16:30, "Andre Przywara" <andre.przywara@xxxxxxx> wrote:
>
>> If we decided not to report memory-less nodes in physinfo, we should also
>> skip them in the node_to_{cpu,memory,dma32_mem} Python lists. Currently
>> Xen will not start guests on machines where the memory-less nodes are not
>> the last ones. On an 8-node machine with empty nodes 4 and 5, "xm info"
>> was reporting wrongly, and the node assignment algorithm crashed with a
>> division-by-zero error.
>> The attached patch fixes this by skipping empty nodes in the enumeration
>> of resources.

> Where to begin? Firstly, I thought that the ordering of nodes in the
> node_to_* lists actually mattered -- the lists are indexed by nodeid (a
> handle which can be passed to other Xen interfaces) are they not? If you
> don't include empty entries, then the index position of entries is no longer
> meaningful.
OK, that seems to be an issue.
To be honest, I am not a fan of omitting nodes from physinfo, but that is what the current code (RC1!) does, and it definitely breaks Xen on my box, so I just made this small patch to make it work again. Actually, I would opt to revert the patch that crops the number of nodes reported by physinfo (20762:a1d0a575b4ba ?). Yes, that would result in nodes being reported with zero memory, but in my tests this did not cause problems, since a node's memory can (and will) be exhausted even during normal operation.
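
To make the indexing concern concrete, here is a minimal sketch in Python (made-up topology and names, not the actual xc.c or xend code) of why compacting the node_to_* lists breaks node-ID-based lookups:

# Hypothetical data: node_to_cpu as physinfo could report it, with the
# list index doubling as the node ID.
node_to_cpu_full = [
    list(range(0, 6)),    # node0
    list(range(6, 12)),   # node1
    [],                   # node2: empty node, nothing reported
    list(range(12, 18)),  # node3
]

# Index position == node ID, so a lookup for node 3 works:
assert node_to_cpu_full[3] == list(range(12, 18))

# If empty nodes are simply dropped, the list is compacted and what used
# to be node 3 now answers to index 2 -- any caller that indexes by node
# ID silently reads the wrong node:
node_to_cpu_compacted = [cpus for cpus in node_to_cpu_full if cpus]
assert node_to_cpu_compacted[2] == node_to_cpu_full[3]

This is exactly what the patched output below shows: the CPUs of physical nodes 6 and 7 turn up as "node4" and "node5".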
To illustrate the problem:
My box has 8 nodes; I removed the memory from nodes 4 & 5.
With the unpatched version, xm info says:
total_memory           : 73712
free_memory            : 70865
node_to_cpu            : node0:0-5,24-35
                         node1:6-11
                         node2:12-17
                         node3:18-23
                         node4:no cpus
                         node5:no cpus
node_to_memory         : node0:14267
                         node1:8167
                         node2:16335
                         node3:8167
                         node4:0
                         node5:0
So this listing completely omits the last two nodes (CPUs 36-47 and the 24 GB connected to them). The Xen-internal listing triggered by the debug key is correct, though:
(XEN) idx0 -> NODE0 start->0 size->4423680
(XEN) phys_to_nid(0000000000001000) -> 0 should be 0
(XEN) idx1 -> NODE1 start->4423680 size->2097152
(XEN) phys_to_nid(0000000438001000) -> 1 should be 1
(XEN) idx2 -> NODE2 start->6520832 size->4194304
(XEN) phys_to_nid(0000000638001000) -> 2 should be 2
(XEN) idx3 -> NODE3 start->10715136 size->2097152
(XEN) phys_to_nid(0000000a38001000) -> 3 should be 3
(XEN) idx6 -> NODE6 start->12812288 size->4194304
(XEN) phys_to_nid(0000000c38001000) -> 6 should be 6
(XEN) idx7 -> NODE7 start->17006592 size->2097152
(XEN) phys_to_nid(0000001038001000) -> 7 should be 7
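
(As a quick cross-check, and assuming the size values in this dump are counted in 4 KiB pages, NODE6 and NODE7 together account exactly for the 24 GB mentioned above:)

# Sizes of NODE6 and NODE7 from the debug-key dump above, assumed to be
# counted in 4 KiB pages.
node6_pages = 4194304
node7_pages = 2097152
missing_mib = (node6_pages + node7_pages) * 4 // 1024
print(missing_mib)  # 24576 MiB, i.e. the missing 24 GB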
With the patched xc.so, xm info reports:
node_to_cpu            : node0:0-5,24-35
                         node1:6-11
                         node2:12-17
                         node3:18-23
                         node4:36-41
                         node5:42-47
node_to_memory         : node0:14267
                         node1:8167
                         node2:16335
                         node3:8167
                         node4:16335
                         node5:7590

Although memory-less nodes are not very common, they can occur with our new dual-node processors, where one could (even accidentally) forget to populate certain memory slots, as the processor in fact has a dual-node, dual-channel memory interface.

> Secondly, you avoid appending to the node_to_cpu list if the node is
> cpu-less. But you avoid appending to the node_to_{memory,dma32} lists only
> if the node is *both* cpu-less and memory-less. That's not even consistent.
OK, that's a point. I see that the value of node_exists can change.
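
For illustration, a hedged sketch of what a consistent check could look like; the real code is in the xc bindings, and the names here (build_node_lists, node_exists as a per-node predicate) are assumptions, not taken from the patch:

# Sketch only, not the actual binding code: decide once per node whether
# it is reported, and apply that single decision to every per-node list.
def build_node_lists(nr_nodes, cpus_of, mem_of, dma32_of):
    node_to_cpu, node_to_memory, node_to_dma32_mem = [], [], []
    for node in range(nr_nodes):
        node_exists = bool(cpus_of(node)) or mem_of(node) > 0
        if not node_exists:
            continue  # skip the node in *all* lists, or in none of them
        node_to_cpu.append(cpus_of(node))
        node_to_memory.append(mem_of(node))
        node_to_dma32_mem.append(dma32_of(node))
    return node_to_cpu, node_to_memory, node_to_dma32_mem

Of course, given the first point above, skipping nodes at all still breaks node-ID indexing; the only fully consistent option is to emit an entry for every node, empty or not.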
> Please just fix the crap Python code.
Which part exactly do you mean? The part triggering the division by zero?

I will see if I can fix this properly.
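
If it is the placement heuristic dividing by a node's CPU count, the fix presumably needs a guard along these lines (a hypothetical sketch only; the function and variable names are made up and this is not the actual xend code):

# Sketch of the kind of guard needed: never use a node's CPU count (or
# its free memory) as a denominator without excluding the empty case.
def pick_least_loaded_node(node_to_cpu, node_load):
    best_node, best_score = None, None
    for node, cpus in enumerate(node_to_cpu):
        if not cpus:
            continue  # CPU-less node: never a placement candidate
        score = node_load[node] / len(cpus)  # safe: len(cpus) > 0
        if best_score is None or score < best_score:
            best_node, best_score = node, score
    return best_node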

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448 3567 12
----to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Andrew Bowd; Thomas M. McCoy; Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel