Re: [Xen-devel] [PATCH 1 of 3 v5/leftover] libxl: enable automatic placement of guests on NUMA nodes [and 1 more messages]

On 07/18/2012 01:00 PM, Ian Jackson wrote:
> Ian Campbell writes ("Re: [PATCH 1 of 3 v5/leftover] libxl: enable automatic
> placement of guests on NUMA nodes [and 1 more messages]"):
>> On Wed, 2012-07-18 at 10:43 +0100, Dario Faggioli wrote:
>>> What could be done is restricting automatic placement to guests that
>>> fit on 4 or 8 nodes for 4.2.
>>
>> 8 would mean on a 32 node system considering 10,518,300 combinations?
>>
>> 4 would mean on a 32 node system considering 35,960 combinations? On a
>> 64 node system it would mean 635,376 combinations.
>>
>> If that's the case then let's go with 4 as the limit for 4.2.0.
>
> What is the maximum number of NUMA nodes we might expect to see on a
> single system in the next five years?  I would argue that 32 is too
> optimistic.  128 or 256 seem like more reasonable upper bounds.

Wow, what are you talking about?
To calm this down from the AMD side:
The current Opteron NUMA architecture is limited to exactly 8 nodes. This has been the case ever since the release of Opteron, and changing it is not trivial and will not happen in the near future. In general I don't think we will see much bigger NUMA systems, but rather more cluster-like architectures.

Maybe Juergen can comment on the Fujitsu side.

So my suggestion: get this NUMA placement in for 4.2.0 if at all possible. We have much bigger problems without this algorithm than with the theoretical gigantic machine you are talking about. Since it is an internal algorithm, we can easily fix this later in 4.2.1 and 4.3, and nobody will ever notice the problem.

Just my 2 cents.


> 4 of 256 is 174,792,640 combinations - i.e. too many.  So there needs to be a
> limit on the host size too.  But if you pick an upper host size limit
> of 64 then in a hypothetical 128-node system you'd fail to do the
> trivial search for a 1-node guest.
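
The counts quoted in this thread are binomial coefficients: the number of ways to pick a set of k nodes out of the n nodes on the host. The following is a minimal Python sketch that reproduces them. It is not libxl code; the function name `placement_candidates` is made up for illustration, and it assumes the search considers node sets of exactly the given size.

```python
from math import comb


def placement_candidates(host_nodes: int, guest_nodes: int) -> int:
    """Count the distinct sets of `guest_nodes` nodes on a `host_nodes`-node host."""
    return comb(host_nodes, guest_nodes)


# (host size, guest size) pairs mentioned in the thread
cases = [(32, 8), (32, 4), (64, 4), (256, 4), (128, 1)]

for host_nodes, guest_nodes in cases:
    count = placement_candidates(host_nodes, guest_nodes)
    print(f"{guest_nodes} of {host_nodes}: {count:,} combinations")
```

The last case is the one the host size limit would get wrong: a 1-node guest on a 128-node host has only 128 candidates, so a cap on host size alone rejects a search that is trivially cheap.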

> An upper bound on log(number_of_combinations) is
> log(number_of_nodes_on_the_host) * number_of_nodes_for_the_guest.
> This fact could be used to determine more accurately whether the
> algorithm is going to terminate in a reasonable time.
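
The bound follows from C(n, k) <= n^k, so log C(n, k) <= k * log n. A small Python sketch of how such a check might look is given below. The helper names and the budget of 20 bits (about a million candidates) are assumptions chosen for illustration; nothing in the thread fixes a threshold.

```python
from math import comb, log2


def log2_combinations_bound(host_nodes: int, guest_nodes: int) -> float:
    """Upper bound on log2(C(host_nodes, guest_nodes)), using C(n, k) <= n**k."""
    return guest_nodes * log2(host_nodes)


def within_budget(host_nodes: int, guest_nodes: int, max_log2: float = 20.0) -> bool:
    """True if the bound says the search space is small enough (assumed budget)."""
    return log2_combinations_bound(host_nodes, guest_nodes) <= max_log2


for host_nodes, guest_nodes in [(128, 1), (32, 4), (64, 4), (256, 4)]:
    bound = log2_combinations_bound(host_nodes, guest_nodes)
    exact = log2(comb(host_nodes, guest_nodes))
    verdict = "search" if within_budget(host_nodes, guest_nodes) else "too many"
    print(f"{guest_nodes} of {host_nodes}: bound {bound:.1f} bits, "
          f"exact {exact:.1f} bits -> {verdict}")
```

The bound is cheap to compute but conservative: for 4 of 64 it gives 24 bits although the exact count (635,376) is below 2^20, so a check built on it may reject some searches that would in fact be affordable.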

> Also when this algorithm would be used, but would take too long, we
> should print a warning which tells the user they should use cpupools
> to assign nodes directly.
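
One way the warning path could be structured is sketched below in Python. This is an illustration of the suggestion, not the libxl implementation: the function name, the logger name, the message text, and the limit of 1,000,000 candidates are all assumptions.

```python
import logging
from math import comb

log = logging.getLogger("placement_sketch")  # hypothetical logger name

MAX_CANDIDATES = 1_000_000  # assumed budget; the thread does not name one


def should_run_auto_placement(host_nodes: int, guest_nodes: int,
                              max_candidates: int = MAX_CANDIDATES) -> bool:
    """Return True if the search is affordable; otherwise warn and return False."""
    candidates = comb(host_nodes, guest_nodes)
    if candidates <= max_candidates:
        return True
    log.warning(
        "automatic NUMA placement skipped: %d candidate node sets exceed the "
        "limit of %d; consider using cpupools to assign nodes directly",
        candidates, max_candidates,
    )
    return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    should_run_auto_placement(128, 1)   # affordable, no warning
    should_run_auto_placement(256, 4)   # too many candidates, logs the warning
```

Deciding on the candidate count rather than on host size alone keeps the 1-node guest on a 128-node host working while still refusing the 4-of-256 search.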


Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
