Hi Aron,
>> 1.) Guest NUMA support: spread a guest's resources (CPUs and memory)
>> over several nodes and propagate the appropriate topology to the
>> guest. ...
>It seems like you are proposing two things at once here. Let's call
>these 1a and 1b
>1a. Expose NUMA topology to the guests. This isn't the topology of
> dom0, just the topology of the domU, i.e. it is constructed by
> dom0 when starting the domain.
>1b. Spread the guest over nodes. I can't tell if you mean to do this
> automatically or by request when starting the guest. This seems
> to be separate from 1a.
From an implementation point of view this is right; if you look at the
patches I sent in mid-August, those parts are done in separate patches:
http://lists.xensource.com/archives/html/xen-devel/2007-08/msg00275.html
Patch 3/4 covers 1b), patch 4/4 covers 1a).
But neither part makes much sense on its own. If you spread the guest
over several nodes and don't tell the guest OS about it, you get about
the same behaviour Xen had before the basic NUMA patches from Ryan
Harper were integrated in October 2006.
>> ***Disadvantages***:
>> - The guest has to support NUMA...
>> - The guest's workload has to fit NUMA...
>IMHO the list of disadvantages is only what we have in xen today.
>Presently no guests can see the NUMA topology, so it's the same as if
>they don't have support in the guest. Adding NUMA topology
>propagation does not create these disadvantages, it simply exposes the
>weakness of the lesser operating systems.
These were mostly meant as disadvantages compared to solution 2).
>> 2.) Dynamic load balancing and page migration:
>Again, this seems like a two-part proposal.
>2a. Add to xen the ability to run a guest within a node, so that cpus
> and ram are allocated from within the node instead of randomly
> across the system.
This is already in Xen, at least if you pin the guest manually to a
certain node _before_ creating the guest (for instance with cpus=0,1
if the first node consists of the first two CPUs). Xen will then try
to allocate the guest's memory from the node the first VCPU is
currently scheduled on (at least for HVM guests).
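For illustration, a minimal sketch of such a config file (the name,
memory size and disk path are just placeholders; it assumes a machine
where node 0 consists of physical CPUs 0 and 1):

    # xm guest config sketch: pin the guest to node 0 before creation
    kernel  = "/usr/lib/xen/boot/hvmloader"
    builder = "hvm"
    name    = "numa-test"        # placeholder name
    memory  = 1024               # MB, ideally allocated from node 0 as well
    vcpus   = 2
    cpus    = "0,1"              # restrict VCPUs to node 0's CPUs, so the
                                 # memory allocation follows the first VCPU
    disk    = ["file:/path/to/disk.img,hda,w"]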
>2b. NUMA balancing. While this seems like a worthwhile goal, IMHO
> it's separate from the first part of the proposal.
This is most of the work that has to be done.
> If the mechanics of migrating between NUMA nodes is implemented in the
> hypervisor, then policy and control can be implemented in dom0
> userland, so none of the automatic part of this needs to be in the
> hypervisor.
This may be true; at least there should be some means to manually
migrate domains between nodes, which must be triggered from Dom0.
Automatic behaviour could then be driven from there as well.
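Just to illustrate that split: the policy side could be a small Dom0
userland loop that watches per-node load and asks the hypervisor to
move a domain when the imbalance gets too large. A rough sketch in
Python (the "xm migrate-node" subcommand and the load numbers are
purely hypothetical, they only stand in for whatever interface the
hypervisor would actually expose):

    # Hypothetical Dom0 balancing loop; "xm migrate-node" does not exist
    # today, it is a stand-in for the future node-migration mechanism.
    import subprocess, time

    def node_load():
        # placeholder: gather per-node CPU/memory load, e.g. from "xm info"
        return {0: 0.9, 1: 0.2}

    while True:
        load = node_load()
        busiest = max(load, key=load.get)
        idlest = min(load, key=load.get)
        if load[busiest] - load[idlest] > 0.5:
            # pick some domain on the busy node and ask Xen to move it
            subprocess.call(["xm", "migrate-node", "some-domain", str(idlest)])
        time.sleep(30)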
Andre.
--
Andre Przywara
AMD - Operating System Research Center, Dresden, Germany
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel