
Re: [PATCH 2/2] xen: credit2: limit the max number of CPUs in a runqueue



On Thu, 2020-04-30 at 14:52 +0200, Jürgen Groß wrote:
> On 30.04.20 14:28, Dario Faggioli wrote:
> > That being said, I can try to make things a bit more fair, when
> > CPUs
> > come up and are added to the pool. Something around the line of
> > adding
> > them to the runqueue with the least number of CPUs in it (among the
> > suitable ones, of course).
> > 
> > With that, when the user removes 4 CPUs, we will have the 6 vs 10
> > situation. But we would make sure that, when she adds them back, we
> > will go back to 10 vs. 10, instead of, say, 6 vs 14 or something
> > like
> > that.
> > 
> > Was something like this that you had in mind? And in any case, what
> > do
> > you think about it?
> 
> Yes, this would be better already.
> 
So, a couple of thoughts. Doing something like what I tried to describe
above is not too bad, and I have it pretty much ready.
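
To make the idea concrete, here is a minimal standalone sketch in plain
C (not the actual credit2 code: the runqueue structure, the suitability
check and all the numbers are made up for illustration). When a CPU
comes up, we scan the suitable runqueues and put it in the one that
currently has the fewest CPUs:

#include <stdbool.h>
#include <stdio.h>

#define NR_RUNQUEUES 6

struct runqueue {
    int id;
    int node;     /* NUMA node the runqueue belongs to */
    int nr_cpus;  /* CPUs currently assigned to it */
};

/* Hypothetical suitability check: same node, below the per-runqueue cap. */
static bool rq_suitable(const struct runqueue *rq, int cpu_node, int max_cpus)
{
    return rq->node == cpu_node && rq->nr_cpus < max_cpus;
}

/* Return the suitable runqueue with the fewest CPUs, or NULL if none. */
static struct runqueue *pick_runqueue(struct runqueue *rqs, int nr,
                                      int cpu_node, int max_cpus)
{
    struct runqueue *best = NULL;

    for (int i = 0; i < nr; i++)
        if (rq_suitable(&rqs[i], cpu_node, max_cpus) &&
            (!best || rqs[i].nr_cpus < best->nr_cpus))
            best = &rqs[i];

    return best;
}

int main(void)
{
    struct runqueue rqs[NR_RUNQUEUES] = {
        { 0, 0, 16 }, { 1, 0, 16 }, { 2, 0, 10 },  /* node 0 */
        { 3, 1, 16 }, { 4, 1,  6 }, { 5, 1, 16 },  /* node 1 */
    };

    /* A CPU of node 1 comes back online: it should land in runqueue 4. */
    struct runqueue *rq = pick_runqueue(rqs, NR_RUNQUEUES, 1, 16);
    if (rq)
        printf("CPU goes to runqueue %d (currently %d CPUs)\n",
               rq->id, rq->nr_cpus);

    return 0;
}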

With that, on an Intel system with 96 CPUs on two sockets, and
max_cpus_per_runqueue set to 16, I got, after boot, instead of just 2
giant runqueues with 48 CPUs in each:

 - 96 CPUs online, split into 6 runqueues (3 for each socket), with
   16 CPUs in each of them

I can also "tweak" it in such a way that, if one boots with "smt=no"
for instance, we get to the point where we have:

 - 48 CPUs online, split into 6 runqueues, with 8 CPUs in each
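
For reference, the rough arithmetic behind these numbers (again just a
sketch, not the patch itself): each socket gets ceil(48 / 16) = 3
runqueues, and my assumption here is that the runqueue count is derived
from the full topology, so that with "smt=no" we keep 3 runqueues per
socket and just end up with fewer CPUs, i.e. 8, in each:

#include <stdio.h>

static int div_round_up(int a, int b) { return (a + b - 1) / b; }

int main(void)
{
    const int max_cpus_per_runqueue = 16;
    const int sockets = 2;
    const int cpus_per_socket_topo = 48;        /* full topology */
    const int online_per_socket[] = { 48, 24 }; /* smt=on, smt=no */

    for (int i = 0; i < 2; i++) {
        int rqs_per_socket = div_round_up(cpus_per_socket_topo,
                                          max_cpus_per_runqueue);
        int cpus_per_rq = online_per_socket[i] / rqs_per_socket;

        printf("%d online CPUs -> %d runqueues, %d CPUs each\n",
               online_per_socket[i] * sockets,
               rqs_per_socket * sockets, cpus_per_rq);
    }

    return 0;
}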

Now, I think this is good, and actually better than the current
situation where, on such a system, we only have two very big runqueues
(and let me repeat that introducing a per-LLC runqueue arrangement, on
which I'm also working, won't help in this case, as NUMA node == LLC).

So, the problem is that if one starts to fiddle with cpupools and with
CPU on- and offlining, things can get pretty unbalanced. E.g., I can
end up with 2 runqueues on a socket, one with 16 CPUs and the other
with just a couple of them.

Now, this is all possible already (I mean, without this patch),
although at a different level. In fact, I can very well remove all but
one of the CPUs of node 1 from Pool-0, and again end up with one
runqueue with a lot of CPUs and another with just one.

It looks like we need a way to rebalance the runqueues, which should be
doable... But despite having spent a couple of days trying to come up
with something decent that I could include in v2 of this series, I
couldn't get it to work sensibly.
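
Just to make the problem concrete, the most naive rebalancing I can
think of is a greedy one: keep moving a CPU from the fullest runqueue
to the emptiest one until they differ by at most one. Here is a toy
sketch of that (plain C, nothing Xen-specific, and it completely
ignores the actually hard part, i.e. moving CPUs between runqueues
safely while the scheduler is running):

#include <stdio.h>

static void rebalance(int *nr_cpus, int nr_rqs)
{
    for (;;) {
        int min = 0, max = 0;

        for (int i = 1; i < nr_rqs; i++) {
            if (nr_cpus[i] < nr_cpus[min]) min = i;
            if (nr_cpus[i] > nr_cpus[max]) max = i;
        }

        if (nr_cpus[max] - nr_cpus[min] <= 1)
            break;

        /* "Move" one CPU from the fullest to the emptiest runqueue. */
        nr_cpus[max]--;
        nr_cpus[min]++;
    }
}

int main(void)
{
    int nr_cpus[] = { 16, 2 };  /* the unbalanced socket from above */

    rebalance(nr_cpus, 2);
    printf("%d %d\n", nr_cpus[0], nr_cpus[1]);  /* prints "9 9" */

    return 0;
}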

So, this looks to me like an improvement that would itself need further
improvement, but I'm not sure we have the time for that right now
(Cc-ing Paul). Should we still go for what's ready? I think yes, but
I'd be interested in opinions...

Also, if anyone has any clever ideas on how to implement a mechanism
that rebalances the CPUs among the runqueues, I'm all ears and am up
for trying to implement it. :-)

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
