[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86: make "dom0_nodes=" work with credit2


  • To: Jan Beulich <JBeulich@xxxxxxxx>
  • From: Dario Faggioli <dfaggioli@xxxxxxxx>
  • Date: Tue, 12 Apr 2022 15:48:11 +0000
  • Accept-language: en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=zK5aaapFFOun7/nhGJMeTfIBh3GoMdwRavkUl00sQFg=; b=nyAw6bYEIxaq7pVRtHTvy6kygBO4KFjTka6vY+tATNiqzHbKRXbzWSpuaynMZHfo6GbvGz0HyCfzsg+gY2QZfDmLO/hlu0Pd7dY2ynKa92DUJeifptU4m0uAk66F8w6hmDIaEc6bB8f+bIMZjdvl+ghqZhrc260cAgEVxqBcozmaAT65jcuqNsYm0cGzqT0WIhXzvAQKt60ekN2ITtvy5tganxSLleZmExcGuh0ttOwlahub2cizlWugAowMQgxkHZOrMdGxyhiknIW07Ytsy0GLwfsf8YS1fUxY4AWLCpR2Ea/NbdB2grPqOAnn8e5R0Z4YK5VTKNPsBKvWc14MNA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=ebkzoKF7/dfpruK4DK1jJq84vj0VPSRFXeEkwKX759b2QbiK6mViXwY7jowkpkTFV+q0Me/xdj5O4EeuwgZVBQQDpNcmDhbnxaM8rOkkKX/G5leaBLmGfnwrdF2qFCv0URoBiR/H6X4RTAuiL4DDNMaNR6X3O+RzQG2uSKN/Z8ubx1dQmCIskeWAhfoC12/YqsBGu6AM57KVrpHsr2WSYfqhOp3JpqFGMTIPPElYcCnbVZS1e6aVj//DwDHlHgqTnW30aYmT4XRZ12lNlv2MNKIVVfe89HqHJKmZYIAhWwT9cG4K9TezyWbJ3QDeEM2u+NtIloUc31huxXsGVW7HMg==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: "ohering@xxxxxxx" <ohering@xxxxxxx>, "george.dunlap@xxxxxxxxxx" <george.dunlap@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Tue, 12 Apr 2022 15:48:31 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHYSoM0M4Zi9SzqTkOjTOtQV+PP7azl4C2AgAAVUwCABn7lAA==
  • Thread-topic: [PATCH] x86: make "dom0_nodes=" work with credit2

On Fri, 2022-04-08 at 14:36 +0200, Jan Beulich wrote:
> On 08.04.2022 13:20, Dario Faggioli wrote:
> > On Thu, 2022-04-07 at 15:27 +0200, Jan Beulich wrote:
> > > Credit2 moving the vCPU-s off of their initially assigned ones
> > > right
> > > away of course renders sched_select_initial_cpu()'s use of
> > > cpu_cycle()
> > > pretty useless.
> > > 
> > Mmm... you mean that Credit2 is moving the vCPUs off they're
> > assigned
> > ones _right_now_, or that it will, with this patch?
> 
> Right now. On a 4-node (6 cores each) system with "dom0_nodes=0" I've
> observed the 6 vCPU-s to reliably be put on CPUs 13, 14, etc. The
> patch is actually undoing that questionable movement.
> 
So, yes, as said, I was wrong here, because I was forgetting that we
update dom0's node-affinity according to what we find in dom0_nodes,
while we parse dom0_nodes itself (and it was stupid of me to forget
that, because of course we do it! :-/).

And in fact, since node-affinity is already set to something different
than "all", sched_select_initial_cpu() picks up a CPU from it.

And it is also the case that we could, basically, use that later, in
sched_init_vcpu(), which is kind of what I'm proposing we do.

> Since I have a large amount of other patches in place (none of which
> I would assume to have such an effect) - Olaf has meanwhile confirmed
> that the change helps for him as well.
> 
> > If you mean the former, I'm not sure it is. In fact, when
> > sched_select_initial_cpu() is called for dom0, dom0's node affinity
> > is
> > just all nodes, isn't it? So, the we can pick any CPU in the
> > cpupool,
> > and we use cycle to try to spread the vCPUs kind of evenly.
> 
> The CPU mask used in the function is 0x3f for the example given
> above.
> I did not check which of the constituent parts of the calculation
> have
> this effect. But the result is that all CPUs will be put on CPU 0
> first, as cpumask_cycle(..., 13) for a mask of 0x3f produces 0.
> 
Right. For instance, on my 16 CPUs, 2 nodes test box, where I tried
using dom0_nodes=0 (mask {0-7}), I saw the following:
- for d0v0, sched_select_initial_cpu() selects CPU _0_
- later in sched_init_vcpu() affinity is set to all
- later^2 in sched_init_vcpu(), Credit2's implementation of 
 sched_insert_unit() does a "CPU pick", assuming it can use any CPU, 
 as the affinity is "all". It ends up picking CPU _9_

We move on in the boot process, and we get to create the other dom0
vcpus, e.g., d0v1:
- d0v0->processor is _9_, so the cpumask_cycle() in 
 sched_select_initial_cpu(), done from 9, with a mask of {0-7}, 
 brings us to _0_
- that does not matter much, as the CPU pick inside of the specific 
 scheduler has its own logic, which also involves a cpumask_cycle().
 And since we picked _9_ before, and the affinity has again been set 
 to "all", we pick _10_ and, for what we know, that is fine.

This goes on and happens for all vCPUs, with the result that (in the
worst case) all of them are placed on CPUs where they can't run.

Once can think that this comes from the fact that we do CPU selection
twice (once in sched_select_initial_cpu() and another one, inside the
specific scheduler). However, this is necessary, for two reasons:
- a scheduler may not have its own "CPU pick logic", as that is not 
  required
- even for the schedulers that do, we need to have something in
  v->processor, when it runs. And sched_select_initial_cpu() is   
  something that is (should be?) simple enough, but at the same time
  it puts something sensible there.

> > That's because, by default, affinity is just "all cpus", when we
> > create
> > the vCPUs, and we change that later, if they have one already (due
> > to
> > it being present in the config file, or in the dom0_nodes
> > parameter).
> 
> But that's what I'm talking about a little further down, where you
> reply that you don't think using the more narrow set would hide the
> issue.
> 
Sure, as said, I was missing a piece when replied.

> > This should make sched_select_initial_cpu() pick one of the
> > "correct"
> > CPUs in the first place. But I don't know if it's worth, neither if
> > we'll still need this patch anyway (I have to check more
> > thoroughly).
> 
> As per above - sched_select_initial_cpu() behaves as I would expect
> it. It's credit2 which subsequently overrides that decision.
> 
Right, and it overrides it, because we set the affinity to "all", so
it's in its own right to do so. And the point here is that it is indeed
not right to do that.

So, as you said yourself, we should really change the fact that we're
setting "all" as an affinity for dom0 vCPUs, there in
sched_init_vcpu(), while we know it's not the case.

And while doing that, I think we should consolidate touching the
affinity only there, avoiding altering it twice. After all, we already
know how it should look like, so let's go for it.

I'll send a patch to that effect, to show what I mean with this. And...

> > >  But I guess that's still useful for other schedulers.
>
...Yes, I'll rebase your patch on top of mine. In fact, although
tecnically no longer necessary, for _this_specific_ issue (I tested
without it, and things work), I think it still makes sense.

Thanks and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

Attachment: signature.asc
Description: This is a digitally signed message part


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.