
Re: [PATCH] x86: make "dom0_nodes=" work with credit2


  • To: Jan Beulich <JBeulich@xxxxxxxx>
  • From: Dario Faggioli <dfaggioli@xxxxxxxx>
  • Date: Fri, 29 Apr 2022 10:52:14 +0000
  • Accept-language: en-US
  • Cc: "roger.pau@xxxxxxxxxx" <roger.pau@xxxxxxxxxx>, "ohering@xxxxxxx" <ohering@xxxxxxx>, "george.dunlap@xxxxxxxxxx" <george.dunlap@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Fri, 29 Apr 2022 10:52:27 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: [PATCH] x86: make "dom0_nodes=" work with credit2

On Wed, 2022-04-13 at 12:00 +0200, Jan Beulich wrote:
> On 12.04.2022 18:11, Dario Faggioli wrote:
> > --- a/xen/common/sched/core.c
> > +++ b/xen/common/sched/core.c
> > @@ -572,11 +572,41 @@ int sched_init_vcpu(struct vcpu *v)
> >      }
> >  
> >      /*
> > -     * Initialize affinity settings. The idler, and potentially
> > -     * domain-0 VCPUs, are pinned onto their respective physical CPUs.
> > +     * Initialize affinity settings. By doing this before the unit is
> > +     * inserted in the scheduler runqueues (by the call to sched_insert_unit(),
> > +     * at the end of the function, we are sure that it will be put on an
> > +     * appropriate CPU.
> >       */
> > -    if ( is_idle_domain(d) || (is_hardware_domain(d) && opt_dom0_vcpus_pin) )
> > +    if ( pv_shim && v->vcpu_id == 0 )
> 
> I don't think you can handle the shim case first, as then you'd also
> have its CPU0 idle vCPU take this path. The difference _may_ only be
> cosmetic, but I think it would be odd for CPU0's idle vCPU to have a
> soft affinity of just CPU0, while all others use cpumask_all.
> 
Ok, yes, I didn't think of that. I'll reshuffle the order of the if-s.
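
To make it concrete, here's a rough sketch of the ordering I have in
mind (illustrative only, not the actual v2; I'm assuming the shim
branch keeps pinning vCPU 0 to CPU 0 for both hard and soft affinity,
as the existing shim special-case does):

    if ( is_idle_domain(d) )
    {
        /* Idlers are always pinned to their respective pCPUs. */
        sched_set_affinity(unit, cpumask_of(processor), &cpumask_all);
    }
    else if ( pv_shim && v->vcpu_id == 0 )
    {
        /* Shim: vCPU 0 stays on CPU 0, for both hard and soft affinity. */
        sched_set_affinity(unit, cpumask_of(0), cpumask_of(0));
    }
    else if ( is_hardware_domain(d) )
    {
        /* dom0_vcpus_pin / dom0_nodes handling, as in the hunk below. */
    }
    else
        sched_set_affinity(unit, &cpumask_all, &cpumask_all);

This way CPU0's idle vCPU can never take the shim path.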

> 
> > +    {
> > +        /*
> > +         * The idler, and potentially domain-0 VCPUs, are pinned onto their
> > +         * respective physical CPUs.
> > +         */
> >          sched_set_affinity(unit, cpumask_of(processor), &cpumask_all);
> > +    }
> > +    else if ( is_hardware_domain(d) )
> 
> ... here I wonder: Shouldn't this be limited to Dom0 (for the purposes
> here != hwdom)? Any special affinity for a late hwdom ought to be
> specified by the logic creating that domain imo, not by command line
> options concerning Dom0 only.
> 
I think this makes sense. I'll add a patch for changing it.

> I also have a more general question here: sched.h says "Bitmask of CPUs
> on which this VCPU may run" for hard affinity and "Bitmask of CPUs on
> which this VCPU prefers to run" for soft affinity. Additionally there's
> soft_aff_effective. Does it make sense in the first place for one to be
> a proper subset of the other in _both_ directions?
>
I'm not sure I'm 100% getting what you're asking. In particular, I'm
not sure what you mean by "for one to be a proper subset of the other
in both directions".

Anyway, soft and hard affinity are under the complete control of the
user (I guess we can say that they're policy), so we tend to accept
pretty much everything that comes from the user.

That is, the user can set a hard affinity of 1-6 and a soft affinity
of, e.g., (a) 2-3, (b) 0-2, or (c) 7-12.

Case (a), i.e., soft is a strict subset of hard, is the one that makes
the most sense, of course. With this configuration, the vCPU(s) can run
on CPUs 1, 2, 3, 4, 5 and 6, but the scheduler will prefer to run it
(them) on 2 and/or 3.

Case (b), i.e., not a strict subset but with some overlap, also means
that soft-affinity is going to be considered and have an effect. In
fact, the vCPU(s) will prefer to run on CPUs 1 and/or 2, but of course
will never run on CPU 0. The user can, at a later point in time, change
the hard affinity so that it includes CPU 0, and we'll be back to the
strict-subset case. That's why we want to keep CPU 0 in the mask, even
if it causes soft to not be a strict subset of hard.

In case (c), soft affinity is totally useless. However, again, the user
can later change hard to include some or all of CPUs 7-12, so we keep
it. We do, however, print a warning, and we also use the
soft_aff_effective flag to avoid going through the soft-affinity
balancing step in the scheduler code. This is also why we check whether
hard is a strict subset of soft: if it is, there's no need to do
anything about soft, as honoring hard will automatically take care of
it as well.
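
Just to illustrate the three cases in one place, here's a toy example
with plain bitmasks (bit N = CPU N); the soft_aff_effective() helper
below is only an illustration of the rule above, not the actual
computation done in the scheduler code:

    #include <stdint.h>
    #include <stdio.h>

    /*
     * Soft affinity only has an effect if it overlaps hard affinity
     * (otherwise it's useless, as in case (c)) and hard is not already
     * a subset of soft (otherwise honoring hard implies honoring soft).
     */
    static int soft_aff_effective(uint32_t hard, uint32_t soft)
    {
        return (hard & soft) != 0 && (hard & ~soft) != 0;
    }

    int main(void)
    {
        uint32_t hard = 0x007e;                                /* CPUs 1-6 */

        printf("(a) soft=2-3  -> %d\n", soft_aff_effective(hard, 0x000c)); /* 1 */
        printf("(b) soft=0-2  -> %d\n", soft_aff_effective(hard, 0x0007)); /* 1 */
        printf("(c) soft=7-12 -> %d\n", soft_aff_effective(hard, 0x1f80)); /* 0 */
        return 0;
    }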

> Is that mainly
> to have a way to record preferences even when all preferred CPUs are
> offline, to be able to go back to the preferences once CPUs come back
> online?
> 
That's another example/use case, yes. We want to record the user's
preference, whatever the status of the system (and of other aspects of
the configuration) is.

But I'm not really sure I've answered... Have I?

> Then a follow-on question is: Why do you use cpumask_all for soft
> affinity in the first of the two calls above? Is this to cover for the
> case where all CPUs in dom0_cpus would go offline?
> 
Mmm... what else should I be using? If dom0_nodes is in "strict" mode,
we want to control hard affinity only. So we set soft to the default,
which is "all". During operations, since hard is a subset of "all",
soft-affinity will just be ignored.

So I'm using "all" because soft-affinity is just "all", unless someone
sets it differently.
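
In other words, what I'm after is roughly this (just a sketch;
dom0_cpus and dom0_affinity_relaxed are only stand-ins for however the
patch ends up carrying the dom0_nodes= mask and its strict/relaxed
flag):

    if ( !dom0_affinity_relaxed )
        /* "strict": constrain hard affinity, leave soft at its default. */
        sched_set_affinity(unit, &dom0_cpus, &cpumask_all);
    else
        /* "relaxed": only express a preference, don't constrain hard. */
        sched_set_affinity(unit, &cpumask_all, &dom0_cpus);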

But I am again not sure that I fully understood and properly addressed
your question. :-(


> > +    }
> >      else
> >          sched_set_affinity(unit, &cpumask_all, &cpumask_all);
> 
> Hmm, you leave this alone. Wouldn't it be better to further generalize
> things, in case domain affinity was set already? I was referring to the
> mask calculated by sched_select_initial_cpu() also in this regard. And
> when I did suggest to re-use the result, I did mean this literally.
> 
Technically, I think we can do that. However, it's probably cumbersome
to do without either adding at least one cpumask on the stack, or
reshuffling the locking between sched_select_initial_cpu() and
sched_init_vcpu() in a way that I (personally) don't find particularly
pretty.

Also, I don't think we gain much from doing that, as we probably still
need to have some special casing of dom0, for handling dom0_vcpus_pin.

And again, soft and hard affinity should be set to what the user wants
and asks for. If, for instance, he/she passes dom0_nodes="1,strict",
soft-affinity should just be all. If, e.g., we set both hard and soft
affinity to the CPUs of node 1, and hard affinity is later manually
changed to "all", soft affinity will remain set to node 1, even though
that was never asked for, and the user will need to change it
explicitly as well. (Of course, it's not particularly clever to boot
with dom0_nodes="1,strict" and then change dom0's vCPUs' hard affinity
to node 0... but the user is free to do that.)

> > --- a/xen/common/sched/credit2.c
> > +++ b/xen/common/sched/credit2.c
> > @@ -749,10 +749,12 @@ static int get_fallback_cpu(struct csched2_unit *svc)
> >  
> >          /*
> >           * This is cases 2 or 4 (depending on bs): v->processor isn't there
> > -         * any longer, check if we at least can stay in our current runq.
> > +         * any longer, check if we at least can stay in our current runq,
> > +        * if we have any (e.g., we don't yet, if we get here when a unit
> > +        * is inserted for the very first time).
> >           */
> > -        if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
> > -                                       &svc->rqd->active)) )
> > +        if ( likely(svc->rqd && cpumask_intersects(cpumask_scratch_cpu(cpu),
> > +                                                   &svc->rqd->active)) )
> >          {
> >              cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
> >                          &svc->rqd->active);
> 
> This change is not covered by anything in the description, and I wonder
> why you're making the code adjustment. If svc->rqd was NULL, wouldn't
> Xen have crashed prior to the adjustment? I can't spot how it being
> NULL here could be the effect of any of the other changes you're making.
> 
It was crashing on me with the above changes applied, but it isn't any
longer. It's certainly something that, whether related or not, should
be addressed in its own patch, so I'll drop this hunk and keep looking.

Thanks and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

 

