
Re: [PATCH] x86: make "dom0_nodes=" work with credit2


  • To: Dario Faggioli <dfaggioli@xxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Wed, 13 Apr 2022 12:00:12 +0200
  • Cc: "roger.pau@xxxxxxxxxx" <roger.pau@xxxxxxxxxx>, "ohering@xxxxxxx" <ohering@xxxxxxx>, "george.dunlap@xxxxxxxxxx" <george.dunlap@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Wed, 13 Apr 2022 10:00:27 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 12.04.2022 18:11, Dario Faggioli wrote:
> --- a/xen/common/sched/core.c
> +++ b/xen/common/sched/core.c
> @@ -572,11 +572,41 @@ int sched_init_vcpu(struct vcpu *v)
>      }
>  
>      /*
> -     * Initialize affinity settings. The idler, and potentially
> -     * domain-0 VCPUs, are pinned onto their respective physical CPUs.
> +     * Initialize affinity settings. By doing this before the unit is
> +     * inserted in the scheduler runqueues (by the call to sched_insert_unit(),
> +     * at the end of the function, we are sure that it will be put on an
> +     * appropriate CPU.
>       */
> -    if ( is_idle_domain(d) || (is_hardware_domain(d) && opt_dom0_vcpus_pin) )
> +    if ( pv_shim && v->vcpu_id == 0 )

I don't think you can handle the shim case first, as then you'd also have
its CPU0 idle vCPU take this path. The difference _may_ only be cosmetic,
but I think it would be odd for CPU0's idle vCPU to have a soft affinity
of just CPU0, while all others use cpumask_all.
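
To illustrate the ordering I have in mind (untested, and merely a sketch -
the exact conditions may want further refinement), something along these
lines would keep the idle vCPU-s out of the shim special case:

    if ( is_idle_domain(d) )
        sched_set_affinity(unit, cpumask_of(processor), &cpumask_all);
    else if ( pv_shim && v->vcpu_id == 0 )
    {
        /*
         * PV-shim: vCPU0 is pinned 1:1; further CPUs are dealt with when
         * they get onlined.
         */
        sched_set_affinity(unit, cpumask_of(0), cpumask_of(0));
    }
    else if ( is_hardware_domain(d) && opt_dom0_vcpus_pin )
        sched_set_affinity(unit, cpumask_of(processor), &cpumask_all);
    else if ( is_hardware_domain(d) )
        ...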

> +    {
> +        /*
> +         * PV-shim: vcpus are pinned 1:1. Initially only 1 cpu is online,
> +         * others will be dealt with when onlining them. This avoids pinning
> +         * a vcpu to a not yet online cpu here.
> +         */
> +        sched_set_affinity(unit, cpumask_of(0), cpumask_of(0));
> +    }
> +    else if ( is_idle_domain(d) || (is_hardware_domain(d) && opt_dom0_vcpus_pin) )

Here (pre-existing) as well as ...

> +    {
> +        /*
> +         * The idler, and potentially domain-0 VCPUs, are pinned onto their
> +         * respective physical CPUs.
> +         */
>          sched_set_affinity(unit, cpumask_of(processor), &cpumask_all);
> +    }
> +    else if ( is_hardware_domain(d) )

... here I wonder: Shouldn't this be limited to Dom0 (for the purposes here
!= hwdom)? Any special affinity for a late hwdom ought to be specified by
the logic creating that domain imo, not by command line options concerning
Dom0 only.

This then also determines where the involved variables (opt_dom0_vcpus_pin,
dom0_affinity_relaxed, and dom0_cpus) are to be placed (answering your
question in a subsequent reply): If it's strictly Dom0 only (and knowing
there's no way to add vCPU-s to a domain post-construction), then
__initdata is fine for all of them. If late hwdom was to also be covered,
__hwdom_initdata would need to be used.
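
That is, in the strictly-Dom0-only case I'd expect something along the lines
of (a sketch only; it's the section attributes that matter here, not the
exact declarations):

bool __initdata opt_dom0_vcpus_pin;
bool __initdata dom0_affinity_relaxed;
cpumask_t __initdata dom0_cpus;

whereas covering a late hwdom would mean using __hwdom_initdata in their
stead.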

> +    {
> +        /*
> +         * In absence of dom0_vcpus_pin, the hard and soft affinity of
> +         * domain-0 is controlled by the dom0_nodes parameter. At this point
> +         * it has been parsed and decoded, and we have the result of that
> +         * in the dom0_cpus mask.
> +         */
> +        if ( !dom0_affinity_relaxed )
> +            sched_set_affinity(unit, &dom0_cpus, &cpumask_all);
> +        else
> +            sched_set_affinity(unit, &cpumask_all, &dom0_cpus);

I guess by referencing dom0_affinity_relaxed and dom0_cpus outside of any
CONFIG_X86 section you're breaking the Arm build.
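
(One way out, merely to illustrate the point, would be to confine the new
branch, e.g.:

#ifdef CONFIG_X86
    else if ( is_hardware_domain(d) )
    {
        if ( !dom0_affinity_relaxed )
            sched_set_affinity(unit, &dom0_cpus, &cpumask_all);
        else
            sched_set_affinity(unit, &cpumask_all, &dom0_cpus);
    }
#endif

but moving the involved variables and their parsing to common code may well
be the better route.)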

I also have a more general question here: sched.h says "Bitmask of CPUs
on which this VCPU may run" for hard affinity and "Bitmask of CPUs on
which this VCPU prefers to run" for soft affinity. Additionally there's
soft_aff_effective. Does it make sense in the first place for one to be
a proper subset of the other in _both_ directions? Is that mainly
to have a way to record preferences even when all preferred CPUs are
offline, to be able to go back to the preferences once CPUs come back
online?

Then a follow-on question is: Why do you use cpumask_all for soft
affinity in the first of the two calls above? Is this to cover for the
case where all CPUs in dom0_cpus would go offline?
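
For context, this is how I read the soft_aff_effective calculation done by
sched_set_affinity() (quoting from memory, so please double check):

    unit->soft_aff_effective = !cpumask_subset(unit->cpu_hard_affinity,
                                               unit->cpu_soft_affinity) &&
                               cpumask_intersects(unit->cpu_soft_affinity,
                                                  unit->cpu_hard_affinity);

i.e. soft affinity is honored only when it restricts things beyond hard
affinity while still intersecting it, which is why I'd like to understand
what relationship between the two masks you intend to establish here.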

> +    }
>      else
>          sched_set_affinity(unit, &cpumask_all, &cpumask_all);

Hmm, you leave this alone. Wouldn't it be better to further generalize
things, to also cover the case where domain affinity was set already? The
mask calculated by sched_select_initial_cpu() is what I was referring to
in this regard as well, and when I suggested re-using that result, I did
mean it literally.
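
Roughly (ignoring that ideally the mask already computed by
sched_select_initial_cpu() would be passed through rather than re-derived,
and leaving aside locking around the scratch mask), what I have in mind for
the final branch is something like:

    else
    {
        cpumask_t *cpus = cpumask_scratch_cpu(smp_processor_id());
        nodeid_t node;

        /*
         * Base the default affinity on the domain's node affinity, just
         * like sched_select_initial_cpu() does, instead of cpumask_all.
         */
        cpumask_clear(cpus);
        for_each_node_mask ( node, d->node_affinity )
            cpumask_or(cpus, cpus, &node_to_cpumask(node));
        if ( cpumask_empty(cpus) )
            cpumask_copy(cpus, &cpumask_all);

        sched_set_affinity(unit, cpus, &cpumask_all);
    }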

> @@ -3386,29 +3416,18 @@ void wait(void)
>  void __init sched_setup_dom0_vcpus(struct domain *d)
>  {
>      unsigned int i;
> -    struct sched_unit *unit;
>  
>      for ( i = 1; i < d->max_vcpus; i++ )
>          vcpu_create(d, i);
>  
>      /*
> -     * PV-shim: vcpus are pinned 1:1.
> -     * Initially only 1 cpu is online, others will be dealt with when
> -     * onlining them. This avoids pinning a vcpu to a not yet online cpu here.
> +     * sched_vcpu_init(), called by vcpu_create(), will setup the hard and
> +     * soft affinity of all the vCPUs, by calling sched_set_affinity() on each
> +     * one of them. We can now make sure that the domain's node affinity is
> +     * also updated accordingly, and we can do that here, once and for all
> +     * (which is more efficient than calling domain_update_node_affinity()
> +     * on all the vCPUs).
>       */
> -    if ( pv_shim )
> -        sched_set_affinity(d->vcpu[0]->sched_unit,
> -                           cpumask_of(0), cpumask_of(0));
> -    else
> -    {
> -        for_each_sched_unit ( d, unit )
> -        {
> -            if ( !opt_dom0_vcpus_pin && !dom0_affinity_relaxed )
> -                sched_set_affinity(unit, &dom0_cpus, NULL);
> -            sched_set_affinity(unit, NULL, &dom0_cpus);
> -        }
> -    }
> -
>      domain_update_node_affinity(d);
>  }

I consider the comment somewhat misleading, and hence I wonder if a comment
is needed here in the first place. domain_update_node_affinity() acts on a
domain, not on individual vCPU-s. Hence it's not clear what "calling
domain_update_node_affinity() on all the vCPUs" would be referring to. I
don't think anyone would consider calling vcpu_set_affinity() here just for
the purpose of updating domain affinity, with all other (vCPU) affinity
setting now happening elsewhere.
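
If a comment is to be kept here at all, perhaps something as brief as

    /* All vCPU affinities are set; update the domain's node affinity. */
    domain_update_node_affinity(d);

would do.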

> --- a/xen/common/sched/credit2.c
> +++ b/xen/common/sched/credit2.c
> @@ -749,10 +749,12 @@ static int get_fallback_cpu(struct csched2_unit *svc)
>  
>          /*
>           * This is cases 2 or 4 (depending on bs): v->processor isn't there
> -         * any longer, check if we at least can stay in our current runq.
> +         * any longer, check if we at least can stay in our current runq,
> +      * if we have any (e.g., we don't yet, if we get here when a unit
> +      * is inserted for the very first time).
>           */
> -        if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
> -                                       &svc->rqd->active)) )
> +        if ( likely(svc->rqd && cpumask_intersects(cpumask_scratch_cpu(cpu),
> +                                                   &svc->rqd->active)) )
>          {
>              cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
>                          &svc->rqd->active);

This change is not covered by anything in the description, and I wonder why
you're making the code adjustment. If svc->rqd was NULL, wouldn't Xen have
crashed prior to the adjustment? I can't spot how it being NULL here could
be the effect of any of the other changes you're making.

If the comment adjustment is to be retained, please take care of the hard
tabs which have slipped in.

Jan
