
Re: [Xen-devel] [PATCH v3 1/4] sched: credit2: respect per-vcpu hard affinity



On 03/26/2015 09:48 AM, Justin T. Weaver wrote:
> by making sure that vcpus only run on the pcpu(s) they are allowed to
> run on based on their hard affinity cpu masks.
> 
> Signed-off-by: Justin T. Weaver <jtweaver@xxxxxxxxxx>

Hey Justin!  Getting close.  A couple of comments:

> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 7581731..af716e4 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -176,6 +176,7 @@ integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
>  #define c2r(_ops, _cpu)     (CSCHED2_PRIV(_ops)->runq_map[(_cpu)])
>  /* CPU to runqueue struct macro */
>  #define RQD(_ops, _cpu)     (&CSCHED2_PRIV(_ops)->rqd[c2r(_ops, _cpu)])
> +#define VCPU2ONLINE(_v)     cpupool_online_cpumask((_v)->domain->cpupool)
>  
>  /*
>   * Shifts for load average.
> @@ -194,6 +195,12 @@ int opt_overload_balance_tolerance=-3;
>  integer_param("credit2_balance_over", opt_overload_balance_tolerance);
>  
>  /*
> + * Use this to avoid having too many cpumask_t structs on the stack
> + */
> +static cpumask_t **scratch_mask = NULL;
> +#define csched2_cpumask scratch_mask[smp_processor_id()]

Since I'm asking for changes below:  It's not clear to me when I'm
scanning the code that csched2_cpumask is a scratch variable.  What
about calling the actual variable _scratch_cpumask and having the
#define be scratch_cpumask?
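
In other words, something like this (just a sketch of the renaming;
behaviour unchanged):

    /* Scratch space, to avoid putting cpumask_t structs on the stack. */
    static cpumask_t **_scratch_cpumask = NULL;
    #define scratch_cpumask _scratch_cpumask[smp_processor_id()]

That way anything reading scratch_cpumask in the code is obviously
looking at per-cpu scratch space.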

>      /* Find the runqueue with the lowest instantaneous load */
>      for_each_cpu(i, &prv->active_queues)
>      {
>          struct csched2_runqueue_data *rqd;
> -        s_time_t rqd_avgload;
> +        s_time_t rqd_avgload = MAX_LOAD;
>  
>          rqd = prv->rqd + i;
>  
>          /* If checking a different runqueue, grab the lock,
> -         * read the avg, and then release the lock.
> +         * check hard affinity, read the avg, and then release the lock.
>           *
>           * If on our own runqueue, don't grab or release the lock;
>           * but subtract our own load from the runqueue load to simulate
> -         * impartiality */
> +         * impartiality.
> +         *
> +         * svc's hard affinity may have changed; this function is the
> +         * credit 2 scheduler's first opportunity to react to the change,
> +         * so it is possible here that svc does not have hard affinity
> +         * with any of the pcpus of svc's currently assigned run queue.
> +         */
>          if ( rqd == svc->rqd )
>          {
> -            rqd_avgload = rqd->b_avgload - svc->avgload;
> +            if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
> +                rqd_avgload = rqd->b_avgload - svc->avgload;
>          }
>          else if ( spin_trylock(&rqd->lock) )
>          {
> -            rqd_avgload = rqd->b_avgload;
> +            if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
> +                rqd_avgload = rqd->b_avgload;
> +
>              spin_unlock(&rqd->lock);
>          }
>          else

Since we're already potentially falling through and doing the comparison
below with an unchanged rqd_avgload (which has now been set to MAX_LOAD
above), I wonder if it makes more sense to remove this "else continue"
here, just to avoid confusing people.
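
I.e., something like this (untested sketch; min_avgload / min_rqi being
the accumulator variables from the surrounding loop):

    if ( rqd == svc->rqd )
    {
        if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
            rqd_avgload = rqd->b_avgload - svc->avgload;
    }
    else if ( spin_trylock(&rqd->lock) )
    {
        if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
            rqd_avgload = rqd->b_avgload;

        spin_unlock(&rqd->lock);
    }

    /* MAX_LOAD can never win the comparison below, so no "else
     * continue" arm is needed. */
    if ( rqd_avgload < min_avgload )
    {
        min_avgload = rqd_avgload;
        min_rqi = i;
    }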

> @@ -2024,6 +2096,13 @@ csched2_alloc_pdata(const struct scheduler *ops, int cpu)
>          printk("%s: cpu %d not online yet, deferring initializatgion\n",
>                 __func__, cpu);
>  
> +    /*
> +     * For each new pcpu, allocate a cpumask_t for use throughout the
> +     * scheduler to avoid putting any cpumask_t structs on the stack.
> +     */
> +    if ( !zalloc_cpumask_var(&scratch_mask[cpu]) )

Any reason not to use "scratch_mask + cpu" here rather than
"&scratch_mask[cpu]"?

It might not be a bad idea to add ASSERT(scratch_mask[cpu] == NULL)
before this, just to be paranoid...
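
I.e. (sketch):

    /* Invariant: a NULL entry is the only invalid entry. */
    ASSERT(scratch_mask[cpu] == NULL);
    if ( !zalloc_cpumask_var(scratch_mask + cpu) )
        return NULL;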

> +        return NULL;
> +
>      return (void *)1;
>  }
>  
> @@ -2072,6 +2151,8 @@ csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
>  
>      spin_unlock_irqrestore(&prv->lock, flags);
>  
> +    free_cpumask_var(scratch_mask[cpu]);

We should definitely set this to NULL after freeing it (see below)
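
I.e.:

    free_cpumask_var(scratch_mask[cpu]);
    scratch_mask[cpu] = NULL;

That keeps the invariant that a non-NULL entry is always valid.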

> +
>      return;
>  }
>  
> @@ -2159,6 +2240,10 @@ csched2_init(struct scheduler *ops)
>  
>      prv->load_window_shift = opt_load_window_shift;
>  
> +    scratch_mask = xmalloc_array(cpumask_t *, nr_cpu_ids);

I realize Dario recommended using xmalloc_array() instead of
xzalloc_array(), but I don't understand why he thinks that's OK.  His
mail (*) says "(see below about why I actually don't
think we need)", but I don't actually see that addressed in that e-mail.

I think it's just dangerous to leave uninitialized pointers around.  The
invariant should be that if the array entry is invalid it's NULL, and if
it's non-null then it's valid.

(* marc.info/?i=<1426266674.32500.25.camel@xxxxxxxxxx>)
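
With xzalloc_array() the invariant holds from the moment of allocation
(sketch):

    /* Zeroed allocation: every entry starts out NULL, i.e. invalid. */
    scratch_mask = xzalloc_array(cpumask_t *, nr_cpu_ids);
    if ( scratch_mask == NULL )
        return -ENOMEM;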

Also -- I think this allocation wants to happen in global_init(), right?
Otherwise if you make a second cpupool with the credit2 scheduler this
will be clobbered.  (I think nr_cpu_ids should be defined at that point.)
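
Rough sketch of what I mean (untested, and assuming the usual
int (*global_init)(void) hook shape):

    static int
    csched2_global_init(void)
    {
        /* Allocate once, not per scheduler instance / cpupool. */
        scratch_mask = xzalloc_array(cpumask_t *, nr_cpu_ids);
        return scratch_mask != NULL ? 0 : -ENOMEM;
    }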

 -George

