
Re: [Xen-devel] [PATCH 2/5] xen: credit2: never consider CPUs outside of our cpupool.



On Tue, Jan 17, 2017 at 5:26 PM, Dario Faggioli
<dario.faggioli@xxxxxxxxxx> wrote:
> In fact, relying on the mask of which pCPUs belong to
> which Credit2 runqueue is not enough. If we only do that,
> when Credit2 is the boot scheduler, we may ASSERT() or
> panic when moving a pCPU from Pool-0 to another cpupool.
>
> This is because pCPUs outside of any pool are considered
> part of cpupool0. This puts us at risk of a crash when
> those same pCPUs are added to another pool and something
> other than the idle domain is found to be running
> on them.
>
> Note that, even if we prevent the above from happening
> (which is the purpose of this patch), things are still
> pretty bad. In fact, when we remove a pCPU from Pool-0:
> - in Credit1, we do *not* update prv->ncpus and
>   prv->credit, which means we're considering the wrong
>   total credits when doing accounting;
> - in Credit2, the pCPU remains part of one runqueue,
>   and is hence at least considered during load balancing,
>   even if no vCPU should really run there.
>
> In Credit1, this "only" causes skewed accounting and
> no crashes, because there is a lot of `cpumask_and`ing
> going on with the cpumask of the domain's cpupool
> (which, BTW, comes at a price).
>
> A quick and not too involved (and easily backportable)
> solution for Credit2 is to do exactly the same.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>

Blech.  But I agree we need a fix we can backport:

Acked-by: George Dunlap <george.dunlap@xxxxxxxxxx>

> ---
> Cc: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
> Cc: Juergen Gross <jgross@xxxxxxxx>
> Cc: Jan Beulich <jbeulich@xxxxxxxx>
> ---
> This is a bugfix, and should be backported to 4.8.
> ---
> The proper solution would mean calling deinit_pdata() when removing a pCPU
> from cpupool0, as well as a bit more code reshuffling.
>
> And, although worth doing, it certainly will take more work, more time, and
> will probably be hard (well, surely harder than this) to backport.
>
> Therefore, I'd argue in favor of both taking and backporting this change,
> which at least enables using Credit2 as the default scheduler without
> risking a crash when creating a second cpupool.
>
> Afterwards, a proper solution will be proposed for Xen 4.9.
>
> Finally, given the number of issues similar to this one that I've found and
> fixed during the last release cycle, I think it would be good to take a stab
> at whether the interface between cpupools and the schedulers could be
> simplified. :-O
>
> Regards,
> Dario
> ---
>  xen/common/sched_credit2.c |   59 ++++++++++++++++++++++++++++----------------
>  1 file changed, 38 insertions(+), 21 deletions(-)
>
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 523922e..ce0e146 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -510,19 +510,22 @@ void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
>   */
>  static int get_fallback_cpu(struct csched2_vcpu *svc)
>  {
> -    int fallback_cpu, cpu = svc->vcpu->processor;
> +    struct vcpu *v = svc->vcpu;
> +    int cpu = v->processor;
>
> -    if ( likely(cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity)) )
> -        return cpu;
> +    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
> +                cpupool_domain_cpumask(v->domain));
>
> -    cpumask_and(cpumask_scratch_cpu(cpu), svc->vcpu->cpu_hard_affinity,
> -                &svc->rqd->active);
> -    fallback_cpu = cpumask_first(cpumask_scratch_cpu(cpu));
> -    if ( likely(fallback_cpu < nr_cpu_ids) )
> -        return fallback_cpu;
> +    if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
> +        return cpu;
>
> -    cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
> -                cpupool_domain_cpumask(svc->vcpu->domain));
> +    if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
> +                                   &svc->rqd->active)) )
> +    {
> +        cpumask_and(cpumask_scratch_cpu(cpu), &svc->rqd->active,
> +                    cpumask_scratch_cpu(cpu));
> +        return cpumask_first(cpumask_scratch_cpu(cpu));
> +    }
>
>      ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)));
>
> @@ -940,6 +943,9 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
>                      (unsigned char *)&d);
>      }
>
> +    cpumask_and(cpumask_scratch_cpu(cpu), new->vcpu->cpu_hard_affinity,
> +                cpupool_domain_cpumask(new->vcpu->domain));
> +
>      /*
>       * First of all, consider idle cpus, checking if we can just
>       * re-use the pcpu where we were running before.
> @@ -952,7 +958,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
>          cpumask_andnot(&mask, &rqd->idle, &rqd->smt_idle);
>      else
>          cpumask_copy(&mask, &rqd->smt_idle);
> -    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
> +    cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
>      i = cpumask_test_or_cycle(cpu, &mask);
>      if ( i < nr_cpu_ids )
>      {
> @@ -967,7 +973,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
>       * gone through the scheduler yet.
>       */
>      cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
> -    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
> +    cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
>      i = cpumask_test_or_cycle(cpu, &mask);
>      if ( i < nr_cpu_ids )
>      {
> @@ -983,7 +989,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
>       */
>      cpumask_andnot(&mask, &rqd->active, &rqd->idle);
>      cpumask_andnot(&mask, &mask, &rqd->tickled);
> -    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
> +    cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
>      if ( cpumask_test_cpu(cpu, &mask) )
>      {
>          cur = CSCHED2_VCPU(curr_on_cpu(cpu));
> @@ -1525,6 +1531,9 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
>          goto out;
>      }
>
> +    cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
> +                cpupool_domain_cpumask(vc->domain));
> +
>      /*
>       * First check to see if we're here because someone else suggested a 
> place
>       * for us to move.
> @@ -1536,13 +1545,13 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
>              printk(XENLOG_WARNING "%s: target runqueue disappeared!\n",
>                     __func__);
>          }
> -        else
> +        else if ( cpumask_intersects(cpumask_scratch_cpu(cpu),
> +                                     &svc->migrate_rqd->active) )
>          {
> -            cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
> +            cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
>                          &svc->migrate_rqd->active);
>              new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));
> -            if ( new_cpu < nr_cpu_ids )
> -                goto out_up;
> +            goto out_up;
>          }
>          /* Fall-through to normal cpu pick */
>      }
> @@ -1570,12 +1579,12 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
>           */
>          if ( rqd == svc->rqd )
>          {
> -            if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
> +            if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
>                  rqd_avgload = max_t(s_time_t, rqd->b_avgload - svc->avgload, 0);
>          }
>          else if ( spin_trylock(&rqd->lock) )
>          {
> -            if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
> +            if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
>                  rqd_avgload = rqd->b_avgload;
>
>              spin_unlock(&rqd->lock);
> @@ -1597,7 +1606,7 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
>          goto out_up;
>      }
>
> -    cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
> +    cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
>                  &prv->rqd[min_rqi].active);
>      new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));
>      BUG_ON(new_cpu >= nr_cpu_ids);
> @@ -1713,6 +1722,8 @@ static void migrate(const struct scheduler *ops,
>          __runq_deassign(svc);
>
>          cpumask_and(cpumask_scratch_cpu(cpu), svc->vcpu->cpu_hard_affinity,
> +                    cpupool_domain_cpumask(svc->vcpu->domain));
> +        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
>                      &trqd->active);
>          svc->vcpu->processor = cpumask_any(cpumask_scratch_cpu(cpu));
>          ASSERT(svc->vcpu->processor < nr_cpu_ids);
> @@ -1738,8 +1749,14 @@ static void migrate(const struct scheduler *ops,
>  static bool_t vcpu_is_migrateable(struct csched2_vcpu *svc,
>                                    struct csched2_runqueue_data *rqd)
>  {
> +    struct vcpu *v = svc->vcpu;
> +    int cpu = svc->vcpu->processor;
> +
> +    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
> +                cpupool_domain_cpumask(v->domain));
> +
>      return !(svc->flags & CSFLAG_runq_migrate_request) &&
> -           cpumask_intersects(svc->vcpu->cpu_hard_affinity, &rqd->active);
> +           cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active);
>  }
>
>  static void balance_load(const struct scheduler *ops, int cpu, s_time_t now)
>
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

