
Re: [Xen-devel] [PATCH v3 4/4] sched: credit2: consider per-vcpu soft affinity



On Thu, Mar 26, 2015 at 9:48 AM, Justin T. Weaver <jtweaver@xxxxxxxxxx> wrote:
[snip]
> Added a function to determine the number of soft cpus gained (or lost) by a
> given vcpu if it is migrated from a given source run queue to a given
> destination run queue.
>
> Modified algorithm in balance_load and consider...
>  * if the load on lrqd and/or orqd is less than the number of their active
>    cpus, balance_load will look for vcpus that would have their soft affinity
>    improved by being pushed and/or pulled. Load does not need to be considered
>    since a run queue receiving a pushed or pulled vcpu is not being fully
>    utilized. This returns vcpus that may have been migrated away from their
>    soft affinity due to load balancing back to their soft affinity.
>  * in consider, vcpus that might be picked for migration because pushing or
>    pulling them decreases the load delta are not picked if their current run
>    queue's load is less than its active cpu count and if that migration would
>    harm their soft affinity. There's no need to push/pull if the load is under
>    capacity, and the vcpu would lose access to some or all of its soft cpus.
>  * in consider, if a push/pull/swap migration decreases the load delta by a
>    similar amount to another push/pull/swap migration, then use soft cpu gain
>    as a tie breaker. This allows load to continue to balance across run
>    queues, but favors soft affinity gains if the load deltas are close.

OK, I've done a lot of thinking about this over the last few days, and
here's what I've come up with.

As I said before, there are basically two different activities in this
patch.  The first is to try to honor soft affinity *given* where the
vcpu is placed; the other is to try to take soft affinity into account
when *placing* a vcpu.

The first bit is basically correct, and I've already given feedback on it.

The second half certainly represents a lot of hard work and I think
it's not a *bad* solution; but overall I'm not satisfied with it and I
think we can do better.

At a basic level, I don't like adding in the "if the load is less than
the number of pcpus then ignore load and only pull based on soft
affinity" shortcut to balance_load.  I think it's likely that
the two halves of balance_load will have very different ideas about
what "good" looks like, and that as a result vcpus will end up getting
bounced back and forth depending on which particular algorithm ends up
being activated.

Apart from that, the whole thing is rather ad-hoc and makes it much
more complicated to understand what is going on, or predict what would
happen.

My original plan wrt load balancing was to calculate some sort of
"badness" (or perhaps "expected overhead") to the current placement
situation, and try to minimize that.  The idea was that you could then
try to boil all the factors down to a single number to optimize.  For
example, you could add in "badness" for spinning up extra cores or
sockets (or "goodness" for having them entirely empty), and then
automatically balance performance vs power consumption; or, in this
case, to balance the cost of running with remote memory accesses (as
running outside your soft affinity would normally mean) vs the cost of
waiting for the cpu.

Of course, if you're only considering load, then no matter what you
do, minimizing "badness" means minimizing load, so there's no reason
to complicate things, which is why I wrote the load balancing code the
way I did.  But I've been thinking for the last few days and I think I
can probably structure it differently to make it easier to get a
consistent evaluation of the goodness / badness of a given setup.
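
To make that concrete, here's a purely illustrative sketch of the kind
of metric I have in mind.  Everything in it is made up for the sake of
the example (the helper name, the weights, the exact factors); it's
not a proposal for the actual code:

---
/* Illustrative sketch only: all names and weights here are invented. */
#define WAIT_COST          8  /* cost per vcpu waiting for a cpu */
#define REMOTE_ACCESS_COST 1  /* cost per vcpu outside its soft affinity */

static int placement_badness(const struct csched2_runqueue_data *rqd,
                             int vcpus_outside_soft_affinity)
{
    int badness = 0;
    int ncpus = cpumask_weight(&rqd->active);

    /* Load beyond the runqueue's cpu capacity means vcpus waiting. */
    if ( rqd->load > ncpus )
        badness += (rqd->load - ncpus) * WAIT_COST;

    /* Running outside soft affinity (i.e., with remote memory
     * accesses) costs less than waiting, but isn't free. */
    badness += vcpus_outside_soft_affinity * REMOTE_ACCESS_COST;

    return badness;
}
---

The balancer would then simply look for the arrangement that minimizes
the sum of that score across runqueues, instead of special-casing soft
affinity inside balance_load.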

So what about this:

1. You re-submit this series, with the comments I've given, and the
first half of this patch (i.e., leaving out the changes to choose_cpu
and balance_load)

2. I'll rework the load balancing code based on my ideas; i.e.,
calculating the "badness" of the load and deciding based on that.
I'll probably try to do the soft-affinity factoring myself as a proof
of concept.

What do you think?  (Feel free to weigh in here too, Dario.)

---

Also, a few comments on the code (just for feedback):

> @@ -1207,15 +1296,75 @@ typedef struct {
>      /* NB: Modified by consider() */
>      s_time_t load_delta;
>      struct csched2_vcpu * best_push_svc, *best_pull_svc;
> +    int soft_affinity_boost;
> +    bool_t valid_sa_boost;
>      /* NB: Read by consider() */
>      struct csched2_runqueue_data *lrqd;
>      struct csched2_runqueue_data *orqd;
>  } balance_state_t;
>
> +/*
> + * Return the number of pcpus gained in vc's soft affinity mask that vc can
> + * run on if vc is migrated from run queue src_rqd to run queue dst_rqd.
> + */
> +static int get_soft_affinity_gain(const struct vcpu *vc,
> +                                  const struct csched2_runqueue_data *src_rqd,
> +                                  const struct csched2_runqueue_data *dst_rqd)
> +{
> +    /*
> +     * Locks must already be held for src_rqd and dst_rqd.
> +     * Function assumes vc has at least hard affinity with one or more
> +     * pcpus in both the source and destination run queues.
> +     */
> +
> +    /* Does vcpu not have soft affinity? */
> +    if ( !__vcpu_has_soft_affinity(vc, vc->cpu_hard_affinity) )
> +        return 0;
> +
> +    /* Does vcpu have soft affinity with pcpu(s) in the destination runq? */
> +    sched_balance_cpumask(vc, SCHED_BALANCE_SOFT_AFFINITY, csched2_cpumask);
> +    if ( cpumask_intersects(csched2_cpumask, &dst_rqd->active) )
> +    {
> +        int soft_cpus_dst;
> +        cpumask_and(csched2_cpumask, csched2_cpumask, &dst_rqd->active);
> +        soft_cpus_dst = cpumask_weight(csched2_cpumask);
> +
> +        /* ... and soft affinity with the source run queue? */
> +        sched_balance_cpumask(vc, SCHED_BALANCE_SOFT_AFFINITY,
> +            csched2_cpumask);
> +        if ( cpumask_intersects(csched2_cpumask, &src_rqd->active) )
> +        {
> +            int soft_cpus_src;
> +            cpumask_and(csched2_cpumask, csched2_cpumask, &src_rqd->active);
> +            soft_cpus_src = cpumask_weight(csched2_cpumask);
> +
> +            /* Soft affinity to soft affinity migration. */
> +            return soft_cpus_dst - soft_cpus_src;
> +        }
> +        else
> +            /* Hard affinity to soft affinity migration. */
> +            return soft_cpus_dst;
> +    }
> +    else
> +    {
> +        int soft_cpus_src = 0;
> +        cpumask_and(csched2_cpumask, csched2_cpumask, &src_rqd->active);
> +        soft_cpus_src = cpumask_weight(csched2_cpumask);
> +
> +        /*
> +         * Hard affinity to hard affinity migration or soft affinity to hard
> +         * affinity migration.
> +         */
> +        return -soft_cpus_src;
> +    }

Is there a reason you can't just do the following?
---
int soft_cpus_dst=0, soft_cpus_src=0;

sched_balance_cpumask(...)

if(cpumask_intersects(cpumask, dst)) {
  cpumask_and(...)
  soft_cpus_dst = cpumask_weight(...)

  /* Restore soft affinity mask for next check */
  sched_balance_cpumask(...)
}

if(cpumask_intersects(cpumask, src)) {
  cpumask_and(...)
  soft_cpus_src = cpumask_weight(...)
}

return soft_cpus_dst - soft_cpus_src;
---

It seems like that would make it a lot easier to figure out what's going on.
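
Spelled out fully, reusing the calls and the csched2_cpumask scratch
mask from the quoted code above (and keeping your early exit for vcpus
with no soft affinity), that would be something like:

---
static int get_soft_affinity_gain(const struct vcpu *vc,
                                  const struct csched2_runqueue_data *src_rqd,
                                  const struct csched2_runqueue_data *dst_rqd)
{
    int soft_cpus_dst = 0, soft_cpus_src = 0;

    /* No soft affinity: migration can neither gain nor lose soft cpus. */
    if ( !__vcpu_has_soft_affinity(vc, vc->cpu_hard_affinity) )
        return 0;

    sched_balance_cpumask(vc, SCHED_BALANCE_SOFT_AFFINITY, csched2_cpumask);

    if ( cpumask_intersects(csched2_cpumask, &dst_rqd->active) )
    {
        cpumask_and(csched2_cpumask, csched2_cpumask, &dst_rqd->active);
        soft_cpus_dst = cpumask_weight(csched2_cpumask);

        /* Restore the full soft affinity mask for the source check. */
        sched_balance_cpumask(vc, SCHED_BALANCE_SOFT_AFFINITY,
                              csched2_cpumask);
    }

    if ( cpumask_intersects(csched2_cpumask, &src_rqd->active) )
    {
        cpumask_and(csched2_cpumask, csched2_cpumask, &src_rqd->active);
        soft_cpus_src = cpumask_weight(csched2_cpumask);
    }

    /* One expression covers all four hard/soft combinations. */
    return soft_cpus_dst - soft_cpus_src;
}
---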

> @@ -1237,12 +1386,88 @@ static void consider(balance_state_t *st,
>      if ( delta < 0 )
>          delta = -delta;
>
> -    if ( delta < st->load_delta )
> +    /*
> +     * Use soft affinity gain as a tie breaker if at least one migration has
> +     * already been picked and stored in the balance state, and the absolute
> +     * value of the difference between the delta in st and the new delta
> +     * being considered here is less than 1/16th of the load window.
> +     */
> +    delta_diff = delta - st->load_delta;
> +    if ( delta_diff < 0 )
> +        delta_diff = -delta_diff;
> +    if ( (st->best_push_svc != NULL || st->best_pull_svc != NULL)
> +        && delta_diff < 1<<(load_window_shift - CSCHED2_DIVIDE_BY_16) )
>      {
> -        st->load_delta = delta;
> -        st->best_push_svc=push_svc;
> -        st->best_pull_svc=pull_svc;
> +        int st_soft_gain = 0, consider_soft_gain = 0;
> +
> +        /* Find the soft affinity gain for the migration in st. */
> +        if ( !st->valid_sa_boost )
> +            if ( st->best_push_svc )
> +                st_soft_gain += get_soft_affinity_gain(
> +                    st->best_push_svc->vcpu, st->lrqd, st->orqd);
> +            if ( st->best_pull_svc )
> +                st_soft_gain += get_soft_affinity_gain(
> +                    st->best_pull_svc->vcpu, st->orqd, st->lrqd);
> +        else
> +            st_soft_gain = st->soft_affinity_boost;

Looks like you're missing some braces around those two indented if's.
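
Presumably what you meant was:

---
if ( !st->valid_sa_boost )
{
    if ( st->best_push_svc )
        st_soft_gain += get_soft_affinity_gain(
            st->best_push_svc->vcpu, st->lrqd, st->orqd);
    if ( st->best_pull_svc )
        st_soft_gain += get_soft_affinity_gain(
            st->best_pull_svc->vcpu, st->orqd, st->lrqd);
}
else
    st_soft_gain = st->soft_affinity_boost;
---

As written, the else actually binds to the inner
`if ( st->best_pull_svc )`, and that second if isn't under the
`!st->valid_sa_boost` check at all.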

> +    /* Only consider load delta. */
> +    if ( delta < st->load_delta )
> +        st->valid_sa_boost = 0;
> +
> +        /*
> +         * If the migration results in a loss of some or all soft cpus
> +         * and the vcpu's current run queue has less load than physical
> +         * processors, do not use the migration.
> +         */
> +        if ( push_svc &&
> +            (st->lrqd->load < cpumask_weight(&st->lrqd->active) &&
> +             get_soft_affinity_gain(push_svc->vcpu, st->lrqd, st->orqd) < 0) )
> +            return;
> +        if ( pull_svc &&
> +            (st->orqd->load < cpumask_weight(&st->orqd->active) &&
> +             get_soft_affinity_gain(pull_svc->vcpu, st->orqd, st->lrqd) < 0) )
> +            return;
> +    else
> +        return;

Missing braces again.
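
i.e., presumably:

---
/* Only consider load delta. */
if ( delta < st->load_delta )
{
    st->valid_sa_boost = 0;

    /*
     * If the migration results in a loss of some or all soft cpus and
     * the vcpu's current run queue has less load than physical
     * processors, do not use the migration.
     */
    if ( push_svc &&
         (st->lrqd->load < cpumask_weight(&st->lrqd->active) &&
          get_soft_affinity_gain(push_svc->vcpu, st->lrqd, st->orqd) < 0) )
        return;
    if ( pull_svc &&
         (st->orqd->load < cpumask_weight(&st->orqd->active) &&
          get_soft_affinity_gain(pull_svc->vcpu, st->orqd, st->lrqd) < 0) )
        return;
}
else
    return;
---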

 -George
