
Re: [Xen-devel] [PATCH v2] sched: credit2: respect per-vcpu hard affinity



On Sun, 2015-02-08 at 17:45 -1000, Justin T. Weaver wrote:
> From: "Justin T. Weaver" <jtweaver@xxxxxxxxxx>
> 
> by making sure that vcpus only run on the pcpu(s) they are allowed to
> run on based on their hard affinity cpu masks.
> 
Ok, here I am reviewing this, at last... sorry for the delay! :-(

> Signed-off-by: Justin T. Weaver <jtweaver@xxxxxxxxxx>
> ---
> Changes in v2:
>  * Added dynamically allocated cpu masks to avoid putting them on the stack;
>    replaced temp masks from v1 throughout
>  * Added helper function for code suggested in v1 review and called it in two
>    locations in function choose_cpu
>  * Removed v1 change to comment in the beginning of choose_cpu
>  * Replaced two instances of cpumask_and/cpumask_empty with cpumask_intersects
>  * Removed v1 re-work of code in function migrate; only change in migrate in
>    v2 is the assignment of a valid pcpu from the destination run queue to
>    vc->processor
>  * In function csched2_vcpu_migrate: removed change from v1 that called
>    function migrate even if cur and dest run queues were the same in order
>    to get a runq_tickle call; added processor assignment to new_cpu to fix
>    the real underlying issue which was the vcpu not getting a call to
>    sched_move_irqs
>
Aha, so that was the issue! Glad you figured this out. :-)

>  * Removed the looping added in v1 in function balance_load; may be added back
>    later because it would help to have balance_load be more aware of hard
>    affinity, but adding it does not affect credit2's current inability to
>    respect hard affinity.
>  * Removed coding style fix in function balance_load
>  * Improved comment in function runq_candidate
>
Thanks for putting this recap of the changes here. :-)

> ---
>  xen/common/sched_credit2.c |  122 +++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 108 insertions(+), 14 deletions(-)
> 
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index cf53770..de8fb5a 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -194,6 +194,12 @@ int opt_overload_balance_tolerance=-3;
>  integer_param("credit2_balance_over", opt_overload_balance_tolerance);
>  
>  /*
> + * Use this to avoid having too many cpumask_t structs on the stack
> + */
> +static cpumask_t **cpumask = NULL;
>
Not just 'cpumask', please... It's too generic a name. Let's pick
something that makes it easier to understand what its purpose is.

I'm not really sure right now what something like that could be...
Credit has balance_mask, but we're not going as far as introducing
multiple-step load balancing here (not with this patch, at least).

Maybe affinity_cpumask, or hard_affinity_cpumask (and we'll rename to
something else when introducing soft affinity, if needed).

> +#define csched2_cpumask cpumask[smp_processor_id()]
> +
I like the idea, but put the right hand side between parentheses. Of
course, what I just said about the name applies here too. :-)
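To make that concrete, here is a quick sketch of what I have in mind
(the array name is, as said, just one suggestion):

    /* Per-pcpu scratch mask, to avoid having cpumask_t-s on the stack. */
    static cpumask_t **hard_affinity_cpumask = NULL;
    #define csched2_cpumask (hard_affinity_cpumask[smp_processor_id()])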

> @@ -268,6 +274,23 @@ struct csched2_dom {
>      uint16_t nr_vcpus;
>  };
>  
> +/*
> + * When a hard affinity change occurs, we may not be able to check some or
> + * all of the other run queues for a valid new processor for the given vcpu.
> + * Return svc's current pcpu if valid, otherwise return a safe pcpu.
> + */
>
Add at least a very quick mention of why this is (or could be) necessary.

> +static int get_safe_pcpu(struct csched2_vcpu *svc)
> +{
>
I also don't like the name... __choose_cpu(), maybe?

> +    cpumask_and(csched2_cpumask, svc->vcpu->cpu_hard_affinity, &svc->rqd->active);
> +    if ( unlikely(cpumask_empty(csched2_cpumask)) )
> +        cpumask_and(csched2_cpumask, svc->vcpu->cpu_hard_affinity,
> +            cpupool_online_cpumask(svc->vcpu->domain->cpupool));
>
VCPU2ONLINE(svc->vcpu) would make the line shorter.

Also, I know I'm the one that suggested this form for the code but,
thinking about it again, I'm not sure the first if is worth it:

    cpumask_and(csched2_cpumask, &svc->rqd->active, VCPU2ONLINE(svc->vcpu));
    cpumask_and(csched2_cpumask, csched2_cpumask, svc->vcpu->cpu_hard_affinity);

Oh, and, with either your variant or mine... can csched2_cpumask end up
empty after the two &&-s? That's important because, if it can, the
cpumask_any() below will return garbage! :-/

From a quick inspection, it does not seem that it can, in which case I'd
put down an ASSERT() about this somewhere. OTOH, if it can, I'd look for
ways of preventing that from happening before getting here... It seems
easier and more correct to do that, rather than trying to figure out
what to do here if the mask is empty. :-O

> +
> +    if ( cpumask_test_cpu(svc->vcpu->processor, csched2_cpumask) )
> +        return svc->vcpu->processor;
> +    else
> +        return cpumask_any(csched2_cpumask);
>
And, perhaps, we could put back the likely/unlikely hint here (provided
we think the then branch is actually likely):

    if ( likely(cpumask_test_cpu(svc->vcpu->processor, csched2_cpumask)) )
        return svc->vcpu->processor;
    else
        return cpumask_any(csched2_cpumask);

> @@ -1081,13 +1106,17 @@ choose_cpu(const struct scheduler *ops, struct vcpu *vc)
>          else
>          {
>              d2printk("%pv +\n", svc->vcpu);
> -            new_cpu = cpumask_cycle(vc->processor, &svc->migrate_rqd->active);
> -            goto out_up;
> +            cpumask_and(csched2_cpumask, vc->cpu_hard_affinity,
> +                &svc->migrate_rqd->active);
> +            if ( !cpumask_empty(csched2_cpumask) )
> +            {
> +                new_cpu = cpumask_any(csched2_cpumask);
> +                goto out_up;
> +            }
>
               cpumask_and(csched2_cpumask, vc->cpu_hard_affinity,
                           &svc->migrate_rqd->active);
               new_cpu = cpumask_any(csched2_cpumask);
               if ( new_cpu < nr_cpu_ids )
               {
                   d2printk("%pv +\n", svc->vcpu);
                   goto out_up;
               }

As I told you last round, checking the results of most of the
cpumask_foo() operations against nr_cpu_ids can often spare the need for
more cpumask fiddling, and hence should be done whenever possible.

> +            /* Fall-through to normal cpu pick */
>
Don't add this here. Instead, kill the instance of the exact same
comment in the if(){}, and only have it once ...

>          }
>
           ^ ... here, i.e., outside of the if (){}else{} block.

>      }
>  
> -    /* FIXME: Pay attention to cpu affinity */
> -
>      min_avgload = MAX_LOAD;
>  
>      /* Find the runqueue with the lowest instantaneous load */
> @@ -1099,17 +1128,24 @@ choose_cpu(const struct scheduler *ops, struct vcpu *vc)
>          rqd = prv->rqd + i;
>  
>          /* If checking a different runqueue, grab the lock,
> -         * read the avg, and then release the lock.
> +         * check hard affinity, read the avg, and then release the lock.
>           *
>           * If on our own runqueue, don't grab or release the lock;
>           * but subtract our own load from the runqueue load to simulate
>           * impartiality */
>          if ( rqd == svc->rqd )
>          {
> +            if ( !cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
> +                continue;
>
I'd add a short comment on how it is possible that we get here without
hard affinity with any pCPU in our runqueue (exactly as you explained it
during v1 review, i.e., in case the affinity is being changed).
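Something as simple as this would do (wording is just a suggestion):

            /*
             * Note that we can end up here without hard affinity with
             * any pCPU of our own runqueue, e.g., if the affinity is
             * being changed right now.
             */
            if ( !cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
                continue;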

>              rqd_avgload = rqd->b_avgload - svc->avgload;
>          }
>          else if ( spin_trylock(&rqd->lock) )
>          {
> +            if ( !cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
> +            {
> +                spin_unlock(&rqd->lock);
> +                continue;
> +            }
>              rqd_avgload = rqd->b_avgload;
>              spin_unlock(&rqd->lock);
>          }
>
Something like this would be easier to read, IMO:

           if ( rqd == svc->rqd )
           {
               if ( cpumask_intersects(vc->cpu_hard_affinity,
                                       &rqd->active) )
                   rqd_avgload = rqd->b_avgload - svc->avgload;
           }
           else if ( spin_trylock(&rqd->lock) )
           {
               if ( cpumask_intersects(vc->cpu_hard_affinity,
                                       &rqd->active) )
                   rqd_avgload = rqd->b_avgload;

                spin_unlock(&rqd->lock);
           }
           else
               continue;

The semantics should be the same, provided we initialize rqd_avgload to
MAX_LOAD, I would say.

In fact, without the two 'continue;' statements you were introducing,
we'll execute the if() that follows this block even if there was no
intersection with the hard affinity mask; but, in that case, there's no
chance we have updated rqd_avgload, so it really should behave the same.
What do you think?
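I.e., at the top of the loop body (assuming rqd_avgload is an s_time_t,
like the other load variables in this file):

        s_time_t rqd_avgload = MAX_LOAD;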

> @@ -1123,12 +1159,16 @@ choose_cpu(const struct scheduler *ops, struct vcpu *vc)
>          }
>      }
>  
> -    /* We didn't find anyone (most likely because of spinlock contention); leave it where it is */
>      if ( min_rqi == -1 )
> -        new_cpu = vc->processor;
> +    {
> +        /* No runqs found (most likely because of spinlock contention). */
> +        new_cpu = get_safe_pcpu(svc);
> +    }
>
No need to move the comment inside the if(), just kill the 'leave it
where it is' part.

> @@ -1330,6 +1375,12 @@ retry:
>          if ( test_bit(__CSFLAG_runq_migrate_request, &push_svc->flags) )
>              continue;
>  
> +        /* Skip if it can't run on the destination runq. */
> +        cpumask_and(csched2_cpumask, push_svc->vcpu->cpu_hard_affinity,
> +            &st.orqd->active);
> +        if ( cpumask_empty(csched2_cpumask) )
> +            continue;
> +
>          list_for_each( pull_iter, &st.orqd->svc )
>          {
>              struct csched2_vcpu * pull_svc = list_entry(pull_iter, struct csched2_vcpu, rqd_elem);
> @@ -1343,6 +1394,12 @@ retry:
>              if ( test_bit(__CSFLAG_runq_migrate_request, &pull_svc->flags) )
>                  continue;
>  
> +            /* Skip if it can't run on the destination runq. */
> +            cpumask_and(csched2_cpumask, pull_svc->vcpu->cpu_hard_affinity,
> +                &st.lrqd->active);
> +            if ( cpumask_empty(csched2_cpumask) )
> +                continue;
> +
>              consider(&st, push_svc, pull_svc);
>          }
>  
> @@ -1360,6 +1417,12 @@ retry:
>          if ( test_bit(__CSFLAG_runq_migrate_request, &pull_svc->flags) )
>              continue;
>  
> +        /* Skip if it can't run on the destination runq. */
> +        cpumask_and(csched2_cpumask, pull_svc->vcpu->cpu_hard_affinity,
> +            &st.lrqd->active);
> +        if ( cpumask_empty(csched2_cpumask) )
> +            continue;
> +
>
Same 4 lines of code (5, if we count the comment), repeated 3 times:
make a helper. :-)
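Something along these lines, for instance (name and shape are just a
suggestion; note that using cpumask_intersects() means we would not even
need the scratch mask here, as the result is only ever tested for
emptiness):

    /* Can svc run on (some pCPU of) runqueue rqd? */
    static bool_t vcpu_has_cpus_in_rqd(const struct csched2_vcpu *svc,
                                       const struct csched2_runqueue_data *rqd)
    {
        return cpumask_intersects(svc->vcpu->cpu_hard_affinity, &rqd->active);
    }

and then each of the three sites becomes something like:

            if ( !vcpu_has_cpus_in_rqd(pull_svc, st.lrqd) )
                continue;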

> @@ -1396,6 +1459,15 @@ csched2_vcpu_migrate(
>  
>      /* Check if new_cpu is valid */
>      BUG_ON(!cpumask_test_cpu(new_cpu, &CSCHED2_PRIV(ops)->initialized));
> +    BUG_ON(!cpumask_test_cpu(new_cpu, vc->cpu_hard_affinity));
>
Not sure... An ASSERT() seems more suitable than a BUG_ON() here...
What's the reasoning behind this?
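I.e., if we go the ASSERT() route, just:

    ASSERT(cpumask_test_cpu(new_cpu, vc->cpu_hard_affinity));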

> +    /*
> +     * Assign new_cpu to vc->processor here to get a call to sched_move_irqs
> +     * in schedule.c in case there was a hard affinity change within the same
> +     * run queue. vc will not be able to run in certain situations without
> +     * this call.
> +     */
> +    vc->processor = new_cpu;
>
Oh, and this is what was causing you trouble, in case source and
destination runqueues were the same... Help me understand: which call to
sched_move_irqs() in schedule.c were we missing? I'd say it is the one
in vcpu_migrate(), but that does not seem to care about vc->processor
(much rather about new_cpu)... what am I missing?

However, if they are not the same, the call to migrate() will override
this right away, won't it? What I mean to say is, if this is required
only in case trqd and svc->rqd are equal, can we tweak this part of
csched2_vcpu_migrate() as follows?

    if ( trqd != svc->rqd )
        migrate(ops, svc, trqd, NOW());
    else
        vc->processor = new_cpu;


Thanks and Regards,
Dario
