Re: [Xen-devel] [PATCH v3 4/4] sched: credit2: consider per-vcpu soft affinity
On Thu, Mar 26, 2015 at 9:48 AM, Justin T. Weaver <jtweaver@xxxxxxxxxx> wrote:
> * choose_cpu
>
> choose_cpu now tries to find the run queue with the most cpus in the given
> vcpu's soft affinity. It uses minimum run queue load as a tie breaker.
[snip]
> * choose_cpu: added balance loop to find the cpu for the given vcpu that has
> the most soft cpus (with run queue load as a tie breaker), or, if none were
> found or soft affinity is not being considered, pick a cpu from the runq with
> the least load
[snip]
> @@ -1086,7 +1130,7 @@ static int
>  choose_cpu(const struct scheduler *ops, struct vcpu *vc)
>  {
>      struct csched2_private *prv = CSCHED2_PRIV(ops);
> -    int i, min_rqi = -1, new_cpu;
> +    int i, rqi = -1, new_cpu, max_soft_cpus = 0, balance_step;
>      struct csched2_vcpu *svc = CSCHED2_VCPU(vc);
>      s_time_t min_avgload;
>
Hey Justin -- sorry for taking so long to get back to this one.
Before getting into the changes to choose_cpu(): it looks like on the
__CSFLAG_runq_migrate_request path (starting with "First check to see
if we're here because someone else suggested a place for us to move"),
we only consider the hard affinity, not the soft affinity. Is that
intentional?
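If it's not intentional, I'd have expected that path to do a two-step
check, something like this (just an untested sketch of the idea, not
tested against your series; I'm assuming the sched_balance_cpumask() /
SCHED_BALANCE_* helpers from the earlier patches in this series and the
existing svc->migrate_rqd pointer):

    /* Prefer pcpus in the suggested runqueue that are in vc's soft
     * affinity... */
    sched_balance_cpumask(vc, SCHED_BALANCE_SOFT_AFFINITY, csched2_cpumask);
    cpumask_and(csched2_cpumask, csched2_cpumask, &svc->migrate_rqd->active);

    /* ...and fall back to the hard affinity check done today if the
     * soft intersection turns out to be empty. */
    if ( cpumask_empty(csched2_cpumask) )
    {
        sched_balance_cpumask(vc, SCHED_BALANCE_HARD_AFFINITY,
                              csched2_cpumask);
        cpumask_and(csched2_cpumask, csched2_cpumask,
                    &svc->migrate_rqd->active);
    }
    new_cpu = cpumask_any(csched2_cpumask);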
> @@ -1143,9 +1187,28 @@ choose_cpu(const struct scheduler *ops, struct vcpu *vc)
>
>      min_avgload = MAX_LOAD;
>
> -    /* Find the runqueue with the lowest instantaneous load */
> +    /*
> +     * Find the run queue with the most cpus in vc's soft affinity. If there
> +     * is more than one queue with the highest soft affinity cpu count, then
> +     * pick the one with the lowest instantaneous run queue load. If the
> +     * vcpu does not have soft affinity, then only try to find the run queue
> +     * with the lowest instantaneous load.
> +     */
> +    for_each_sched_balance_step( balance_step )
> +    {
> +        if ( balance_step == SCHED_BALANCE_SOFT_AFFINITY
> +             && !__vcpu_has_soft_affinity(vc, vc->cpu_hard_affinity) )
> +            continue;
> +
> +        if ( balance_step == SCHED_BALANCE_HARD_AFFINITY && rqi > -1 )
> +        {
> +            balance_step = SCHED_BALANCE_SOFT_AFFINITY;
> +            break;
> +        }
> +
>      for_each_cpu(i, &prv->active_queues)
>      {
> +        int rqd_soft_cpus = 0;
>          struct csched2_runqueue_data *rqd;
>          s_time_t rqd_avgload = MAX_LOAD;
>
> @@ -1163,35 +1226,61 @@ choose_cpu(const struct scheduler *ops, struct vcpu *vc)
>           * so it is possible here that svc does not have hard affinity
>           * with any of the pcpus of svc's currently assigned run queue.
>           */
> +        sched_balance_cpumask(vc, balance_step, csched2_cpumask);
>          if ( rqd == svc->rqd )
>          {
> -            if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
> +            if ( cpumask_intersects(csched2_cpumask, &rqd->active) )
>                  rqd_avgload = rqd->b_avgload - svc->avgload;
> +            if ( balance_step == SCHED_BALANCE_SOFT_AFFINITY )
> +            {
> +                cpumask_and(csched2_cpumask, csched2_cpumask,
> +                            &rqd->active);
> +                rqd_soft_cpus = cpumask_weight(csched2_cpumask);
> +            }
>          }
>          else if ( spin_trylock(&rqd->lock) )
>          {
> -            if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
> +            if ( cpumask_intersects(csched2_cpumask, &rqd->active) )
>                  rqd_avgload = rqd->b_avgload;
> +            if ( balance_step == SCHED_BALANCE_SOFT_AFFINITY )
> +            {
> +                cpumask_and(csched2_cpumask, csched2_cpumask,
> +                            &rqd->active);
> +                rqd_soft_cpus = cpumask_weight(csched2_cpumask);
> +            }
>
>              spin_unlock(&rqd->lock);
>          }
>          else
>              continue;
>
> -        if ( rqd_avgload < min_avgload )
> +        if ( balance_step == SCHED_BALANCE_SOFT_AFFINITY
> +             && rqd_soft_cpus > 0
> +             && ( rqd_soft_cpus > max_soft_cpus
> +                  ||
> +                  ( rqd_soft_cpus == max_soft_cpus
> +                    && rqd_avgload < min_avgload )) )
> +        {
> +            max_soft_cpus = rqd_soft_cpus;
> +            rqi = i;
> +            min_avgload = rqd_avgload;
> +        }
> +        else if ( balance_step == SCHED_BALANCE_HARD_AFFINITY
> +                  && rqd_avgload < min_avgload )
>          {
> +            rqi = i;
>              min_avgload = rqd_avgload;
> -            min_rqi=i;
>          }
> +    }
>      }
>
>      /* We didn't find anyone (most likely because of spinlock contention). */
> -    if ( min_rqi == -1 )
> +    if ( rqi == -1 )
>          new_cpu = get_fallback_cpu(svc);
>      else
>      {
> -        cpumask_and(csched2_cpumask, vc->cpu_hard_affinity,
> -                    &prv->rqd[min_rqi].active);
> +        sched_balance_cpumask(vc, balance_step, csched2_cpumask);
> +        cpumask_and(csched2_cpumask, csched2_cpumask, &prv->rqd[rqi].active);
>          new_cpu = cpumask_any(csched2_cpumask);
>          BUG_ON(new_cpu >= nr_cpu_ids);
>      }
So the general plan here looks right; but is there really a need to go
through the whole thing twice? Couldn't we keep track of "rqi with
highest # cpus in soft affinity / lowest avgload" and "rqi with lowest
global avgload" in one pass, and then choose whichever one looks the
best at the end?
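Something along these lines, say (untested sketch only, reusing the
identifiers from your patch; the "..." stands for the loop body you
already have, which computes rqd_avgload and rqd_soft_cpus for
runqueue i):

    int soft_rqi = -1, hard_rqi = -1, max_soft_cpus = 0;
    s_time_t soft_avgload = MAX_LOAD, hard_avgload = MAX_LOAD;

    for_each_cpu(i, &prv->active_queues)
    {
        ... /* compute rqd_avgload and rqd_soft_cpus as you do now */

        /* Candidate 1: least-loaded runqueue within hard affinity. */
        if ( rqd_avgload < hard_avgload )
        {
            hard_avgload = rqd_avgload;
            hard_rqi = i;
        }

        /* Candidate 2: runqueue with the most soft-affinity cpus,
         * with load as the tie breaker. */
        if ( rqd_soft_cpus > max_soft_cpus
             || ( rqd_soft_cpus > 0 && rqd_soft_cpus == max_soft_cpus
                  && rqd_avgload < soft_avgload ) )
        {
            max_soft_cpus = rqd_soft_cpus;
            soft_avgload = rqd_avgload;
            soft_rqi = i;
        }
    }

    /* Prefer the soft-affinity candidate if there is one. */
    rqi = ( soft_rqi != -1 ) ? soft_rqi : hard_rqi;

That would also mean taking each runqueue lock at most once per call,
rather than once per balance step.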
I think for closure's sake I'm going to send this e-mail, and review the
load balancing step in another mail (which will come later this
evening).
-George
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel