
Re: [Xen-devel] [PATCH v3 1/4] sched: credit2: respect per-vcpu hard affinity



On 03/26/2015 09:48 AM, Justin T. Weaver wrote:
> by making sure that vcpus only run on the pcpu(s) they are allowed to
> run on based on their hard affinity cpu masks.
> 
> Signed-off-by: Justin T. Weaver <jtweaver@xxxxxxxxxx>

Hey Justin!  Getting close.  A couple of comments:

> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 7581731..af716e4 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -176,6 +176,7 @@ integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
>  #define c2r(_ops, _cpu)     (CSCHED2_PRIV(_ops)->runq_map[(_cpu)])
>  /* CPU to runqueue struct macro */
>  #define RQD(_ops, _cpu)     (&CSCHED2_PRIV(_ops)->rqd[c2r(_ops, _cpu)])
> +#define VCPU2ONLINE(_v)     cpupool_online_cpumask((_v)->domain->cpupool)
>  
>  /*
>   * Shifts for load average.
> @@ -194,6 +195,12 @@ int opt_overload_balance_tolerance=-3;
>  integer_param("credit2_balance_over", opt_overload_balance_tolerance);
>  
>  /*
> + * Use this to avoid having too many cpumask_t structs on the stack
> + */
> +static cpumask_t **scratch_mask = NULL;
> +#define csched2_cpumask scratch_mask[smp_processor_id()]

Since I'm asking for changes below:  It's not clear to me when I'm
scanning the code that csched2_cpumask is a scratch variable.  What
about calling the actual variable _scratch_cpumask and having the
#define be scratch_cpumask?
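
In other words, something like this (just a sketch of the renaming;
behaviour unchanged):

    /* Scratch space, to avoid putting cpumask_t structs on the stack. */
    static cpumask_t **_scratch_cpumask = NULL;
    #define scratch_cpumask _scratch_cpumask[smp_processor_id()]

That way anything reading scratch_cpumask in the code is obviously
looking at per-cpu scratch space.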

>      /* Find the runqueue with the lowest instantaneous load */
>      for_each_cpu(i, &prv->active_queues)
>      {
>          struct csched2_runqueue_data *rqd;
> -        s_time_t rqd_avgload;
> +        s_time_t rqd_avgload = MAX_LOAD;
>  
>          rqd = prv->rqd + i;
>  
>          /* If checking a different runqueue, grab the lock,
> -         * read the avg, and then release the lock.
> +         * check hard affinity, read the avg, and then release the lock.
>           *
>           * If on our own runqueue, don't grab or release the lock;
>           * but subtract our own load from the runqueue load to simulate
> -         * impartiality */
> +         * impartiality.
> +         *
> +         * svc's hard affinity may have changed; this function is the
> +         * credit 2 scheduler's first opportunity to react to the change,
> +         * so it is possible here that svc does not have hard affinity
> +         * with any of the pcpus of svc's currently assigned run queue.
> +         */
>          if ( rqd == svc->rqd )
>          {
> -            rqd_avgload = rqd->b_avgload - svc->avgload;
> +            if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
> +                rqd_avgload = rqd->b_avgload - svc->avgload;
>          }
>          else if ( spin_trylock(&rqd->lock) )
>          {
> -            rqd_avgload = rqd->b_avgload;
> +            if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
> +                rqd_avgload = rqd->b_avgload;
> +
>              spin_unlock(&rqd->lock);
>          }
>          else

Since we're already potentially falling through and doing the comparison
below with an unchanged rqd_avgload (which has now been set to MAX_LOAD
above), I wonder if it makes more sense to remove this "else continue"
here, just to avoid confusing people.
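
I.e., something like this (untested sketch; min_avgload / min_rqi being
the accumulator variables from the surrounding loop):

    if ( rqd == svc->rqd )
    {
        if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
            rqd_avgload = rqd->b_avgload - svc->avgload;
    }
    else if ( spin_trylock(&rqd->lock) )
    {
        if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
            rqd_avgload = rqd->b_avgload;

        spin_unlock(&rqd->lock);
    }

    /* MAX_LOAD can never win the comparison below, so no "else
     * continue" arm is needed. */
    if ( rqd_avgload < min_avgload )
    {
        min_avgload = rqd_avgload;
        min_rqi = i;
    }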

> @@ -2024,6 +2096,13 @@ csched2_alloc_pdata(const struct scheduler *ops, int cpu)
>          printk("%s: cpu %d not online yet, deferring initializatgion\n",
>                 __func__, cpu);
>  
> +    /*
> +     * For each new pcpu, allocate a cpumask_t for use throughout the
> +     * scheduler to avoid putting any cpumask_t structs on the stack.
> +     */
> +    if ( !zalloc_cpumask_var(&scratch_mask[cpu]) )

Any reason not to use "scratch_mask + cpu" here rather than
"&scratch_mask[cpu]"?

It might not be a bad idea to add ASSERT(scratch_mask[cpu] == NULL)
before this, just to be paranoid...
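
I.e. (sketch):

    /* Invariant: a NULL entry is the only invalid entry. */
    ASSERT(scratch_mask[cpu] == NULL);
    if ( !zalloc_cpumask_var(scratch_mask + cpu) )
        return NULL;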

> +        return NULL;
> +
>      return (void *)1;
>  }
>  
> @@ -2072,6 +2151,8 @@ csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
>  
>      spin_unlock_irqrestore(&prv->lock, flags);
>  
> +    free_cpumask_var(scratch_mask[cpu]);

We should definitely set this to NULL after freeing it (see below)
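
I.e.:

    free_cpumask_var(scratch_mask[cpu]);
    scratch_mask[cpu] = NULL;

That keeps the invariant that a non-NULL entry is always valid.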

> +
>      return;
>  }
>  
> @@ -2159,6 +2240,10 @@ csched2_init(struct scheduler *ops)
>  
>      prv->load_window_shift = opt_load_window_shift;
>  
> +    scratch_mask = xmalloc_array(cpumask_t *, nr_cpu_ids);

I realize Dario recommended using xmalloc_array() instead of
xzalloc_array(), but I don't understand why he thinks that's OK.  His
mail (*) says "(see below about why I actually don't
think we need)", but I don't actually see that addressed in that e-mail.

I think it's just dangerous to leave uninitialized pointers around.  The
invariant should be that if the array entry is invalid it's NULL, and if
it's non-null then it's valid.

(* marc.info/?i=<1426266674.32500.25.camel@xxxxxxxxxx>)
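
With xzalloc_array() the invariant holds from the moment of allocation
(sketch):

    /* Zeroed allocation: every entry starts out NULL, i.e. invalid. */
    scratch_mask = xzalloc_array(cpumask_t *, nr_cpu_ids);
    if ( scratch_mask == NULL )
        return -ENOMEM;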

Also -- I think this allocation wants to happen in global_init(), right?
Otherwise if you make a second cpupool with the credit2 scheduler this
will be clobbered.  (I think nr_cpu_ids should be defined at that point.)
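
Rough sketch of what I mean (untested, and assuming the usual
int (*global_init)(void) hook shape):

    static int
    csched2_global_init(void)
    {
        /* Allocate once, not per scheduler instance / cpupool. */
        scratch_mask = xzalloc_array(cpumask_t *, nr_cpu_ids);
        return scratch_mask != NULL ? 0 : -ENOMEM;
    }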

 -George

