Re: [Xen-devel] [PATCH 03 of 10 v2] xen: sched_credit: let the scheduler know about node-affinity
- To: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
- From: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
- Date: Thu, 20 Dec 2012 20:21:45 +0000
- Cc: Marcus Granado <Marcus.Granado@xxxxxxxxxxxxx>, Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>, Ian Campbell <Ian.Campbell@xxxxxxxxxx>, Anil Madhavapeddy <anil@xxxxxxxxxx>, Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>, Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, Daniel De Graaf <dgdegra@xxxxxxxxxxxxx>, Matt Wilson <msw@xxxxxxxxxx>
- Delivery-date: Thu, 20 Dec 2012 20:28:41 +0000
- List-id: Xen developer discussion <xen-devel.lists.xen.org>
On 19/12/12 19:07, Dario Faggioli wrote:
+static inline int
+__csched_vcpu_should_migrate(int cpu, cpumask_t *mask, cpumask_t *idlers)
+{
+ /*
+ * Consent to migration if cpu is one of the idlers in the VCPU's
+ * affinity mask. In fact, if that is not the case, it just means it
+ * was some other CPU that was tickled and should hence come and pick
+ * VCPU up. Migrating it to cpu would only make things worse.
+ */
+ return cpumask_test_cpu(cpu, idlers) && cpumask_test_cpu(cpu, mask);
+}
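Just to check my understanding of how this predicate is meant to be used: a minimal hypothetical caller (the function below, its name, and its surrounding logic are invented for illustration; they are not from the patch):

    /* Hypothetical: decide whether to pull svc onto this cpu while
     * balancing. */
    static struct csched_vcpu *
    example_try_steal(int cpu, struct csched_vcpu *svc, cpumask_t *idlers)
    {
        cpumask_t mask;

        cpumask_copy(&mask, svc->vcpu->cpu_affinity);

        /* If cpu is not an idler within svc's affinity, some other
         * (tickled) CPU is expected to come and pick the vcpu up. */
        if ( !__csched_vcpu_should_migrate(cpu, &mask, idlers) )
            return NULL;

        return svc;
    }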
static int
@@ -493,85 +599,98 @@ static int
cpumask_t idlers;
cpumask_t *online;
struct csched_pcpu *spc = NULL;
+ int ret, balance_step;
int cpu;
- /*
- * Pick from online CPUs in VCPU's affinity mask, giving a
- * preference to its current processor if it's in there.
- */
online = cpupool_scheduler_cpumask(vc->domain->cpupool);
- cpumask_and(&cpus, online, vc->cpu_affinity);
- cpu = cpumask_test_cpu(vc->processor, &cpus)
- ? vc->processor
- : cpumask_cycle(vc->processor, &cpus);
- ASSERT( !cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus) );
+ for_each_csched_balance_step( balance_step )
+ {
+ /* Pick an online CPU from the proper affinity mask */
+ ret = csched_balance_cpumask(vc, balance_step, &cpus);
+ cpumask_and(&cpus, &cpus, online);
- /*
- * Try to find an idle processor within the above constraints.
- *
- * In multi-core and multi-threaded CPUs, not all idle execution
- * vehicles are equal!
- *
- * We give preference to the idle execution vehicle with the most
- * idling neighbours in its grouping. This distributes work across
- * distinct cores first and guarantees we don't do something stupid
- * like run two VCPUs on co-hyperthreads while there are idle cores
- * or sockets.
- *
- * Notice that, when computing the "idleness" of cpu, we may want to
- * discount vc. That is, iff vc is the currently running and the only
- * runnable vcpu on cpu, we add cpu to the idlers.
- */
- cpumask_and(&idlers, &cpu_online_map, CSCHED_PRIV(ops)->idlers);
- if ( vc->processor == cpu && IS_RUNQ_IDLE(cpu) )
- cpumask_set_cpu(cpu, &idlers);
- cpumask_and(&cpus, &cpus, &idlers);
- cpumask_clear_cpu(cpu, &cpus);
+ /* If present, prefer vc's current processor */
+ cpu = cpumask_test_cpu(vc->processor, &cpus)
+ ? vc->processor
+ : cpumask_cycle(vc->processor, &cpus);
+ ASSERT( !cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus) );
- while ( !cpumask_empty(&cpus) )
- {
- cpumask_t cpu_idlers;
- cpumask_t nxt_idlers;
- int nxt, weight_cpu, weight_nxt;
- int migrate_factor;
+ /*
+ * Try to find an idle processor within the above constraints.
+ *
+ * In multi-core and multi-threaded CPUs, not all idle execution
+ * vehicles are equal!
+ *
+ * We give preference to the idle execution vehicle with the most
+ * idling neighbours in its grouping. This distributes work across
+ * distinct cores first and guarantees we don't do something stupid
+ * like run two VCPUs on co-hyperthreads while there are idle cores
+ * or sockets.
+ *
+ * Notice that, when computing the "idleness" of cpu, we may want to
+ * discount vc. That is, iff vc is the currently running and the only
+ * runnable vcpu on cpu, we add cpu to the idlers.
+ */
+ cpumask_and(&idlers, &cpu_online_map, CSCHED_PRIV(ops)->idlers);
+ if ( vc->processor == cpu && IS_RUNQ_IDLE(cpu) )
+ cpumask_set_cpu(cpu, &idlers);
+ cpumask_and(&cpus, &cpus, &idlers);
+ /* If there are idlers and cpu is still not among them, pick one */
+ if ( !cpumask_empty(&cpus) && !cpumask_test_cpu(cpu, &cpus) )
+ cpu = cpumask_cycle(cpu, &cpus);
This seems to be an addition to the algorithm -- and one that is
particularly easy to miss, hidden in this kind of "indent a big section
that's almost exactly the same" diff. I think it at least needs to be
called out in the changelog message, and perhaps put in a separate patch.
Can you comment on why you think it's necessary? Was there a
particular problem you were seeing?
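To make the new step concrete, here is my reading of the two added lines in isolation (the scenario and values below are invented, and this is only a guess at the intent):

    /*
     * At this point &cpus is online & step-affinity & idlers.
     * Suppose vc->processor == 3 and is busy, while &cpus == { 5, 6 }.
     * Previously cpu would stay at 3; with this addition it is
     * re-seeded to an idle candidate (5), so that the per-step exit
     * test further down,
     *     if ( cpumask_test_cpu(cpu, &idlers) || ret )
     *         break;
     * can fire for the NODE step instead of falling through to the
     * plain-affinity step.
     */
    if ( !cpumask_empty(&cpus) && !cpumask_test_cpu(cpu, &cpus) )
        cpu = cpumask_cycle(cpu, &cpus);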
+ cpumask_clear_cpu(cpu, &cpus);
- nxt = cpumask_cycle(cpu, &cpus);
+ while ( !cpumask_empty(&cpus) )
+ {
+ cpumask_t cpu_idlers;
+ cpumask_t nxt_idlers;
+ int nxt, weight_cpu, weight_nxt;
+ int migrate_factor;
- if ( cpumask_test_cpu(cpu, per_cpu(cpu_core_mask, nxt)) )
- {
- /* We're on the same socket, so check the busy-ness of threads.
- * Migrate if # of idlers is less at all */
- ASSERT( cpumask_test_cpu(nxt, per_cpu(cpu_core_mask, cpu)) );
- migrate_factor = 1;
- cpumask_and(&cpu_idlers, &idlers, per_cpu(cpu_sibling_mask, cpu));
- cpumask_and(&nxt_idlers, &idlers, per_cpu(cpu_sibling_mask, nxt));
- }
- else
- {
- /* We're on different sockets, so check the busy-ness of cores.
- * Migrate only if the other core is twice as idle */
- ASSERT( !cpumask_test_cpu(nxt, per_cpu(cpu_core_mask, cpu)) );
- migrate_factor = 2;
- cpumask_and(&cpu_idlers, &idlers, per_cpu(cpu_core_mask, cpu));
- cpumask_and(&nxt_idlers, &idlers, per_cpu(cpu_core_mask, nxt));
+ nxt = cpumask_cycle(cpu, &cpus);
+
+ if ( cpumask_test_cpu(cpu, per_cpu(cpu_core_mask, nxt)) )
+ {
+ /* We're on the same socket, so check the busy-ness of threads.
+ * Migrate if # of idlers is less at all */
+ ASSERT( cpumask_test_cpu(nxt, per_cpu(cpu_core_mask, cpu)) );
+ migrate_factor = 1;
+ cpumask_and(&cpu_idlers, &idlers, per_cpu(cpu_sibling_mask,
+ cpu));
+ cpumask_and(&nxt_idlers, &idlers, per_cpu(cpu_sibling_mask,
+ nxt));
+ }
+ else
+ {
+ /* We're on different sockets, so check the busy-ness of cores.
+ * Migrate only if the other core is twice as idle */
+ ASSERT( !cpumask_test_cpu(nxt, per_cpu(cpu_core_mask, cpu)) );
+ migrate_factor = 2;
+ cpumask_and(&cpu_idlers, &idlers, per_cpu(cpu_core_mask, cpu));
+ cpumask_and(&nxt_idlers, &idlers, per_cpu(cpu_core_mask, nxt));
+ }
+
+ weight_cpu = cpumask_weight(&cpu_idlers);
+ weight_nxt = cpumask_weight(&nxt_idlers);
+ /* smt_power_savings: consolidate work rather than spreading it */
+ if ( sched_smt_power_savings ?
+ weight_cpu > weight_nxt :
+ weight_cpu * migrate_factor < weight_nxt )
+ {
+ cpumask_and(&nxt_idlers, &cpus, &nxt_idlers);
+ spc = CSCHED_PCPU(nxt);
+ cpu = cpumask_cycle(spc->idle_bias, &nxt_idlers);
+ cpumask_andnot(&cpus, &cpus, per_cpu(cpu_sibling_mask, cpu));
+ }
+ else
+ {
+ cpumask_andnot(&cpus, &cpus, &nxt_idlers);
+ }
}
- weight_cpu = cpumask_weight(&cpu_idlers);
- weight_nxt = cpumask_weight(&nxt_idlers);
- /* smt_power_savings: consolidate work rather than spreading it */
- if ( sched_smt_power_savings ?
- weight_cpu > weight_nxt :
- weight_cpu * migrate_factor < weight_nxt )
- {
- cpumask_and(&nxt_idlers, &cpus, &nxt_idlers);
- spc = CSCHED_PCPU(nxt);
- cpu = cpumask_cycle(spc->idle_bias, &nxt_idlers);
- cpumask_andnot(&cpus, &cpus, per_cpu(cpu_sibling_mask, cpu));
- }
- else
- {
- cpumask_andnot(&cpus, &cpus, &nxt_idlers);
- }
+ /* Stop if cpu is idle (or if csched_balance_cpumask() says we can) */
+ if ( cpumask_test_cpu(cpu, &idlers) || ret )
+ break;
Right -- OK, I think everything looks good here, except the "return -1
from csched_balance_cpumask" thing. I think it would be better if we
explicitly checked cpumask_full(...->node_affinity_cpumask) and skipped
the NODE step if that's the case.
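i.e., something along these lines (a sketch only -- I'm assuming the step constant and csched_dom field names from earlier in the series, namely CSCHED_BALANCE_NODE_AFFINITY and node_affinity_cpumask, and this is untested):

    for_each_csched_balance_step( balance_step )
    {
        struct csched_dom *sdom = CSCHED_DOM(vc->domain);

        /* A full node-affinity mask is no restriction at all, so the
         * NODE step cannot do better than the CPU step: skip it. */
        if ( balance_step == CSCHED_BALANCE_NODE_AFFINITY
             && cpumask_full(sdom->node_affinity_cpumask) )
            continue;

        csched_balance_cpumask(vc, balance_step, &cpus);
        /* ... rest of the placement loop as in the patch ... */
    }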
Also -- and sorry to have to ask this kind of thing, but after sorting
through the placement algorithm my head hurts -- under what
circumstances would "cpumask_test_cpu(cpu, &idlers)" be false at this
point? It seems like the only possibility would be if:
  ( (vc->processor was not in the original &cpus [1])
    || !IS_RUNQ_IDLE(vc->processor) )
  && (there are no idlers in the original &cpus)
Which, I suppose, probably matches the point at which we want to stop
looking at NODE affinity and move on to CPU affinity.
[1] This could happen either if the vcpu/node affinity has changed, or
if we're currently running outside our node affinity and we're doing the
NODE step.
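Transcribing that reasoning into a pseudo-predicate (not code from the patch; "orig_cpus" here stands for the step's affinity mask before it was intersected with &idlers):

    int would_not_stop =
        ( !cpumask_test_cpu(vc->processor, &orig_cpus)   /* see [1] */
          || !IS_RUNQ_IDLE(vc->processor) )
        && !cpumask_intersects(&orig_cpus, &idlers);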
OK -- I think I've convinced myself that this is OK as well (apart from
the hidden check). I'll come back to look at your response to the load
balancing thing tomorrow.
-George
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel