Re: [Xen-devel] [PATCH v2 5/7] xen: credit1: increase efficiency and scalability of load balancing.
On 06/04/17 09:16, Dario Faggioli wrote:
> During load balancing, we check the non idle pCPUs to
> see if they have runnable but not running vCPUs that
> can be stolen by and set to run on currently idle pCPUs.
>
> If a pCPU has only one running (or runnable) vCPU,
> though, we don't want to steal it from there, and
> it's therefore pointless bothering with it
> (especially considering that bothering means trying
> to take its runqueue lock!).
>
> On large systems, when load is only slightly higher
> than the number of pCPUs (i.e., there are just a few
> more active vCPUs than the number of the pCPUs), this
> may mean that:
> - we go through all the pCPUs,
> - for each one, we (try to) take its runqueue locks,
> - we figure out there's actually nothing to be stolen!
>
> To mitigate this, we introduce a counter for the number
> of runnable vCPUs on each pCPU. In fact, unless there
> are at least 2 runnable vCPUs --typically, one running,
> and the others in the runqueue-- it does not make sense
> to try stealing anything.
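[Not part of the patch: for reference, the optimization described above boils
down to a per-pCPU counter that the balancer consults (racily, on purpose)
before paying for the trylock. Below is a minimal, standalone sketch of that
idea only -- a toy model with made-up names and data, not the actual Xen code.]

/*
 * Each pCPU keeps a count of its runnable vCPUs (running + queued).  The
 * balancer only bothers trying a peer's runqueue lock when that count says
 * there is actually something to steal, i.e. at least 2 runnable vCPUs.
 */
#include <stdbool.h>
#include <stdio.h>

#define NR_PCPUS 4

struct pcpu {
    unsigned int nr_runnable;       /* running + queued vCPUs on this pCPU */
};

static struct pcpu pcpus[NR_PCPUS] = {
    { .nr_runnable = 1 }, { .nr_runnable = 3 },
    { .nr_runnable = 0 }, { .nr_runnable = 1 },
};

/* Racy read by design: worst case we probe, or skip, a peer needlessly. */
static bool worth_stealing_from(unsigned int peer)
{
    return pcpus[peer].nr_runnable > 1;
}

int main(void)
{
    for ( unsigned int peer = 0; peer < NR_PCPUS; peer++ )
        printf("pcpu%u: %s\n", peer,
               worth_stealing_from(peer) ? "try to steal (take the lock)"
                                         : "skip (no lock taken)");
    return 0;
}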
>
> Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
> ---
> Cc: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Cc: Anshul Makkar <anshul.makkar@xxxxxxxxxx>
> ---
> Changes from v1:
> * don't count the idle vCPU as runnable. This is just cosmetic and not at all
>   a logic or functional change wrt v1;
> * don't count inside of __runq_remove() or __runq_insert(), but provide
>   specific counting functions, and call them when appropriate. This is
>   necessary to avoid spurious overloaded states being reported basically *all*
>   the time that a pCPU goes through the scheduler (due to the fact that the
>   scheduler calls __runq_insert() on the current vCPU);
> * get rid of the overloaded cpumask and only use the counter. I actually did
>   like the cpumask solution better, but for this purpose, it was overkill.
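[Not part of the patch: to illustrate the second bullet above (why the counting
is done by dedicated helpers at selected call sites, rather than inside
__runq_insert()/__runq_remove()), here is a minimal standalone toy model; it is
not Xen code and the names are simplified.]

/*
 * csched_schedule() briefly puts the currently running vCPU back on the
 * runqueue while picking the next one to run.  If the counting were buried
 * inside the queue helpers (as in v1), nr_runnable would flicker to 2 on
 * every pass through the scheduler, even with a single vCPU, and a racy
 * reader (the load balancer) could see a spuriously "overloaded" pCPU.
 */
#include <stdio.h>

static unsigned int nr_runnable = 1;    /* one vCPU, currently running */

static void runq_insert(void) { nr_runnable++; }   /* counting in the helper */
static void runq_remove(void) { nr_runnable--; }

static void schedule_pass(void)
{
    runq_insert();               /* current vCPU goes back on the runqueue */
    printf("during schedule: nr_runnable = %u (looks stealable)\n",
           nr_runnable);
    runq_remove();               /* the chosen vCPU is dequeued again */
}

int main(void)
{
    schedule_pass();
    printf("after schedule:  nr_runnable = %u\n", nr_runnable);
    return 0;
}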
> ---
> xen/common/sched_credit.c | 78 ++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 72 insertions(+), 6 deletions(-)
>
> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> index 49caa0a..09e3192 100644
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
> @@ -172,6 +172,7 @@ struct csched_pcpu {
> struct timer ticker;
> unsigned int tick;
> unsigned int idle_bias;
> + unsigned int nr_runnable;
> };
>
> /*
> @@ -262,9 +263,26 @@ static inline bool_t is_runq_idle(unsigned int cpu)
> }
>
> static inline void
> +inc_nr_runnable(unsigned int cpu)
> +{
> + ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
> + CSCHED_PCPU(cpu)->nr_runnable++;
> +
> +}
> +
> +static inline void
> +dec_nr_runnable(unsigned int cpu)
> +{
> + ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
> + CSCHED_PCPU(cpu)->nr_runnable--;
> + ASSERT(CSCHED_PCPU(cpu)->nr_runnable >= 0);
> +}
> +
> +static inline void
> __runq_insert(struct csched_vcpu *svc)
> {
> - const struct list_head * const runq = RUNQ(svc->vcpu->processor);
> + unsigned int cpu = svc->vcpu->processor;
> + const struct list_head * const runq = RUNQ(cpu);
> struct list_head *iter;
>
> BUG_ON( __vcpu_on_runq(svc) );
> @@ -601,6 +619,7 @@ init_pdata(struct csched_private *prv, struct csched_pcpu *spc, int cpu)
> /* Start off idling... */
> BUG_ON(!is_idle_vcpu(curr_on_cpu(cpu)));
> cpumask_set_cpu(cpu, prv->idlers);
> + spc->nr_runnable = 0;
> }
>
> static void
> @@ -1042,7 +1061,10 @@ csched_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
> lock = vcpu_schedule_lock_irq(vc);
>
> if ( !__vcpu_on_runq(svc) && vcpu_runnable(vc) && !vc->is_running )
> + {
> __runq_insert(svc);
> + inc_nr_runnable(vc->processor);
> + }
>
> vcpu_schedule_unlock_irq(lock, vc);
>
> @@ -1102,7 +1124,10 @@ csched_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
> cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
> }
> else if ( __vcpu_on_runq(svc) )
> + {
> + dec_nr_runnable(cpu);
> __runq_remove(svc);
> + }
> }
>
> static void
> @@ -1163,6 +1188,7 @@ csched_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
>
> /* Put the VCPU on the runq and tickle CPUs */
> __runq_insert(svc);
> + inc_nr_runnable(vc->processor);
> __runq_tickle(svc);
> }
>
> @@ -1664,8 +1690,10 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
> SCHED_VCPU_STAT_CRANK(speer, migrate_q);
> SCHED_STAT_CRANK(migrate_queued);
> WARN_ON(vc->is_urgent);
> + dec_nr_runnable(peer_cpu);
> __runq_remove(speer);
> vc->processor = cpu;
> + inc_nr_runnable(cpu);
> return speer;
> }
> }
> @@ -1721,7 +1749,7 @@ csched_load_balance(struct csched_private *prv, int cpu,
> peer_node = node;
> do
> {
> - /* Find out what the !idle are in this node */
> + /* Select the pCPUs in this node that have work we can steal. */
> cpumask_andnot(&workers, online, prv->idlers);
> cpumask_and(&workers, &workers, &node_to_cpumask(peer_node));
> __cpumask_clear_cpu(cpu, &workers);
> @@ -1731,6 +1759,40 @@ csched_load_balance(struct csched_private *prv, int cpu,
> goto next_node;
> do
> {
> + spinlock_t *lock;
> +
> + /*
> + * If there is only one runnable vCPU on peer_cpu, it means
> + * there's no one to be stolen in its runqueue, so skip it.
> + *
> + * Checking this without holding the lock is racy... But that's
> + * the whole point of this optimization!
> + *
> + * In more detail:
> + * - if we race with dec_nr_runnable(), we may try to take the
> + * lock and call csched_runq_steal() for no reason. This is
> + * not a functional issue, and should be infrequent enough.
> + * And we can avoid that by re-checking nr_runnable after
> + * having grabbed the lock, if we want;
> + * - if we race with inc_nr_runnable(), we skip a pCPU that may
> + * have runnable vCPUs in its runqueue, but that's not a
> + * problem because:
> + * + if racing with csched_vcpu_insert() or csched_vcpu_wake(),
> + * __runq_tickle() will be called afterwards, so the vCPU
> + * won't get stuck in the runqueue for too long;
> + * + if racing with csched_runq_steal(), it may be that a
> + * vCPU that we could have picked up stays in a runqueue
> + * until someone else tries to steal it again. But this is
> + * no worse than what can happen already (without this
> + * optimization), if the pCPU would schedule right after we
> + * have taken the lock, and hence block on it.
> + */
> + if ( CSCHED_PCPU(peer_cpu)->nr_runnable <= 1 )
> + {
> + TRACE_2D(TRC_CSCHED_STEAL_CHECK, peer_cpu, /* skipp'n */ 0);
> + goto next_cpu;
> + }
> +
> /*
> * Get ahold of the scheduler lock for this peer CPU.
> *
> @@ -1738,14 +1800,13 @@ csched_load_balance(struct csched_private *prv, int cpu,
> * could cause a deadlock if the peer CPU is also load
> * balancing and trying to lock this CPU.
> */
> - spinlock_t *lock = pcpu_schedule_trylock(peer_cpu);
> + lock = pcpu_schedule_trylock(peer_cpu);
> SCHED_STAT_CRANK(steal_trylock);
> if ( !lock )
> {
> SCHED_STAT_CRANK(steal_trylock_failed);
> TRACE_2D(TRC_CSCHED_STEAL_CHECK, peer_cpu, /* skipp'n */ 0);
> - peer_cpu = cpumask_cycle(peer_cpu, &workers);
> - continue;
> + goto next_cpu;
Wait -- does this mean that before this patch, this effectively
busy-waited until peer_cpu was actually free (since peer_cpu was never
incremented)?
I like the idea in general, but I'm not a fan of the current way of
doing the accounting -- it seems like too many special cases. Let me
have a think about this and come back to it.
-George
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel