Re: [Xen-devel] [PATCH v2 04/11] xen: sched: close potential races when switching scheduler to CPUs
On 06/04/16 18:23, Dario Faggioli wrote:
> In short, the point is making sure that the actual switch
> of scheduler and the remapping of the scheduler's runqueue
> lock occur in the same critical section, protected by the
> "old" scheduler's lock (and not, e.g., in the free_pdata
> hook, as it is now for Credit2 and RTDS).
>
> Not doing so is (at least) racy. In fact, for instance,
> if we switch cpu X from Credit2 to Credit, we do:
>
> schedule_cpu_switch(x, csched2 --> csched):
> //scheduler[x] is csched2
> //schedule_lock[x] is csched2_lock
> csched_alloc_pdata(x)
> csched_init_pdata(x)
> pcpu_schedule_lock(x) ----> takes csched2_lock
> scheduler[X] = csched
> pcpu_schedule_unlock(x) --> unlocks csched2_lock
> [1]
> csched2_free_pdata(x)
> pcpu_schedule_lock(x) --> takes csched2_lock
> schedule_lock[x] = csched_lock
> spin_unlock(csched2_lock)
>
> While, if we switch cpu X from Credit to Credit2, we do:
>
> schedule_cpu_switch(X, csched --> csched2):
> //scheduler[x] is csched
> //schedule_lock[x] is csched_lock
> csched2_alloc_pdata(x)
> csched2_init_pdata(x)
> pcpu_schedule_lock(x) --> takes csched_lock
> schedule_lock[x] = csched2_lock
> spin_unlock(csched_lock)
> [2]
> pcpu_schedule_lock(x) ----> takes csched2_lock
> scheduler[X] = csched2
> pcpu_schedule_unlock(x) --> unlocks csched2_lock
> csched_free_pdata(x)
>
> And if we switch cpu X from RTDS to Credit2, we do:
>
> schedule_cpu_switch(X, RTDS --> csched2):
> //scheduler[x] is rtds
> //schedule_lock[x] is rtds_lock
> csched2_alloc_pdata(x)
> csched2_init_pdata(x)
> pcpu_schedule_lock(x) --> takes rtds_lock
> schedule_lock[x] = csched2_lock
> spin_unlock(rtds_lock)
> pcpu_schedule_lock(x) ----> takes csched2_lock
> scheduler[x] = csched2
> pcpu_schedule_unlock(x) --> unlocks csched2_lock
> rtds_free_pdata(x)
> spin_lock(rtds_lock)
> ASSERT(schedule_lock[x] == rtds_lock) [3]
> schedule_lock[x] = DEFAULT_SCHEDULE_LOCK [4]
> spin_unlock(rtds_lock)
>
> So, the first problem is that, if anything related to
> scheduling, and involving the CPU, happens at [1] or [2], we:
> - take csched2_lock,
> - operate on Credit1 functions and data structures,
> which is no good!
>
> The second problem is that the ASSERT at [3] triggers, and
> the third is that, at [4], we screw up the lock remapping we've
> done for ourselves in csched2_init_pdata()!
>
> The first problem arises because there is a window during
> which the lock is already the new one, but the scheduler is
> still the old one. The other two arise because we let schedulers
> mess with the lock (re)mapping done by others.
>
> This patch, therefore, introduces a new hook in the scheduler
> interface, called switch_sched, meant to be used when
> switching the scheduler on a CPU, and implements it for the
> various schedulers that need it (i.e., all except ARINC653),
> so that things are done in the proper order and under the
> protection of the best suited (set of) lock(s). It is
> necessary to add the hook (as opposed to keeping everything
> in generic code), because different schedulers may have
> different locking schemes.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
Hey Dario! Everything here looks good, except for one thing: the
scheduler lock for the arinc653 scheduler. :-) What happens now if you
assign a cpu to credit2, and then assign it to arinc653? Since arinc
doesn't implement the switch_sched() functionality, the per-cpu
scheduler lock will still point to the credit2 lock, won't it?
Which will *work*, although it will add unnecessary contention to the
credit2 lock; until the lock goes away, at which point
vcpu_schedule_lock*() will essentially be using a wild pointer.
-George
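
(Illustrative sketch, not part of this series: a switch_sched hook for
arinc653 would presumably just route the per-cpu lock back to the default
one, much like csched_switch_sched() in this patch does. The
a653sched_switch_sched name and the pdata/vdata handling below are
assumptions, not code from the series.)

    /*
     * Hypothetical arinc653 hook: arinc653 never remaps the per-cpu lock,
     * so switching a cpu to it only needs to install the idle vcpu's
     * private data, point the cpu at the new ops, and route the lock back
     * to the default per-cpu spinlock as the very last step.
     */
    static void
    a653sched_switch_sched(struct scheduler *new_ops, unsigned int cpu,
                           void *pdata, void *vdata)
    {
        struct schedule_data *sd = &per_cpu(schedule_data, cpu);

        idle_vcpu[cpu]->sched_priv = vdata;

        per_cpu(scheduler, cpu) = new_ops;
        sd->sched_priv = pdata;

        /*
         * Re-route the lock as the /last/ thing: if it is free, anyone
         * who manages to take it must find all the initializations above
         * already in place (hence the barrier).
         */
        smp_mb();
        sd->schedule_lock = &sd->_lock;
    }
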
> ---
> Cc: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
> Cc: Meng Xu <mengxu@xxxxxxxxxxxxx>
> Cc: Tianyang Chen <tiche@xxxxxxxxxxxxxx>
> ---
> Changes from v1:
>
> new patch, basically, coming from squashing what were
> 4 patches in v1. In any case, with respect to those 4
> patches:
> - runqueue lock is back being taken in schedule_cpu_switch(),
> as suggested during review;
> - add barriers for making sure all initialization is done
> when the new lock is assigned, as suggested during review;
> - add comments and ASSERT-s about how and why the adopted
> locking scheme is safe, as suggested during review.
> ---
> xen/common/sched_credit.c | 44 ++++++++++++++++++++++++
> xen/common/sched_credit2.c | 81 +++++++++++++++++++++++++++++++++-----------
> xen/common/sched_rt.c | 45 +++++++++++++++++-------
> xen/common/schedule.c | 41 +++++++++++++++++-----
> xen/include/xen/sched-if.h | 3 ++
> 5 files changed, 172 insertions(+), 42 deletions(-)
>
> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> index 96a245d..540d515 100644
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
> @@ -578,12 +578,55 @@ csched_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
> {
> unsigned long flags;
> struct csched_private *prv = CSCHED_PRIV(ops);
> + struct schedule_data *sd = &per_cpu(schedule_data, cpu);
> +
> + /*
> + * This is called either during boot, resume or hotplug, in
> + * case Credit1 is the scheduler chosen at boot. In such cases, the
> + * scheduler lock for cpu is already pointing to the default per-cpu
> + * spinlock, as Credit1 needs it, so there is no remapping to be done.
> + */
> + ASSERT(sd->schedule_lock == &sd->_lock && !spin_is_locked(&sd->_lock));
>
> spin_lock_irqsave(&prv->lock, flags);
> init_pdata(prv, pdata, cpu);
> spin_unlock_irqrestore(&prv->lock, flags);
> }
>
> +/* Change the scheduler of cpu to us (Credit). */
> +static void
> +csched_switch_sched(struct scheduler *ops, unsigned int cpu,
> + void *pdata, void *vdata)
> +{
> + struct schedule_data *sd = &per_cpu(schedule_data, cpu);
> + struct csched_private *prv = CSCHED_PRIV(ops);
> + struct csched_vcpu *svc = vdata;
> +
> + ASSERT(svc && is_idle_vcpu(svc->vcpu));
> +
> + idle_vcpu[cpu]->sched_priv = vdata;
> +
> + /*
> + * We are holding the runqueue lock already (it's been taken in
> + * schedule_cpu_switch()). It actually may or may not be the 'right'
> + * one for this cpu, but that is ok for preventing races.
> + */
> + spin_lock(&prv->lock);
> + init_pdata(prv, pdata, cpu);
> + spin_unlock(&prv->lock);
> +
> + per_cpu(scheduler, cpu) = ops;
> + per_cpu(schedule_data, cpu).sched_priv = pdata;
> +
> + /*
> + * (Re?)route the lock to the per-pCPU lock as the /last/ thing. In fact,
> + * if it is free (and it can be) we want anyone that manages to take it
> + * to find all the initializations we've done above in place.
> + */
> + smp_mb();
> + sd->schedule_lock = &sd->_lock;
> +}
> +
> #ifndef NDEBUG
> static inline void
> __csched_vcpu_check(struct vcpu *vc)
> @@ -2067,6 +2110,7 @@ static const struct scheduler sched_credit_def = {
> .alloc_pdata = csched_alloc_pdata,
> .init_pdata = csched_init_pdata,
> .free_pdata = csched_free_pdata,
> + .switch_sched = csched_switch_sched,
> .alloc_domdata = csched_alloc_domdata,
> .free_domdata = csched_free_domdata,
>
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 8989eea..60c6f5b 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -1971,12 +1971,12 @@ static void deactivate_runqueue(struct csched2_private *prv, int rqi)
> cpumask_clear_cpu(rqi, &prv->active_queues);
> }
>
> -static void
> +/* Returns the ID of the runqueue the cpu is assigned to. */
> +static unsigned
> init_pdata(struct csched2_private *prv, unsigned int cpu)
> {
> unsigned rqi;
> struct csched2_runqueue_data *rqd;
> - spinlock_t *old_lock;
>
> ASSERT(spin_is_locked(&prv->lock));
> ASSERT(!cpumask_test_cpu(cpu, &prv->initialized));
> @@ -2007,44 +2007,89 @@ init_pdata(struct csched2_private *prv, unsigned int cpu)
> activate_runqueue(prv, rqi);
> }
>
> - /* IRQs already disabled */
> - old_lock = pcpu_schedule_lock(cpu);
> -
> - /* Move spinlock to new runq lock. */
> - per_cpu(schedule_data, cpu).schedule_lock = &rqd->lock;
> -
> /* Set the runqueue map */
> prv->runq_map[cpu] = rqi;
>
> cpumask_set_cpu(cpu, &rqd->idle);
> cpumask_set_cpu(cpu, &rqd->active);
> -
> - /* _Not_ pcpu_schedule_unlock(): per_cpu().schedule_lock changed! */
> - spin_unlock(old_lock);
> -
> cpumask_set_cpu(cpu, &prv->initialized);
>
> - return;
> + return rqi;
> }
>
> static void
> csched2_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
> {
> struct csched2_private *prv = CSCHED2_PRIV(ops);
> + spinlock_t *old_lock;
> unsigned long flags;
> + unsigned rqi;
>
> spin_lock_irqsave(&prv->lock, flags);
> - init_pdata(prv, cpu);
> + old_lock = pcpu_schedule_lock(cpu);
> +
> + rqi = init_pdata(prv, cpu);
> + /* Move the scheduler lock to the new runq lock. */
> + per_cpu(schedule_data, cpu).schedule_lock = &prv->rqd[rqi].lock;
> +
> + /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */
> + spin_unlock(old_lock);
> spin_unlock_irqrestore(&prv->lock, flags);
> }
>
> +/* Change the scheduler of cpu to us (Credit2). */
> +static void
> +csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
> + void *pdata, void *vdata)
> +{
> + struct csched2_private *prv = CSCHED2_PRIV(new_ops);
> + struct csched2_vcpu *svc = vdata;
> + unsigned rqi;
> +
> + ASSERT(!pdata && svc && is_idle_vcpu(svc->vcpu));
> +
> + /*
> + * We own one runqueue lock already (from schedule_cpu_switch()). This
> + * looks like it violates this scheduler's locking rules, but it does
> + * not, as what we own is the lock of another scheduler, that hence has
> + * no particular (ordering) relationship with our private global lock.
> + * And owning exactly that one (the lock of the old scheduler of this
> + * cpu) is what is necessary to prevent races.
> + */
> + spin_lock_irq(&prv->lock);
> +
> + idle_vcpu[cpu]->sched_priv = vdata;
> +
> + rqi = init_pdata(prv, cpu);
> +
> + /*
> + * Now that we know what runqueue we'll go in, double check what's said
> + * above: the lock we already hold is not the one of this runqueue of
> + * this scheduler, and so it's safe to have taken it /before/ our
> + * private global lock.
> + */
> + ASSERT(per_cpu(schedule_data, cpu).schedule_lock != &prv->rqd[rqi].lock);
> +
> + per_cpu(scheduler, cpu) = new_ops;
> + per_cpu(schedule_data, cpu).sched_priv = NULL; /* no pdata */
> +
> + /*
> + * (Re?)route the lock to the runqueue lock as the /last/ thing. In fact,
> + * if it is free (and it can be) we want anyone that manages to take it
> + * to find all the initializations we've done above in place.
> + */
> + smp_mb();
> + per_cpu(schedule_data, cpu).schedule_lock = &prv->rqd[rqi].lock;
> +
> + spin_unlock_irq(&prv->lock);
> +}
> +
> static void
> csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
> {
> unsigned long flags;
> struct csched2_private *prv = CSCHED2_PRIV(ops);
> struct csched2_runqueue_data *rqd;
> - struct schedule_data *sd = &per_cpu(schedule_data, cpu);
> int rqi;
>
> spin_lock_irqsave(&prv->lock, flags);
> @@ -2072,11 +2117,6 @@ csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
> deactivate_runqueue(prv, rqi);
> }
>
> - /* Move spinlock to the original lock. */
> - ASSERT(sd->schedule_lock == &rqd->lock);
> - ASSERT(!spin_is_locked(&sd->_lock));
> - sd->schedule_lock = &sd->_lock;
> -
> spin_unlock(&rqd->lock);
>
> cpumask_clear_cpu(cpu, &prv->initialized);
> @@ -2170,6 +2210,7 @@ static const struct scheduler sched_credit2_def = {
> .free_vdata = csched2_free_vdata,
> .init_pdata = csched2_init_pdata,
> .free_pdata = csched2_free_pdata,
> + .switch_sched = csched2_switch_sched,
> .alloc_domdata = csched2_alloc_domdata,
> .free_domdata = csched2_free_domdata,
> };
> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index b96bd93..3bb8c71 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -682,6 +682,37 @@ rt_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
> spin_unlock_irqrestore(old_lock, flags);
> }
>
> +/* Change the scheduler of cpu to us (RTDS). */
> +static void
> +rt_switch_sched(struct scheduler *new_ops, unsigned int cpu,
> + void *pdata, void *vdata)
> +{
> + struct rt_private *prv = rt_priv(new_ops);
> + struct rt_vcpu *svc = vdata;
> +
> + ASSERT(!pdata && svc && is_idle_vcpu(svc->vcpu));
> +
> + /*
> + * We are holding the runqueue lock already (it's been taken in
> + * schedule_cpu_switch()). It's actually the runqueue lock of
> + * another scheduler, but that is how things need to be, for
> + * preventing races.
> + */
> + ASSERT(per_cpu(schedule_data, cpu).schedule_lock != &prv->lock);
> +
> + idle_vcpu[cpu]->sched_priv = vdata;
> + per_cpu(scheduler, cpu) = new_ops;
> + per_cpu(schedule_data, cpu).sched_priv = NULL; /* no pdata */
> +
> + /*
> + * (Re?)route the lock to the RTDS global lock as the /last/ thing. In
> + * fact, if it is free (and it can be) we want anyone that manages to
> + * take it to find all the initializations we've done above in place.
> + */
> + smp_mb();
> + per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
> +}
> +
> static void *
> rt_alloc_pdata(const struct scheduler *ops, int cpu)
> {
> @@ -707,19 +738,6 @@ rt_alloc_pdata(const struct scheduler *ops, int cpu)
> static void
> rt_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
> {
> - struct rt_private *prv = rt_priv(ops);
> - struct schedule_data *sd = &per_cpu(schedule_data, cpu);
> - unsigned long flags;
> -
> - spin_lock_irqsave(&prv->lock, flags);
> -
> - /* Move spinlock back to the default lock */
> - ASSERT(sd->schedule_lock == &prv->lock);
> - ASSERT(!spin_is_locked(&sd->_lock));
> - sd->schedule_lock = &sd->_lock;
> -
> - spin_unlock_irqrestore(&prv->lock, flags);
> -
> free_cpumask_var(_cpumask_scratch[cpu]);
> }
>
> @@ -1468,6 +1486,7 @@ static const struct scheduler sched_rtds_def = {
> .alloc_pdata = rt_alloc_pdata,
> .free_pdata = rt_free_pdata,
> .init_pdata = rt_init_pdata,
> + .switch_sched = rt_switch_sched,
> .alloc_domdata = rt_alloc_domdata,
> .free_domdata = rt_free_domdata,
> .init_domain = rt_dom_init,
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 1941613..5559aa1 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -1635,11 +1635,11 @@ void __init scheduler_init(void)
> int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
> {
> struct vcpu *idle;
> - spinlock_t *lock;
> void *ppriv, *ppriv_old, *vpriv, *vpriv_old;
> struct scheduler *old_ops = per_cpu(scheduler, cpu);
> struct scheduler *new_ops = (c == NULL) ? &ops : c->sched;
> struct cpupool *old_pool = per_cpu(cpupool, cpu);
> + spinlock_t * old_lock;
>
> /*
> * pCPUs only move from a valid cpupool to free (i.e., out of any pool),
> @@ -1658,11 +1658,21 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
> if ( old_ops == new_ops )
> goto out;
>
> + /*
> + * To setup the cpu for the new scheduler we need:
> + * - a valid instance of per-CPU scheduler specific data, as it is
> + * allocated by SCHED_OP(alloc_pdata). Note that we do not want to
> + * initialize it yet (i.e., we are not calling SCHED_OP(init_pdata)).
> + * That will be done by the target scheduler, in SCHED_OP(switch_sched),
> + * in proper ordering and with locking.
> + * - a valid instance of per-vCPU scheduler specific data, for the idle
> + * vCPU of cpu. That is what the target scheduler will use for the
> + * sched_priv field of the per-vCPU info of the idle domain.
> + */
> idle = idle_vcpu[cpu];
> ppriv = SCHED_OP(new_ops, alloc_pdata, cpu);
> if ( IS_ERR(ppriv) )
> return PTR_ERR(ppriv);
> - SCHED_OP(new_ops, init_pdata, ppriv, cpu);
> vpriv = SCHED_OP(new_ops, alloc_vdata, idle, idle->domain->sched_priv);
> if ( vpriv == NULL )
> {
> @@ -1670,17 +1680,30 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
> return -ENOMEM;
> }
>
> - lock = pcpu_schedule_lock_irq(cpu);
> -
> SCHED_OP(old_ops, tick_suspend, cpu);
> +
> + /*
> + * The actual switch, including (if necessary) the rerouting of the
> + * scheduler lock to whatever new_ops prefers, needs to happen in one
> + * critical section, protected by old_ops' lock, or races are possible.
> + * It is, in fact, the lock of another scheduler that we are taking (the
> + * scheduler of the cpupool that cpu still belongs to). But that is ok
> + * as anyone trying to schedule on this cpu will spin until we release
> + * that lock (at the bottom of this function). When they get the lock
> + * --thanks to the loop inside the *_schedule_lock() functions-- they will
> + * notice that the lock itself changed, and retry acquiring the new one
> + * (which will be the correct, remapped one, at that point).
> + */
> + old_lock = pcpu_schedule_lock(cpu);
> +
> vpriv_old = idle->sched_priv;
> - idle->sched_priv = vpriv;
> - per_cpu(scheduler, cpu) = new_ops;
> ppriv_old = per_cpu(schedule_data, cpu).sched_priv;
> - per_cpu(schedule_data, cpu).sched_priv = ppriv;
> - SCHED_OP(new_ops, tick_resume, cpu);
> + SCHED_OP(new_ops, switch_sched, cpu, ppriv, vpriv);
>
> - pcpu_schedule_unlock_irq(lock, cpu);
> + /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */
> + spin_unlock_irq(old_lock);
> +
> + SCHED_OP(new_ops, tick_resume, cpu);
>
> SCHED_OP(old_ops, free_vdata, vpriv_old);
> SCHED_OP(old_ops, free_pdata, ppriv_old, cpu);
> diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
> index 70c08c6..9cebe41 100644
> --- a/xen/include/xen/sched-if.h
> +++ b/xen/include/xen/sched-if.h
> @@ -137,6 +137,9 @@ struct scheduler {
> void (*free_domdata) (const struct scheduler *, void *);
> void * (*alloc_domdata) (const struct scheduler *, struct domain *);
>
> + void (*switch_sched) (struct scheduler *, unsigned int,
> + void *, void *);
> +
> int (*init_domain) (const struct scheduler *, struct domain *);
> void (*destroy_domain) (const struct scheduler *, struct domain *);
>
>
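
(For reference, the re-check that the comment in schedule_cpu_switch()
relies on lives in the pcpu/vcpu_schedule_lock*() helpers in
xen/include/xen/sched-if.h. They are macro-generated there; unrolled, the
pcpu variant amounts to roughly the sketch below.)

    /*
     * Unrolled sketch of pcpu_schedule_lock(): take whatever lock the
     * per-cpu pointer currently designates, then re-check the pointer.
     * If the lock was remapped in the meantime, drop it and retry with
     * the new one, so callers always end up holding the current lock.
     */
    static inline spinlock_t *pcpu_schedule_lock(unsigned int cpu)
    {
        for ( ; ; )
        {
            spinlock_t *lock = per_cpu(schedule_data, cpu).schedule_lock;

            spin_lock(lock);
            if ( likely(lock == per_cpu(schedule_data, cpu).schedule_lock) )
                return lock;
            spin_unlock(lock);
        }
    }
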
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel