Re: [Xen-devel] [PATCH 3/4] xen: sched: reorganize cpu_disable_scheduler()
On Fri, Jul 3, 2015 at 4:49 PM, Dario Faggioli
<dario.faggioli@xxxxxxxxxx> wrote:
> The function is called both when we want to remove a cpu
> from a cpupool, and during cpu teardown, for suspend or
> shutdown. If, however, the boot cpu (cpu 0, most of the
> time) is not present in the default cpupool, during
> suspend or shutdown, Xen crashes like this:
>
> root@Zhaman:~# xl cpupool-cpu-remove Pool-0 0
> root@Zhaman:~# shutdown -h now
> (XEN) ----[ Xen-4.6-unstable x86_64 debug=y Tainted: C ]----
> ...
> (XEN) Xen call trace:
> (XEN) [<ffff82d0801238de>] _csched_cpu_pick+0x156/0x61f
> (XEN) [<ffff82d080123db5>] csched_cpu_pick+0xe/0x10
> (XEN) [<ffff82d08012de3c>] vcpu_migrate+0x18e/0x321
> (XEN) [<ffff82d08012e4f8>] cpu_disable_scheduler+0x1cf/0x2ac
> (XEN) [<ffff82d08018bb8d>] __cpu_disable+0x313/0x36e
> (XEN) [<ffff82d080101424>] take_cpu_down+0x34/0x3b
> (XEN) [<ffff82d08013097a>] stopmachine_action+0x70/0x99
> (XEN) [<ffff82d0801325f0>] do_tasklet_work+0x78/0xab
> (XEN) [<ffff82d080132926>] do_tasklet+0x5e/0x8a
> (XEN) [<ffff82d08016478c>] idle_loop+0x56/0x6b
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 15:
> (XEN) Assertion 'cpu < nr_cpu_ids' failed at
> ...URCES/xen/xen/xen.git/xen/include/xen/cpumask.h:97
> (XEN) ****************************************
>
> There are also problems when we try to suspend or shut down
> with a cpupool configured with just one cpu (no matter, in
> this case, whether that is the boot cpu or not):
>
> root@Zhaman:~# xl create /etc/xen/test.cfg
> root@Zhaman:~# xl cpupool-migrate test Pool-1
> root@Zhaman:~# xl cpupool-list -c
> Name               CPU list
> Pool-0             0,1,2,3,4,5,6,7,8,9,10,11,13,14,15
> Pool-1             12
> root@Zhaman:~# shutdown -h now
> (XEN) ----[ Xen-4.6-unstable x86_64 debug=y Tainted: C ]----
> (XEN) CPU: 12
> ...
> (XEN) Xen call trace:
> (XEN) [<ffff82d08018bb91>] __cpu_disable+0x317/0x36e
> (XEN) [<ffff82d080101424>] take_cpu_down+0x34/0x3b
> (XEN) [<ffff82d08013097a>] stopmachine_action+0x70/0x99
> (XEN) [<ffff82d0801325f0>] do_tasklet_work+0x78/0xab
> (XEN) [<ffff82d080132926>] do_tasklet+0x5e/0x8a
> (XEN) [<ffff82d08016478c>] idle_loop+0x56/0x6b
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 12:
> (XEN) Xen BUG at smpboot.c:895
> (XEN) ****************************************
>
> In both cases, the problem is the scheduler not being able
> to:
> - move all the vcpus to the boot cpu (as the boot cpu is
> not in the cpupool), in the former;
> - move the vcpus away from a cpu at all (as that is the
> only cpu in the cpupool), in the latter.
>
> The solution is to distinguish, inside cpu_disable_scheduler(),
> the two cases of cpupool manipulation and teardown. For
> cpupool manipulation, it is correct to ask the scheduler to
> take action, as pathological situations (like there not
> being any cpu in the pool to send vcpus to) are taken
> care of (i.e., forbidden!) already. For suspend and shutdown,
> we don't want the scheduler to be involved at all, as the
> final goal is pretty simple: "send all the vcpus to the
> boot cpu ASAP", so we just go for it.
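(Condensed, the logic described above, as implemented in the diff
below, boils down to:

    if ( unlikely(system_state == SYS_STATE_suspend) )
        /* Teardown: bypass the scheduler, move to any online pcpu. */
        vcpu_move(v, cpumask_first(&online_affinity));
    else
        /* Cpupool manipulation: let the scheduler re-evaluate placement. */
        vcpu_migrate(v);
)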
>
> Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
> ---
> Cc: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
> Cc: Juergen Gross <jgross@xxxxxxxx>
> ---
> xen/common/schedule.c | 109 ++++++++++++++++++++++++++++++++++++++++++-------
> 1 file changed, 93 insertions(+), 16 deletions(-)
>
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index e83c666..eac8804 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -455,8 +455,8 @@ void vcpu_unblock(struct vcpu *v)
> * Do the actual movement of a vcpu from old to new CPU. Locks for *both*
> * CPUs need to have been taken already when calling this!
> */
> -static void vcpu_move(struct vcpu *v, unsigned int old_cpu,
> - unsigned int new_cpu)
> +static void _vcpu_move(struct vcpu *v, unsigned int old_cpu,
> + unsigned int new_cpu)
> {
> /*
> * Transfer urgency status to new CPU before switching CPUs, as
> @@ -479,6 +479,35 @@ static void vcpu_move(struct vcpu *v, unsigned int old_cpu,
> v->processor = new_cpu;
> }
>
> +static void vcpu_move(struct vcpu *v, unsigned int new_cpu)
Not quite a fan of the naming scheme here. Perhaps vcpu_move and
vcpu_move_locked?
Actually -- looking at this again, is there a reason to pass old_cpu
into _vcpu_move? It looks like old_cpu should always be equal to
v->processor. That would make the function prototypes the same.
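Something like this is what I'd have in mind (sketch only, names
hypothetical):

    /* Caller must already hold the schedule locks of both pcpus involved. */
    static void vcpu_move_locked(struct vcpu *v, unsigned int new_cpu);

    /* Takes (and drops) the schedule locks itself, then wakes the vcpu. */
    static void vcpu_move(struct vcpu *v, unsigned int new_cpu);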
> +{
> + unsigned long flags;
> + unsigned int cpu = v->processor;
> + spinlock_t *lock, *new_lock;
> +
> + /*
> + * When here, the vcpu should be ready for being moved. This means:
> + * - both its original and target processor must be quiet;
> + * - it must not be marked as currently running;
> + * - the proper flag must be set (i.e., no one must have had any
> + * chance to reset it).
> + */
> + ASSERT(is_idle_vcpu(curr_on_cpu(cpu)) &&
> + is_idle_vcpu(curr_on_cpu(new_cpu)));
> + ASSERT(!v->is_running && test_bit(_VPF_migrating, &v->pause_flags));
> +
> + lock = per_cpu(schedule_data, v->processor).schedule_lock;
> + new_lock = per_cpu(schedule_data, new_cpu).schedule_lock;
> +
> + sched_spin_lock_double(lock, new_lock, &flags);
> + ASSERT(new_cpu != v->processor);
Don't we need to do the "schedule lock dance" here as well --
double-check to make sure that v->processor hasn't changed since the
time we grabbed the lock?
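For reference, by "dance" I mean the usual re-check pattern (roughly
what the vcpu_schedule_lock() helpers in sched-if.h do):

    spinlock_t *lock;
    unsigned long flags;

    for ( ; ; )
    {
        /* v->processor can change while we are waiting for the lock... */
        lock = per_cpu(schedule_data, v->processor).schedule_lock;
        spin_lock_irqsave(lock, flags);
        /* ...so check that it is still the lock we wanted to take. */
        if ( likely(lock == per_cpu(schedule_data, v->processor).schedule_lock) )
            break;
        spin_unlock_irqrestore(lock, flags);
    }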
> + _vcpu_move(v, cpu, new_cpu);
> + sched_spin_unlock_double(lock, new_lock, flags);
> +
> + sched_move_irqs(v);
> + vcpu_wake(v);
> +}
> +
> static void vcpu_migrate(struct vcpu *v)
> {
> unsigned long flags;
> @@ -543,7 +572,7 @@ static void vcpu_migrate(struct vcpu *v)
> return;
> }
>
> - vcpu_move(v, old_cpu, new_cpu);
> + _vcpu_move(v, old_cpu, new_cpu);
>
> sched_spin_unlock_double(old_lock, new_lock, flags);
>
> @@ -616,7 +645,8 @@ int cpu_disable_scheduler(unsigned int cpu)
> struct vcpu *v;
> struct cpupool *c;
> cpumask_t online_affinity;
> - int ret = 0;
> + unsigned int new_cpu;
> + int ret = 0;
>
> c = per_cpu(cpupool, cpu);
> if ( c == NULL )
> @@ -645,25 +675,72 @@ int cpu_disable_scheduler(unsigned int cpu)
> cpumask_setall(v->cpu_hard_affinity);
> }
>
> - if ( v->processor == cpu )
> + if ( v->processor != cpu )
> {
> - set_bit(_VPF_migrating, &v->pause_flags);
> + /* This vcpu is not on cpu, so we can move on. */
> vcpu_schedule_unlock_irqrestore(lock, flags, v);
> - vcpu_sleep_nosync(v);
> - vcpu_migrate(v);
> + continue;
> }
> - else
> - vcpu_schedule_unlock_irqrestore(lock, flags, v);
>
> /*
> - * A vcpu active in the hypervisor will not be migratable.
> - * The caller should try again after releasing and reaquiring
> - * all locks.
> + * If we're here, it means that the vcpu is on cpu. Let's see how
> + * it's best to send it away, depending on whether we are shutting
> + * down/suspending, or doing cpupool manipulations.
> */
> - if ( v->processor == cpu )
> - ret = -EAGAIN;
> - }
> + set_bit(_VPF_migrating, &v->pause_flags);
> + vcpu_schedule_unlock_irqrestore(lock, flags, v);
> + vcpu_sleep_nosync(v);
I don't quite understand this. By calling _nosync(), you're not
guaranteed that the vcpu has actually stopped running when you call
vcpu_move() down below; and yet inside vcpu_move(), you assert
!v->is_running.
So either at this point, when doing suspend, the vcpu has already been
paused, in which case this is unnecessary; or it hasn't been paused,
in which case we're potentially racing the IPI / deschedule, and will
trip over the ASSERT if we "win".
Did I miss something? (I'm perfectly ready to believe that I have...)
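For comparison, vcpu_sleep_sync() would close that window, since
(roughly) it spins until the deschedule has actually happened:

    void vcpu_sleep_sync(struct vcpu *v)
    {
        vcpu_sleep_nosync(v);

        /* Wait until the vcpu is really not running anywhere. */
        while ( !vcpu_runnable(v) && v->is_running )
            cpu_relax();

        sync_vcpu_execstate(v);
    }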
-George
> +
> + /*
> + * In case of shutdown/suspend, it is not necessary to ask the
> + * scheduler to chime in. In fact:
> + * * there is no reason for it: the end result we are after is
> + * just 'all the vcpus on the boot pcpu, and no vcpu anywhere
> + * else', so let's just go for it;
> + * it's wrong, when dealing with a cpupool with only non-boot pcpus,
> + * as the scheduler will always fail to send the vcpus away
> + * from the last online (non boot) pcpu!
> + *
> + * Therefore, in the shutdown/suspend case, let's just pick one
> + * of the still online pcpus, and send everyone there. Ideally,
> + * we would pick the boot pcpu directly, but we don't know
> + * which one it is.
> + *
> + * OTOH, if the system is still live, and we are here because of
> + * cpupool manipulations:
> + * * it would be best to call the scheduler, as that would serve
> + * as a re-evaluation of the placement of the vcpu, taking into
> + * account the modified cpupool configuration;
> + * the scheduler will always find a suitable solution, or
> + * things would have failed before getting in here.
> + *
> + * Therefore, in the cpupool manipulation case, let's just ask the
> + * scheduler to do its job, by calling vcpu_migrate().
> + */
> + if ( unlikely(system_state == SYS_STATE_suspend) )
> + {
> + /*
> + * The boot pcpu is, usually, pcpu #0, so using cpumask_first()
> + * actually helps us to achieve our ultimate goal quicker.
> + */
> + cpumask_andnot(&online_affinity, &cpu_online_map, cpumask_of(cpu));
> + new_cpu = cpumask_first(&online_affinity);
> + vcpu_move(v, new_cpu);
> + }
> + else
> + {
> + /*
> + * The only caveat, in this case, is that if the vcpu is active
> + * in the hypervisor, it won't be migratable. In this case,
> + * the caller should try again after releasing and reacquiring
> + * all locks.
> + */
> + vcpu_migrate(v);
>
> + if ( v->processor == cpu )
> + ret = -EAGAIN;
> + }
> + }
> domain_update_node_affinity(d);
> }
>
>
>