Re: [Xen-devel] [PATCH 2/3] xen: Have schedulers revise initial placement



On Thu, 2016-08-11 at 16:51 +0100, Andrew Cooper wrote:
> On 11/08/16 15:59, Dario Faggioli wrote:
>
> > Which, I think needs at least this hunk (from 6b53bb4ab3c9  "sched:
> > better handle (not) inserting idle vCPUs in runqueues"):
> > 
> > diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> > index 2beebe8..fddcd52 100644
> > --- a/xen/common/schedule.c
> > +++ b/xen/common/schedule.c
> > @@ -240,20 +240,22 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
> >      init_timer(&v->poll_timer, poll_timer_fn,
> >                 v, v->processor);
> >  
> > -    /* Idle VCPUs are scheduled immediately. */
> > +    v->sched_priv = SCHED_OP(DOM2OP(d), alloc_vdata, v, d->sched_priv);
> > +    if ( v->sched_priv == NULL )
> > +        return 1;
> > +
> > +    TRACE_2D(TRC_SCHED_DOM_ADD, v->domain->domain_id, v->vcpu_id);
> > +
> > +    /* Idle VCPUs are scheduled immediately, so don't put them in runqueue. */
> >      if ( is_idle_domain(d) )
> >      {
> >          per_cpu(schedule_data, v->processor).curr = v;
> >          v->is_running = 1;
> >      }
> > -
> > -    TRACE_2D(TRC_SCHED_DOM_ADD, v->domain->domain_id, v->vcpu_id);
> > -
> > -    v->sched_priv = SCHED_OP(DOM2OP(d), alloc_vdata, v, d->sched_priv);
> > -    if ( v->sched_priv == NULL )
> > -        return 1;
> > -
> > -    SCHED_OP(DOM2OP(d), insert_vcpu, v);
> > +    else
> > +    {
> > +        SCHED_OP(DOM2OP(d), insert_vcpu, v);
> > +    }
> >  
> >      return 0;
> >  }
> > 
> > So, yeah, it's proving a little more complicated than I thought it
> > would be, just by looking at the patches. :-/
> > 
> > Will let you know.
> FWIW, this looks very similar to the regression I just raised against
> Xen 4.7 "[Xen-devel] Scheduler regression in 4.7".  The stack traces
> are suspiciously similar.
>
I thought the same at the beginning, but they actually may not be the
same or even related.

This happens at early boot, and the reason is that we try to call
csched_cpu_pick() on the idle vcpus, which does not make any sense, and
in fact one of the ASSERT()s triggers.

In your case, the system has already booted fine. And the reason for
that is you're looking at 4.7, and 4.7 no longer calls insert_vcpu()
(which in turn calls csched_cpu_pick()) on idle vcpus at boot, thanks
to the patch I'm mentioning above.
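
To make the control flow concrete, here is a toy model of what the hunk changes. This is only a sketch and not Xen code: every toy_* name is made up for illustration, and the real sched_init_vcpu() does a lot more. It just shows that, with the else branch in place, an idle vcpu is marked as running and never reaches the insert (and hence CPU pick) path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vcpu; only the fields the sketch needs. */
struct toy_vcpu {
    bool is_idle;      /* stands in for is_idle_domain(v->domain) */
    bool is_running;   /* stands in for v->is_running */
    bool in_runqueue;  /* set once the vcpu has been inserted */
};

/* Counts how often the (toy) CPU pick path was reached. */
static unsigned int toy_pick_calls;

/* Stand-in for the scheduler's insert_vcpu hook, which ends up picking a CPU. */
static void toy_insert_vcpu(struct toy_vcpu *v)
{
    /* Picking a CPU for an idle vcpu makes no sense, so refuse it loudly. */
    assert(!v->is_idle);
    toy_pick_calls++;
    v->in_runqueue = true;
}

/* Mirrors only the branch structure the hunk introduces. */
static int toy_sched_init_vcpu(struct toy_vcpu *v)
{
    if ( v == NULL )
        return 1;

    if ( v->is_idle )
        v->is_running = true;   /* scheduled immediately, no runqueue */
    else
        toy_insert_vcpu(v);     /* only non-idle vcpus get inserted */

    return 0;
}
```

Initialising an idle toy vcpu leaves toy_pick_calls at zero, while a non-idle one bumps it to one. Without the else branch, the idle vcpu would hit the assert in toy_insert_vcpu().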

And in fact, I confirm that, on 4.6, with "just" the hunk above of said
patch, I can boot, create a (large) VM, play a bit with it, shut it down
or reboot it, and shut down the host as well.

Also, yours seems to _explode_ because of a race on the runq (in
IS_RUNQ_IDLE()), whereas this one _asserts_ here:

        /* Pick an online CPU from the proper affinity mask */
        csched_balance_cpumask(vc, balance_step, &cpus);

        cpumask_and(&cpus, &cpus, online);
        /* If present, prefer vc's current processor */
        cpu = cpumask_test_cpu(vc->processor, &cpus)
                ? vc->processor
                : cpumask_cycle(vc->processor, &cpus);
        ASSERT(cpumask_test_cpu(cpu, &cpus));

Because, as I said, we're at early boot, and most likely there's
almost no CPU set in online yet!
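
As a rough illustration of why that ASSERT can fire, here is a toy model of the pick logic using a plain bitmask. It is a sketch under assumptions, not the scheduler code: the toy_* helpers and TOY_NR_CPUS are invented here, and I'm assuming that the cycle helper returns an out-of-range CPU number when the mask is empty, which is how I understand cpumask_cycle() to behave.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_CPUS 8u  /* hypothetical; stands in for nr_cpu_ids */

/* Toy cpumask_test_cpu(): is bit 'cpu' set in 'mask'? */
static bool toy_test_cpu(unsigned int cpu, uint64_t mask)
{
    return cpu < TOY_NR_CPUS && ((mask >> cpu) & 1u);
}

/*
 * Toy cpumask_cycle(): next set bit after 'cpu', wrapping around.
 * Returns TOY_NR_CPUS (an invalid CPU) if the mask is empty.
 */
static unsigned int toy_cycle(unsigned int cpu, uint64_t mask)
{
    for ( unsigned int i = 1; i <= TOY_NR_CPUS; i++ )
    {
        unsigned int candidate = (cpu + i) % TOY_NR_CPUS;

        if ( toy_test_cpu(candidate, mask) )
            return candidate;
    }
    return TOY_NR_CPUS;
}

/* Same shape as the snippet above: prefer 'processor', else cycle. */
static unsigned int toy_pick(unsigned int processor, uint64_t affinity,
                             uint64_t online)
{
    uint64_t cpus = affinity & online;

    return toy_test_cpu(processor, cpus) ? processor
                                         : toy_cycle(processor, cpus);
}

/* The condition the ASSERT checks: is the picked CPU in the final mask? */
static bool toy_pick_is_valid(unsigned int processor, uint64_t affinity,
                              uint64_t online)
{
    return toy_test_cpu(toy_pick(processor, affinity, online),
                        affinity & online);
}

/* Like toy_pick(), but aborts the way the real ASSERT would. */
static unsigned int toy_pick_checked(unsigned int processor,
                                     uint64_t affinity, uint64_t online)
{
    assert(toy_pick_is_valid(processor, affinity, online));
    return toy_pick(processor, affinity, online);
}
```

With at least one CPU in both masks, toy_pick_is_valid() holds. With an empty online mask (or an empty intersection), the pick yields TOY_NR_CPUS, the check fails, and toy_pick_checked() would abort, which is the toy equivalent of the ASSERT triggering at early boot.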

> I expect they have the same root cause.
> 
No, I think they're two different things.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel