
Re: [Xen-devel] [PATCH v2] Modified RTDS scheduler to use an event-driven model instead of polling.



[Adding Dario to the cc list...]

Dario, this patch has some coding issues and some other minor issues.
I tried to point out some of them in this email. IMHO, the main purpose of
this patch is to confirm that the implementation fully reflects what
you suggested on the previous version. Once we make sure the direction
we are heading in is correct, we can iron out the minor issues.
(Of course, these minor issues are not actually minor things and
should have been fixed before sending out the patch. Since it has been sent
out, let's first make sure the top issue is solved. :-( )

Dagaen, you can keep the cc list in a text file so that you
won't miss anyone when you send the patch.

Dagaen, please see my comments below:

2015-06-27 12:46 GMT-07:00 Dagaen Golomb <dgolomb@xxxxxxxxxxxxxx>:
> To do this, we create a new list that holds, for each
> vcpu, the time least into the future that it may need to be
> rescheduled. The scheduler chooses the lowest time off of this
> list and waits until the specified time instead of running every
> 1 ms as it did before.

Can you summarize the design discussion with Dario on the first
version of the patch? The above information is OK, but it does not say
much about the design. As you can see in our discussion with Dario,
there exist several design choices for achieving this. Explicitly stating
the chosen design here can help people understand what this patch is doing.

>
> Signed-off-by: Dagaen Golomb <dgolomb@xxxxxxxxxxxxxx>
> Signed-off-by: Meng Xu <mengxu@xxxxxxxxxxxxx>
> ---
>  xen/common/sched_rt.c |  100 +++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 93 insertions(+), 7 deletions(-)
>
> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index 4372486..cce3446 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -4,6 +4,7 @@
>   *
>   * by Sisu Xi, 2013, Washington University in Saint Louis
>   * and Meng Xu, 2014, University of Pennsylvania
> + * and Dagaen Golomb, 2015, University of Pennsylvania
>   *
>   * based on the code of credit Scheduler
>   */
> @@ -152,6 +153,8 @@ struct rt_private {
>      struct list_head sdom;      /* list of availalbe domains, used for dump */
>      struct list_head runq;      /* ordered list of runnable vcpus */
>      struct list_head depletedq; /* unordered list of depleted vcpus */
> +    struct timer replenishment_timer;
> +    unsigned NUM_CPUS;

First, this does not follow the naming convention for variables: UPPER CASE
should be reserved for macro definitions and constants, which is not the case
here. You can use a more proper name, such as ncpus, and add a comment
saying what it is used for.
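
Something like this, for instance (just a sketch; the exact name and
comment are up to you):

    unsigned int ncpus;    /* number of pCPUs served by this scheduler */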

I saw how you use this below. I will comment more there.

>      cpumask_t tickled;          /* cpus been tickled */
>  };
>
> @@ -178,6 +181,8 @@ struct rt_vcpu {
>      unsigned flags;             /* mark __RTDS_scheduled, etc.. */
>  };
>
> +static void runq_tickle(const struct scheduler *, struct rt_vcpu *);

I'm not sure whether declaring the function here is the best approach, or
whether a separate patch moving the function earlier in the file would be
better. But anyway, we don't have to worry about this right now,
considering there are more important issues to tackle in this patch.
> +
>  /*
>   * Domain
>   */
> @@ -391,13 +396,19 @@ __q_remove(struct rt_vcpu *svc)
>   * Insert svc with budget in RunQ according to EDF:
>   * vcpus with smaller deadlines go first.
>   * Insert svc without budget in DepletedQ unsorted;
> + *
> + * Returns the position, 1-indexed, of where the item was inserted.
> + * Returns a positive index if placed on runq, and a negative index (the index
> + *   in the depletedq * -1) if placed on the depletedq
>   */
> -static void
> +static int
>  __runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
>  {
>      struct rt_private *prv = rt_priv(ops);
>      struct list_head *runq = rt_runq(ops);
> +    struct list_head *depletedq = rt_depletedq(ops);
>      struct list_head *iter;
> +    unsigned inserted_index = 1;
>
>      ASSERT( spin_is_locked(&prv->lock) );
>
> @@ -411,15 +422,77 @@ __runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
>              struct rt_vcpu * iter_svc = __q_elem(iter);
>              if ( svc->cur_deadline <= iter_svc->cur_deadline )
>                      break;
> +            else
> +                ++inserted_index;
>           }
>          list_add_tail(&svc->q_elem, iter);
> +        return inserted_index;
>      }
>      else
>      {
> -        list_add(&svc->q_elem, &prv->depletedq);
> +        // If we are inserting into previously empty depletedq, rearm
> +        //   rearm replenish timer. It will handle further scheduling until
> +        //   the depletedq is empty again (if ever)

The format of the comment should be
/*
 *
 */
instead of //.

The coding style of this patch needs to improve. I know that this
version is to make sure the implementation is on the right track, but
when a patch is sent out, it is supposed to be ready to apply (as I
learned from Dario last year). :-) So please correct the coding
style in the next version. :-)


> +        if( list_empty(depletedq) )
> +            set_timer( &prv->replenishment_timer, svc->cur_deadline );

Hmm, can you handle the set_timer outside of this function? It should
be possible, and it would let __runq_insert() serve only one purpose.
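
For instance, something along these lines at the call site (just a
sketch, relying on the negative-index return value this patch
introduces):

    /* Call-site sketch: __runq_insert() only queues; the caller checks
     * whether the depletedq was empty and arms the timer itself. */
    bool_t depletedq_was_empty = list_empty(rt_depletedq(ops));
    int pos = __runq_insert(ops, svc);

    if ( pos < 0 && depletedq_was_empty )
        set_timer(&prv->replenishment_timer, svc->cur_deadline);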

> +
> +        list_for_each(iter, depletedq)
> +        {
> +            struct rt_vcpu * iter_svc = __q_elem(iter);
> +            if ( svc->cur_deadline <= iter_svc->cur_deadline )
> +                break;
> +            else
> +                ++inserted_index;
> +         }
> +        list_add_tail(&svc->q_elem, iter);
> +        return -inserted_index;
>      }
>  }
>
> +static void replenishment_handler(void* data)
> +{
> +    const struct scheduler *ops = data;
> +    struct rt_private *prv = rt_priv(ops);
> +    struct list_head *depletedq = rt_depletedq(ops);
> +    struct list_head *iter;
> +    struct list_head *tmp;
> +    struct rt_vcpu *svc = NULL;
> +    s_time_t now = NOW();
> +    int new_position = 0;
> +    unsigned long flags;
> +
> +    spin_lock_irqsave(&prv->lock, flags);
> +
> +    // Replenish the vCPU that triggered this, and resort it into runq
> +    list_for_each_safe(iter, tmp, depletedq)
> +    {
> +        svc = __q_elem(iter);
> +        if ( now >= svc->cur_deadline )
> +        {
> +            rt_update_deadline(now, svc);
> +            __q_remove(svc); /* remove from depleted queue */
> +            new_position = __runq_insert(ops, svc); /* add to runq */
> +        }
> +        else break; // leave list_for_each_safe
> +    }
> +
> +    // If we become one of top [# CPUs] in the runq, tickle it
> +    // TODO: make this work when multiple tickles are required

Why do you need multiple tickles?

> +    if ( new_position > 0 && new_position <= prv->NUM_CPUS )
> +        runq_tickle(ops, svc);
> +
> +    // Use first item on deletedq to schedule next replenishment.
> +    // If no replenishments are pending, disable timer for now
> +    if( !list_empty(depletedq) )
> +    {
> +        struct rt_vcpu * soonest_to_replenish = __q_elem(depletedq->next);
> +        set_timer( &prv->replenishment_timer,
> +                   soonest_to_replenish->cur_deadline );
> +    }


What if the inserted VCPU in the depletedq is not at the top of the
depletedq?  You deactive the current timer and then active the same
timer, which is unnecessary and costly. Since this is in the hot path,
you can avoid this deactive-reactive stuff by using an "if".
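
For example (a sketch; next_repl would be a new, hypothetical field in
rt_private caching the expiry the timer is currently programmed with):

    /* Only touch the timer when the required expiry actually changes. */
    if ( !list_empty(depletedq) )
    {
        struct rt_vcpu *head = __q_elem(depletedq->next);

        if ( head->cur_deadline != prv->next_repl )
        {
            prv->next_repl = head->cur_deadline;
            set_timer(&prv->replenishment_timer, prv->next_repl);
        }
    }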


> +
> +    spin_unlock_irqrestore(&prv->lock, flags);
> +}
> +
>  /*
>   * Init/Free related code
>   */
> @@ -427,6 +500,7 @@ static int
>  rt_init(struct scheduler *ops)
>  {
>      struct rt_private *prv = xzalloc(struct rt_private);
> +    int cpu = 0;
>
>      printk("Initializing RTDS scheduler\n"
>             "WARNING: This is experimental software in development.\n"
> @@ -450,6 +524,15 @@ rt_init(struct scheduler *ops)
>      INIT_LIST_HEAD(&prv->runq);
>      INIT_LIST_HEAD(&prv->depletedq);
>
> +    // Replenishment timer, for now always on CPU 0
> +    init_timer(&prv->replenishment_timer, replenishment_handler, ops, 0);
> +    // Calculate number of pCPUs present.
> +    // Note: assumes does not change online.

Dario, is this a valid assumption? I'm not sure whether Xen supports CPU
hotplug, and whether hotplug (in case Xen supports it) would invalidate
this assumption.

> +    for_each_present_cpu(cpu)
> +    {
> +        ++(prv->NUM_CPUS);
> +    }
> +

This is incorrect. for_each_present_cpu() lists all cpus in the
system: it enumerates the cpu_present_map.

What you want here is the number of cpus serving this scheduler in the
current cpupool. You can increase this count when rt_alloc_pdata()
is called.
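
Roughly like this, for example (a sketch; it assumes the field has been
renamed to ncpus and elides the rest of the existing function body):

    static void *
    rt_alloc_pdata(const struct scheduler *ops, int cpu)
    {
        struct rt_private *prv = rt_priv(ops);
        unsigned long flags;

        spin_lock_irqsave(&prv->lock, flags);
        prv->ncpus++;   /* one more pCPU now serves this scheduler */
        spin_unlock_irqrestore(&prv->lock, flags);

        /* ... rest of the existing per-pCPU setup unchanged ... */

        return (void *)1;   /* non-NULL indicates success to schedule.c */
    }

(And decrease the count again when a pCPU leaves the scheduler, e.g. in
rt_free_pdata().)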


>      cpumask_clear(&prv->tickled);
>
>      ops->sched_data = prv;
> @@ -466,6 +549,8 @@ rt_deinit(const struct scheduler *ops)
>  {
>      struct rt_private *prv = rt_priv(ops);
>
> +    kill_timer(&prv->replenishment_timer);
> +
>      ASSERT( _cpumask_scratch && nr_rt_ops > 0 );
>
>      if ( (--nr_rt_ops) == 0 )
> @@ -653,8 +738,7 @@ rt_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
>      BUG_ON( sdom == NULL );
>
>      lock = vcpu_schedule_lock_irq(vc);
> -    if ( __vcpu_on_q(svc) )
> -        __q_remove(svc);
> +    __q_remove(svc);
>      vcpu_schedule_unlock_irq(lock, vc);
>
>      if ( !is_idle_vcpu(vc) )
> @@ -838,6 +922,7 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
>  {
>      const int cpu = smp_processor_id();
>      struct rt_private *prv = rt_priv(ops);
> +    struct list_head *runq = rt_runq(ops);
>      struct rt_vcpu *const scurr = rt_vcpu(current);
>      struct rt_vcpu *snext = NULL;
>      struct task_slice ret = { .migrated = 0 };
> @@ -848,8 +933,6 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
>      /* burn_budget would return for IDLE VCPU */
>      burn_budget(ops, scurr, now);
>
> -    __repl_update(ops, now);
> -
>      if ( tasklet_work_scheduled )
>      {
>          snext = rt_vcpu(idle_vcpu[cpu]);
> @@ -889,7 +972,10 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
>          }
>      }
>
> -    ret.time = MIN(snext->budget, MAX_SCHEDULE); /* sched quantum */
> +    /* Use minimum from runq */
> +    ret.time = MAX_SCHEDULE;

why it is not snext->budget? The budget enforcement only need to be
invoked when the VCPU scheduled to run runs out of budget;

> +    if( !list_empty(runq) )
> +        ret.time = __q_elem(runq->next)->cur_deadline;

This function is for budget enforcement. Why do you need this?
Maybe you want to consider the situation when a VCPU still has budget
remaining at the end of the current deadline. You want to use the
cur_deadline to enforce the budget, since the RTDS scheduler does not
allow a VCPU overrun its budget when it misses its deadline and "The
VCPU discards its unused budget at the end of each period."

If it is the case, you can use snxet->vcpu to get the cur_deadline
instead of using the runq->next.

In addition, this rt_schedule function does not handle the above logic
right now. Although it still works (due to how  __repl_update is
implemented), it is better to explicitly have it in rt_schedule(),
IMHO.
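
I.e., something along these lines (a sketch; it assumes cur_budget holds
snext's remaining budget, and it caps the slice at the end of snext's
current period so that unused budget is discarded):

    /* Run snext until it exhausts its budget or reaches the end of
     * its current period, whichever comes first. */
    ret.time = snext->cur_budget;
    if ( snext->cur_deadline - now < ret.time )
        ret.time = snext->cur_deadline - now;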

>      ret.task = snext->vcpu;
>
>      /* TRACE */
> --
> 1.7.9.5
>



-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

