[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] xen/sched: rework credit2 run-queue allocation



On Thu, 2020-01-23 at 09:55 +0100, Juergen Gross wrote:
> Currently the memory for each run-queue of the credit2 scheduler is
> allocated at the scheduler's init function: for each cpu in the
> system
> a struct csched2_runqueue_data is being allocated, even if the
> current scheduler only handles one physical cpu or is configured to
> work with a single run-queue. As each struct contains 4 cpumasks this
> sums up to rather large memory sizes pretty fast.
> 
Ok, I finally found the time to look at this... And I like it. :-)

> In fact this fixes a bug in credit2 related to run-queue handling:
> cpu_to_runqueue() will return the first free or matching run-queue,
> which ever is found first. So in case a cpu is removed from credit2
> this could result in e.g. run-queue 0 becoming free, so when another
> cpu is added it will in any case be assigned to that free run-queue,
> even if it would have found another run-queue matching later.
> 
That's a good catch... Thanks!

So, I only have a request, and a question:

> --- a/xen/common/sched/credit2.c
> +++ b/xen/common/sched/credit2.c
> @@ -849,51 +822,71 @@ static inline bool same_core(unsigned int cpua,
> unsigned int cpub)
>             cpu_to_core(cpua) == cpu_to_core(cpub);
>  }
>  
> -static unsigned int
> -cpu_to_runqueue(const struct csched2_private *prv, unsigned int cpu)
> +static struct csched2_runqueue_data *
> +cpu_add_to_runqueue(struct csched2_private *prv, unsigned int cpu)
>  {
> -    const struct csched2_runqueue_data *rqd;
> -    unsigned int rqi;
> +    struct csched2_runqueue_data *rqd, *rqd_new;
> +    struct list_head *rqd_ins;
> +    unsigned long flags;
> +    int rqi = 0;
> +    bool rqi_unused = false, rqd_valid = false;
> +
> +    rqd_new = xzalloc(struct csched2_runqueue_data);
>  
>
So, I'm not sure I see why it's better to allocating this here, and
then free it if we didn't need it, instead than allocating it later,
only if we actually need it... What am I missing? :-)

> -    for ( rqi = 0; rqi < nr_cpu_ids; rqi++ )
> +    write_lock_irqsave(&prv->lock, flags);
> +
> +    rqd_ins = &prv->rql;
> +    list_for_each_entry ( rqd, &prv->rql, rql )
>      {
>          unsigned int peer_cpu;
>  
> -        /*
> -         * As soon as we come across an uninitialized runqueue, use
> it.
> -         * In fact, either:
> -         *  - we are initializing the first cpu, and we assign it to
> -         *    runqueue 0. This is handy, especially if we are
> dealing
> -         *    with the boot cpu (if credit2 is the default
> scheduler),
> -         *    as we would not be able to use cpu_to_socket() and
> similar
> -         *    helpers anyway (they're result of which is not
> reliable yet);
> -         *  - we have gone through all the active runqueues, and
> have not
> -         *    found anyone whose cpus' topology matches the one we
> are
> -         *    dealing with, so activating a new runqueue is what we
> want.
> -         */
> -        if ( prv->rqd[rqi].id == -1 )
> -            break;
> +        /* Remember first unused queue index. */
> +        if ( !rqi_unused && rqd->id > rqi )
> +            rqi_unused = true;
>  
> -        rqd = prv->rqd + rqi;
> -        BUG_ON(cpumask_empty(&rqd->active));
> -
> -        peer_cpu = cpumask_first(&rqd->active);
> +        peer_cpu = rqd->pick_bias;
>          BUG_ON(cpu_to_socket(cpu) == XEN_INVALID_SOCKET_ID ||
>                 cpu_to_socket(peer_cpu) == XEN_INVALID_SOCKET_ID);
>  
> -        if (opt_runqueue == OPT_RUNQUEUE_CPU)
> -            continue;
>          if ( opt_runqueue == OPT_RUNQUEUE_ALL ||
>               (opt_runqueue == OPT_RUNQUEUE_CORE &&
> same_core(peer_cpu, cpu)) ||
>               (opt_runqueue == OPT_RUNQUEUE_SOCKET &&
> same_socket(peer_cpu, cpu)) ||
>               (opt_runqueue == OPT_RUNQUEUE_NODE &&
> same_node(peer_cpu, cpu)) )
> +        {
> +            rqd_valid = true;
>              break;
> +        }
>
So, OPT_RUNQUEUE_CPU is just disappearing. If I understood the
algorithm correctly, that is because in such case we just scan through
the whole list, without finding any match, and the we'll allocate a new
runqueue (while, for any of the other options, we stop as soon as we
found a runqueue with a CPU inside it which match the criteria).

Can we add a comment about this. Not necessary to describe the
algorithm in details, I don't think... just a few words, especially
about the fact that the enum has a _CPU item that, at a first and quick
look, we seem to be ignoring here?

Thanks and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

Attachment: signature.asc
Description: This is a digitally signed message part

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.