
Re: [Xen-devel] [PATCH RFC v2 1/4] xen: add real time scheduler rt



Hi Andrew,

Thank you very much for your comments!

2014-07-29 18:36 GMT+08:00 Andrew Cooper <andrew.cooper3@xxxxxxxxxx>:
On 29/07/14 02:52, Meng Xu wrote:
> This scheduler follows the pre-emptive Global EDF theory in real-time field.
> Each VCPU can have a dedicated period and budget.
> While scheduled, a VCPU burns its budget.
> A VCPU has its budget replenished at the beginning of each of its periods;
> The VCPU discards its unused budget at the end of each of its periods.
> If a VCPU runs out of budget in a period, it has to wait until next period.
> The mechanism of how to burn a VCPU's budget depends on the server mechanism
> implemented for each VCPU.
>
> Server mechanism: a VCPU is implemented as a deferrable server.
> When a VCPU is scheduled to execute on a PCPU, its budget is continuously
> burned.
>
> Priority scheme: Preemptive Global Earliest Deadline First (gEDF).
> At any scheduling point, the VCPU with earliest deadline has highest
> priority.
>
> Queue scheme: A global Runqueue for each CPU pool.
> The Runqueue holds all runnable VCPUs.
> VCPUs in the Runqueue are divided into two parts: with and without budget.
> At each part, VCPUs are sorted based on gEDF priority scheme.
>
> Scheduling quantum: 1 ms;
>
> Note: cpumask and cpupool are supported.
>
> This is still in the development phase.
>
> Signed-off-by: Sisu Xi <xisisu@xxxxxxxxx>
> Signed-off-by: Meng Xu <mengxu@xxxxxxxxxxxxx>
> ---
>  xen/common/Makefile          |    1 +
>  xen/common/sched_rt.c        | 1058 +++++++++++++++++++++++++++++++++++++++++++
>  xen/common/schedule.c        |    4 +-
>  xen/include/public/domctl.h  |   28 +-
>  xen/include/xen/sched-if.h   |    1 +
>  5 files changed, 1089 insertions(+), 3 deletions(-)
>  create mode 100644 xen/common/sched_rt.c
>
> diff --git a/xen/common/Makefile b/xen/common/Makefile
> index 3683ae3..5a23aa4 100644
> --- a/xen/common/Makefile
> +++ b/xen/common/Makefile
> @@ -26,6 +26,7 @@ obj-y += sched_credit.o
>  obj-y += sched_credit2.o
>  obj-y += sched_sedf.o
>  obj-y += sched_arinc653.o
> +obj-y += sched_rt.o
>  obj-y += schedule.o
>  obj-y += shutdown.o
>  obj-y += softirq.o
> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> new file mode 100644
> index 0000000..6cfbb8a
> --- /dev/null
> +++ b/xen/common/sched_rt.c
> @@ -0,0 +1,1058 @@
> +/******************************************************************************
> + * Preemptive Global Earliest Deadline First (EDF) scheduler for Xen
> + * EDF scheduling is one of the most popular real-time scheduling algorithms
> + * used in the embedded field.
> + *
> + * by Sisu Xi, 2013, Washington University in Saint Louis
> + * and Meng Xu, 2014, University of Pennsylvania
> + *
> + * based on the code of credit Scheduler
> + */
> +
> +#include <xen/config.h>
> +#include <xen/init.h>
> +#include <xen/lib.h>
> +#include <xen/sched.h>
> +#include <xen/domain.h>
> +#include <xen/delay.h>
> +#include <xen/event.h>
> +#include <xen/time.h>
> +#include <xen/perfc.h>
> +#include <xen/sched-if.h>
> +#include <xen/softirq.h>
> +#include <asm/atomic.h>
> +#include <xen/errno.h>
> +#include <xen/trace.h>
> +#include <xen/cpu.h>
> +#include <xen/keyhandler.h>
> +#include <xen/trace.h>
> +#include <xen/guest_access.h>
> +
> +/*
> + * TODO:
> + *
> + * Migration compensation and resistance, like credit2, to make better use of the cache;
> + * Lock Holder Problem, using yield?
> + * Self switch problem: VCPUs of the same domain may preempt each other;
> + */
> +
> +/*
> + * Design:
> + *
> + * This scheduler follows the Preemptive Global EDF theory in real-time field.
> + * Each VCPU can have a dedicated period and budget.
> + * While scheduled, a VCPU burns its budget.
> + * A VCPU has its budget replenished at the beginning of each of its periods;
> + * The VCPU discards its unused budget at the end of each of its periods.
> + * If a VCPU runs out of budget in a period, it has to wait until next period.
> + * The mechanism of how to burn a VCPU's budget depends on the server mechanism
> + * implemented for each VCPU.
> + *
> + * Server mechanism: a VCPU is implemented as a deferrable server.
> + * When a VCPU has a task running on it, its budget is continuously burned;
> + * When a VCPU has no task but with budget left, its budget is preserved.
> + *
> + * Priority scheme: Preemptive Global Earliest Deadline First (gEDF).
> + * At any scheduling point, the VCPU with earliest deadline has highest priority.
> + *
> + * Queue scheme: A global runqueue for each CPU pool.
> + * The runqueue holds all runnable VCPUs.
> + * VCPUs in the runqueue are divided into two parts: with and without remaining budget.
> + * At each part, VCPUs are sorted based on EDF priority scheme.
> + *
> + * Scheduling quantum: 1 ms; budget accounting is done at microsecond granularity.
> + *
> + * Note: cpumask and cpupool are supported.
> + */
> +
> +/*
> + * Locking:
> + * Just like credit2, a global system lock is used to protect the RunQ.
> + * The global lock is referenced by schedule_data.schedule_lock from all physical cpus.
> + *
> + * The lock is already grabbed when calling wake/sleep/schedule/ functions in schedule.c
> + *
> + * The functions that involve the RunQ and need to grab the lock are:
> + * Â Âdump, vcpu_insert, vcpu_remove, context_saved,
> + */
> +
> +
> +/*
> + * Default parameters: the default period and budget are 10 ms and 4 ms, respectively
> + */
> +#define RT_DEFAULT_PERIOD     (MICROSECS(10))
> +#define RT_DEFAULT_BUDGET     (MICROSECS(4))
> +
> +/*
> + * Useful macros
> + */
> +#define RT_PRIV(_ops)     ((struct rt_private *)((_ops)->sched_data))
> +#define RT_VCPU(_vcpu)    ((struct rt_vcpu *)(_vcpu)->sched_priv)
> +#define RT_DOM(_dom)      ((struct rt_dom *)(_dom)->sched_priv)
> +#define RUNQ(_ops)        (&RT_PRIV(_ops)->runq)

I know you are copying the prevailing style, but these macros are nasty.

They would be perfectly fine as static inlines with some real types...

static inline struct rt_private *RT_PRIV(const struct scheduler *ops)
{
  return ops->sched_data;
}

... which allow for rather more useful compiler errors in the case that
they get accidentally misused.

This is a good suggestion and I have modified it for the next version.
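
For the next version, the remaining accessors can follow the same pattern. A minimal sketch (the types and field names are the ones this patch already uses; the exact bodies in the next version may differ):

static inline struct rt_vcpu *RT_VCPU(const struct vcpu *vcpu)
{
    return vcpu->sched_priv;
}

static inline struct rt_dom *RT_DOM(const struct domain *dom)
{
    return dom->sched_priv;
}

static inline struct list_head *RUNQ(const struct scheduler *ops)
{
    return &RT_PRIV(ops)->runq;
}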

> +
> +/*
> + * Flags
> + */
> +/* RT_scheduled: Is this vcpu either running on, or context-switching off,
> + * a physical cpu?
> + * + Accessed only with Runqueue lock held.
> + * + Set when chosen as next in rt_schedule().
> + * + Cleared after context switch has been saved in rt_context_saved()
> + * + Checked in vcpu_wake to see if we can add to the Runqueue, or if we should
> + *   set RT_delayed_runq_add
> + * + Checked to be false in runq_insert.
> + */
> +#define __RT_scheduled            1
> +#define RT_scheduled (1<<__RT_scheduled)
> +/* RT_delayed_runq_add: Do we need to add this to the Runqueue once it is done
> + * being context-switched out?
> + * + Set when scheduling out in rt_schedule() if prev is runnable
> + * + Set in rt_vcpu_wake if it finds RT_scheduled set
> + * + Read in rt_context_saved(). If set, it adds prev to the Runqueue and
> + *   clears the bit.
> + *
> + */
> +#define __RT_delayed_runq_add     2
> +#define RT_delayed_runq_add (1<<__RT_delayed_runq_add)
> +
> +/*
> + * Debug only. Used to print out debug information
> + */
> +#define printtime()\
> +        ({s_time_t now = NOW(); \
> +          printk("%u : %3ld.%3ldus : %-19s ",\
> +          smp_processor_id(), now/MICROSECS(1), now%MICROSECS(1)/1000, __func__);} )
> +
> +/*
> + * System-wide private data, including a global RunQueue
> + * The global lock is referenced by schedule_data.schedule_lock from all physical cpus.
> + * It can be grabbed via vcpu_schedule_lock_irq()
> + */
> +struct rt_private {
> +    spinlock_t lock;        /* the global coarse-grained lock */
> +    struct list_head sdom;  /* list of available domains, used for dump */
> +    struct list_head runq;  /* Ordered list of runnable VMs */
> +    cpumask_t cpus;         /* cpumask_t of available physical cpus */
> +    cpumask_t tickled;      /* another cpu in the queue already tickled this one */
> +};
> +
> +/*
> + * Virtual CPU
> + */
> +struct rt_vcpu {
> +    struct list_head runq_elem; /* On the runqueue list */
> +    struct list_head sdom_elem; /* On the domain VCPU list */
> +
> +    /* Up-pointers */
> +    struct rt_dom *sdom;
> +    struct vcpu *vcpu;
> +
> +    /* VCPU parameters, in milliseconds */
> +    s_time_t period;
> +    s_time_t budget;
> +
> +    /* VCPU current information, in nanoseconds */
> +    long cur_budget;            /* current budget */
> +    s_time_t last_start;        /* last start time */
> +    s_time_t cur_deadline;      /* current deadline for EDF */
> +
> +    unsigned flags;             /* mark __RT_scheduled, etc.. */
> +};
> +
> +/*
> + * Domain
> + */
> +struct rt_dom {
> +    struct list_head vcpu;      /* link its VCPUs */
> +    struct list_head sdom_elem; /* link list on rt_priv */
> +    struct domain *dom;         /* pointer to upper domain */
> +};
> +
> +/*
> + * RunQueue helper functions
> + */
> +static int
> +__vcpu_on_runq(struct rt_vcpu *svc)
> +{
> + Â return !list_empty(&svc->runq_elem);
> +}
> +
> +static struct rt_vcpu *
> +__runq_elem(struct list_head *elem)
> +{
> + Â Âreturn list_entry(elem, struct rt_vcpu, runq_elem);
> +}
> +
> +/*
> + * Debug related code, dump vcpu/cpu information
> + */
> +static void
> +rt_dump_vcpu(struct rt_vcpu *svc)

const struct rt_vcpu *svc, for added safety.  (In the past we have had a
dump function which accidentally clobbered some of the state it was
supposed to be reading)

> +{
> +    char *cpustr = keyhandler_scratch;

Xen style - newline between declarations and code.

The keyhandler_scratch is only safe to use in keyhandlers, yet your dump
functions are based on scheduling operations.  You risk concurrent
access with other dump functions and with keyhandlers.

I see. I changed the cpustr to char[1024]. This should solve this issue.

> +    if ( svc == NULL )
> +    {
> +        printk("NULL!\n");
> +        return;
> +    }

This is quite a useless message on its own.  Is it reasonable for svc to
ever be NULL here?


No, it should not be NULL. I changed it to an ASSERT() for the next version.

> +    cpumask_scnprintf(cpustr, sizeof(cpustr), svc->vcpu->cpu_hard_affinity);

Buggy sizeof.  sizeof(cpustr) is 4, where I suspect you mean
sizeof(keyhandler_scratch), which is 1024.

Sorry! :-( Corrected it.
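
Putting these points together, the dump helper in the next version could look roughly like the sketch below (the printk format here is illustrative rather than the exact one in the patch, and the 1024 matches the current size of keyhandler_scratch):

static void
rt_dump_vcpu(const struct rt_vcpu *svc)
{
    char cpustr[1024];   /* local buffer instead of keyhandler_scratch */

    ASSERT(svc != NULL); /* replaces the printk("NULL!\n") check */

    cpumask_scnprintf(cpustr, sizeof(cpustr), svc->vcpu->cpu_hard_affinity);
    printk("vcpu %d.%d: period=%"PRId64" budget=%"PRId64" cpu_hard_affinity=%s\n",
           svc->vcpu->domain->domain_id, svc->vcpu->vcpu_id,
           svc->period, svc->budget, cpustr);
}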


> + Â Âfor_each_cpu(cpu, &prv->cpus)
> + Â Â Â Ârt_dump_pcpu(ops, cpu);
> +
> + Â Âprintk("Global RunQueue info: \n");
> + Â Âloop = 0;
> + Â Ârunq = RUNQ(ops);
> + Â Âlist_for_each( iter, runq )
> + Â Â{
> + Â Â Â Âsvc = __runq_elem(iter);
> + Â Â Â Âprintk("\t%3d: ", ++loop);
> + Â Â Â Ârt_dump_vcpu(svc);
> + Â Â}
> +
> + Â Âprintk("Domain info: \n");
> + Â Âloop = 0;
> + Â Âlist_for_each( iter_sdom, &prv->sdom )
> + Â Â{
> + Â Â Â Âstruct rt_dom *sdom;
> + Â Â Â Âsdom = list_entry(iter_sdom, struct rt_dom, sdom_elem);
> + Â Â Â Âprintk("\tdomain: %d\n", sdom->dom->domain_id);
> +
> + Â Â Â Âlist_for_each( iter_svc, &sdom->vcpu )
> + Â Â Â Â{
> + Â Â Â Â Â Âsvc = list_entry(iter_svc, struct rt_vcpu, sdom_elem);
> + Â Â Â Â Â Âprintk("\t\t%3d: ", ++loop);
> + Â Â Â Â Â Ârt_dump_vcpu(svc);
> + Â Â Â Â}
> + Â Â}
> +
> + Â Âprintk("\n");
> +}
> +
> +static inline void
> +__runq_remove(struct rt_vcpu *svc)
> +{
> + Â Âif ( __vcpu_on_runq(svc) )
> + Â Â Â Âlist_del_init(&svc->runq_elem);
> +}
> +
> +/*
> + * Insert a vcpu in the RunQ based on vcpus' deadline:
> + * EDF schedule policy: vcpu with smaller deadline has higher priority;
> + * The vcpu svc to be inserted will be inserted just before the very first
> + * vcpu iter_svc in the Runqueue whose deadline is equal or larger than
> + * svc's deadline.
> + */
> +static void
> +__runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
> +{
> + Â Âstruct list_head *runq = RUNQ(ops);
> + Â Âstruct list_head *iter;
> + Â Âspinlock_t *schedule_lock;
> +
> + Â Âschedule_lock = per_cpu(schedule_data, svc->vcpu->processor).schedule_lock;
> + Â ÂASSERT( spin_is_locked(schedule_lock) );
> +
> + Â Â/* Debug only */
> + Â Âif ( __vcpu_on_runq(svc) )
> + Â Â{
> + Â Â Â Ârt_dump(ops);
> + Â Â}
> + Â ÂASSERT( !__vcpu_on_runq(svc) );
> +
> + Â Âlist_for_each(iter, runq)
> + Â Â{
> + Â Â Â Âstruct rt_vcpu * iter_svc = __runq_elem(iter);
> +
> + Â Â Â Â/* svc still has budget */
> + Â Â Â Âif ( svc->cur_budget > 0 )
> + Â Â Â Â{
> + Â Â Â Â Â Âif ( iter_svc->cur_budget == 0 ||
> + Â Â Â Â Â Â Â Â svc->cur_deadline <= iter_svc->cur_deadline )
> + Â Â Â Â Â Â Â Â Â Âbreak;
> + Â Â Â Â}
> + Â Â Â Âelse
> + Â Â Â Â{ /* svc has no budget */
> + Â Â Â Â Â Âif ( iter_svc->cur_budget == 0 &&
> + Â Â Â Â Â Â Â Â svc->cur_deadline <= iter_svc->cur_deadline )
> + Â Â Â Â Â Â Â Â Â Âbreak;
> + Â Â Â Â}
> + Â Â}
> +
> + Â Âlist_add_tail(&svc->runq_elem, iter);
> +}
> +
> +/*
> + * Init/Free related code
> + */
> +static int
> +rt_init(struct scheduler *ops)
> +{
> + Â Âstruct rt_private *prv = xzalloc(struct rt_private);
> +
> + Â Âif ( prv == NULL )
> + Â Â Â Âreturn -ENOMEM;

Newline in here.

Changed, thanks!

> + Â Âops->sched_data = prv;

Is it safe to set ops->sched_data with a half constructed rt_private?  I
suspect this wants to be the very last (non-debug) action in this function.

I think it should be fine. I double-checked the _init function in sched_credit2.c. It has a similar operation: first assign prv to ops->sched_data and then set the values in prv.
Of course, I can switch them. But I'm not sure if that really matters. :-)
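
For what it's worth, the reordering is trivial; a sketch of rt_init() with the assignment moved after the initialisation (debug printks omitted):

static int
rt_init(struct scheduler *ops)
{
    struct rt_private *prv = xzalloc(struct rt_private);

    if ( prv == NULL )
        return -ENOMEM;

    spin_lock_init(&prv->lock);
    INIT_LIST_HEAD(&prv->sdom);
    INIT_LIST_HEAD(&prv->runq);

    /* Publish the private data only once it is fully constructed. */
    ops->sched_data = prv;

    return 0;
}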


> + Â Âspin_lock_init(&prv->lock);
> + Â ÂINIT_LIST_HEAD(&prv->sdom);
> + Â ÂINIT_LIST_HEAD(&prv->runq);
> +
> + Â Âprinttime();
> + Â Âprintk("\n");
> +
> + Â Âreturn 0;
> +}
> +
> +static void
> +rt_deinit(const struct scheduler *ops)
> +{
> + Â Âstruct rt_private *prv = RT_PRIV(ops);
> +
> + Â Âprinttime();
> + Â Âprintk("\n");
> + Â Âxfree(prv);
> +}
> +
> +/*
> + * Point the per_cpu spinlock to the global system lock; all cpus share the same global system lock
> + */
> +static void *
> +rt_alloc_pdata(const struct scheduler *ops, int cpu)
> +{
> + Â Âstruct rt_private *prv = RT_PRIV(ops);
> +
> + Â Âcpumask_set_cpu(cpu, &prv->cpus);
> +
> + Â Âper_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
> +
> + Â Âprinttime();
> + Â Âprintk("%s total cpus: %d", __FUNCTION__, cpumask_weight(&prv->cpus));

__FUNCTION__ is a gccism.  __func__ is a standard way of doing the same.

Changed. Thanks!

> + Â Â/* same as credit2, not a bogus pointer */
> + Â Âreturn (void *)1;
> +}
> +
> +static void
> +rt_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
> +{
> + Â Âstruct rt_private * prv = RT_PRIV(ops);
> + Â Âcpumask_clear_cpu(cpu, &prv->cpus);
> + Â Âprinttime();
> + Â Âprintk("%s cpu=%d\n", __FUNCTION__, cpu);
> +}
> +
> +static void *
> +rt_alloc_domdata(const struct scheduler *ops, struct domain *dom)
> +{
> + Â Âunsigned long flags;
> + Â Âstruct rt_dom *sdom;
> + Â Âstruct rt_private * prv = RT_PRIV(ops);
> +
> + Â Âprinttime();
> + Â Âprintk("dom=%d\n", dom->domain_id);
> +
> + Â Âsdom = xzalloc(struct rt_dom);
> + Â Âif ( sdom == NULL )
> + Â Â{
> + Â Â Â Âprintk("%s, xzalloc failed\n", __func__);
> + Â Â Â Âreturn NULL;
> + Â Â}
> +
> + Â ÂINIT_LIST_HEAD(&sdom->vcpu);
> + Â ÂINIT_LIST_HEAD(&sdom->sdom_elem);
> + Â Âsdom->dom = dom;
> +
> + Â Â/* spinlock here to insert the dom */
> + Â Âspin_lock_irqsave(&prv->lock, flags);
> + Â Âlist_add_tail(&sdom->sdom_elem, &(prv->sdom));
> + Â Âspin_unlock_irqrestore(&prv->lock, flags);
> +
> + Â Âreturn (void *)sdom;

Bogus cast.

I think we have to cast it to void * because the definition of this function requires the return type to be void *. In addition, the credit2 scheduler does the same cast in this _alloc_domdata function. So I guess this should be fine?
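
For reference, no cast is actually needed there: an object pointer converts to void * implicitly in C, so the function can simply end with

    /* sdom converts to void * implicitly; no cast required */
    return sdom;

The cast inherited from credit2 is harmless but unnecessary.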


> +}
> +
> +static void
> +rt_free_domdata(const struct scheduler *ops, void *data)
> +{
> + Â Âunsigned long flags;
> + Â Âstruct rt_dom *sdom = data;
> + Â Âstruct rt_private *prv = RT_PRIV(ops);
> +
> + Â Âprinttime();
> + Â Âprintk("dom=%d\n", sdom->dom->domain_id);
> +
> + Â Âspin_lock_irqsave(&prv->lock, flags);
> + Â Âlist_del_init(&sdom->sdom_elem);
> + Â Âspin_unlock_irqrestore(&prv->lock, flags);
> + Â Âxfree(data);
> +}
> +
> +static int
> +rt_dom_init(const struct scheduler *ops, struct domain *dom)
> +{
> + Â Âstruct rt_dom *sdom;
> +
> + Â Âprinttime();
> + Â Âprintk("dom=%d\n", dom->domain_id);
> +
> + Â Â/* IDLE Domain does not link on rt_private */
> + Â Âif ( is_idle_domain(dom) )
> + Â Â Â Âreturn 0;
> +
> + Â Âsdom = rt_alloc_domdata(ops, dom);
> + Â Âif ( sdom == NULL )
> + Â Â{
> + Â Â Â Âprintk("%s, failed\n", __func__);
> + Â Â Â Âreturn -ENOMEM;
> + Â Â}
> + Â Âdom->sched_priv = sdom;
> +
> + Â Âreturn 0;
> +}
> +
> +static void
> +rt_dom_destroy(const struct scheduler *ops, struct domain *dom)
> +{
> + Â Âprinttime();
> + Â Âprintk("dom=%d\n", dom->domain_id);
> +
> + Â Ârt_free_domdata(ops, RT_DOM(dom));
> +}
> +
> +static void *
> +rt_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
> +{
> + Â Âstruct rt_vcpu *svc;
> + Â Âs_time_t now = NOW();
> + Â Âlong count;
> +
> + Â Â/* Allocate per-VCPU info */
> + Â Âsvc = xzalloc(struct rt_vcpu);
> + Â Âif ( svc == NULL )
> + Â Â{
> + Â Â Â Âprintk("%s, xzalloc failed\n", __func__);
> + Â Â Â Âreturn NULL;
> + Â Â}
> +
> + Â ÂINIT_LIST_HEAD(&svc->runq_elem);
> + Â ÂINIT_LIST_HEAD(&svc->sdom_elem);
> + Â Âsvc->flags = 0U;
> + Â Âsvc->sdom = dd;
> + Â Âsvc->vcpu = vc;
> + Â Âsvc->last_start = 0; Â Â Â Â Â Â/* init last_start is 0 */
> +
> + Â Âsvc->period = RT_DEFAULT_PERIOD;
> + Â Âif ( !is_idle_vcpu(vc) )
> + Â Â Â Âsvc->budget = RT_DEFAULT_BUDGET;
> +
> + Â Âcount = (now/MICROSECS(svc->period)) + 1;
> + Â Â/* sync all VCPU's start time to 0 */
> + Â Âsvc->cur_deadline += count * MICROSECS(svc->period);
> +
> + Â Âsvc->cur_budget = svc->budget*1000; /* counting in microseconds level */
> + Â Â/* Debug only: dump new vcpu's info */
> + Â Âprinttime();
> + Â Ârt_dump_vcpu(svc);
> +
> + Â Âreturn svc;
> +}
> +
> +static void
> +rt_free_vdata(const struct scheduler *ops, void *priv)
> +{
> + Â Âstruct rt_vcpu *svc = priv;
> +
> + Â Â/* Debug only: dump freed vcpu's info */
> + Â Âprinttime();
> + Â Ârt_dump_vcpu(svc);
> + Â Âxfree(svc);
> +}
> +
> +/*
> + * TODO: Do we need to add vc to the new Runqueue?
> + * This function is called in sched_move_domain() in schedule.c
> + * When moving a domain to a new cpupool,
> + * we may have to add vc to the Runqueue of the new cpupool
> + */
> +static void
> +rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
> +{
> + Â Âstruct rt_vcpu *svc = RT_VCPU(vc);
> +
> + Â Â/* Debug only: dump info of vcpu to insert */
> + Â Âprinttime();
> + Â Ârt_dump_vcpu(svc);
> +
> +    /* do not add the idle vcpu to the dom vcpu list */
> + Â Âif ( is_idle_vcpu(vc) )
> + Â Â Â Âreturn;
> +
> + Â Âlist_add_tail(&svc->sdom_elem, &svc->sdom->vcpu); Â /* add to dom vcpu list */
> +}
> +
> +/*
> + * TODO: same as rt_vcpu_insert()
> + */
> +static void
> +rt_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
> +{
> + Â Âstruct rt_vcpu * const svc = RT_VCPU(vc);
> + Â Âstruct rt_dom * const sdom = svc->sdom;
> +
> + Â Âprinttime();
> + Â Ârt_dump_vcpu(svc);
> +
> + Â ÂBUG_ON( sdom == NULL );
> + Â ÂBUG_ON( __vcpu_on_runq(svc) );
> +
> + Â Âif ( !is_idle_vcpu(vc) )
> + Â Â Â Âlist_del_init(&svc->sdom_elem);
> +}
> +
> +/*
> + * Pick a valid CPU for the vcpu vc
> + * The valid CPUs of a vcpu are the intersection of the vcpu's affinity and the available cpus
> + */
> +static int
> +rt_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
> +{
> + Â Âcpumask_t cpus;
> + Â Âcpumask_t *online;
> + Â Âint cpu;
> + Â Âstruct rt_private * prv = RT_PRIV(ops);
> +
> +    online = cpupool_scheduler_cpumask(vc->domain->cpupool);
> +    cpumask_and(&cpus, &prv->cpus, online);
> + Â Âcpumask_and(&cpus, &cpus, vc->cpu_hard_affinity);
> +
> + Â Âcpu = cpumask_test_cpu(vc->processor, &cpus)
> + Â Â Â Â Â Â? vc->processor
> + Â Â Â Â Â Â: cpumask_cycle(vc->processor, &cpus);
> + Â ÂASSERT( !cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus) );
> +
> + Â Âreturn cpu;
> +}
> +
> +/*
> + * Burn budget at microsecond level.
> + */
> +static void
> +burn_budgets(const struct scheduler *ops, struct rt_vcpu *svc, s_time_t now)
> +{
> + Â Âs_time_t delta;
> + Â Âlong count = 0;
> +
> + Â Â/* don't burn budget for idle VCPU */
> + Â Âif ( is_idle_vcpu(svc->vcpu) )
> + Â Â{
> + Â Â Â Âreturn;
> + Â Â}
> +
> + Â Â/* first time called for this svc, update last_start */
> + Â Âif ( svc->last_start == 0 )
> + Â Â{
> + Â Â Â Âsvc->last_start = now;
> + Â Â Â Âreturn;
> + Â Â}
> +
> + Â Â/*
> + Â Â * update deadline info: When deadline is in the past,
> +     * it needs to be updated to the deadline of the current period,
> + Â Â * and replenish the budget
> + Â Â */
> + Â Âdelta = now - svc->cur_deadline;
> + Â Âif ( delta >= 0 )
> + Â Â{
> + Â Â Â Âcount = ( delta/MICROSECS(svc->period) ) + 1;
> + Â Â Â Âsvc->cur_deadline += count * MICROSECS(svc->period);
> + Â Â Â Âsvc->cur_budget = svc->budget * 1000;
> + Â Â Â Âreturn;
> + Â Â}
> +
> + Â Â/* burn at nanoseconds level */
> + Â Âdelta = now - svc->last_start;
> + Â Â/*
> + Â Â * delta < 0 only happens in nested virtualization;
> + Â Â * TODO: how should we handle delta < 0 in a better way? */
> + Â Âif ( delta < 0 )
> + Â Â{
> + Â Â Â Âprintk("%s, ATTENTION: now is behind last_start! delta = %ld for ",
> + Â Â Â Â Â Â Â Â__func__, delta);
> + Â Â Â Ârt_dump_vcpu(svc);
> + Â Â Â Âsvc->last_start = now; Â/* update last_start */
> + Â Â Â Âsvc->cur_budget = 0; Â /* FIXME: should we recover like this? */
> + Â Â Â Âreturn;
> + Â Â}
> +
> + Â Âif ( svc->cur_budget == 0 )
> + Â Â Â Âreturn;
> +
> + Â Âsvc->cur_budget -= delta;
> + Â Âif ( svc->cur_budget < 0 )
> + Â Â Â Âsvc->cur_budget = 0;
> +}
> +
> +/*
> + * RunQ is sorted. Pick first one within cpumask. If no one, return NULL
> + * lock is grabbed before calling this function
> + */
> +static struct rt_vcpu *
> +__runq_pick(const struct scheduler *ops, cpumask_t mask)
> +{
> + Â Âstruct list_head *runq = RUNQ(ops);
> + Â Âstruct list_head *iter;
> + Â Âstruct rt_vcpu *svc = NULL;
> + Â Âstruct rt_vcpu *iter_svc = NULL;
> + Â Âcpumask_t cpu_common;
> + Â Âcpumask_t *online;
> + Â Âstruct rt_private * prv = RT_PRIV(ops);
> +
> + Â Âlist_for_each(iter, runq)
> + Â Â{
> + Â Â Â Âiter_svc = __runq_elem(iter);
> +
> + Â Â Â Â/* mask is intersection of cpu_hard_affinity and cpupool and priv->cpus */
> +        online = cpupool_scheduler_cpumask(iter_svc->vcpu->domain->cpupool);
> +        cpumask_and(&cpu_common, online, &prv->cpus);
> + Â Â Â Âcpumask_and(&cpu_common, &cpu_common, iter_svc->vcpu->cpu_hard_affinity);
> + Â Â Â Âcpumask_and(&cpu_common, &mask, &cpu_common);
> + Â Â Â Âif ( cpumask_empty(&cpu_common) )
> + Â Â Â Â Â Âcontinue;
> +
> + Â Â Â Âif ( iter_svc->cur_budget <= 0 )
> + Â Â Â Â Â Âcontinue;
> +
> + Â Â Â Âsvc = iter_svc;
> + Â Â Â Âbreak;
> + Â Â}
> +
> + Â Âreturn svc;
> +}
> +
> +/*
> + * Update a vcpu's budget and re-sort the runq by inserting the modified vcpu back into the runq
> + * lock is grabbed before calling this function
> + */
> +static void
> +__repl_update(const struct scheduler *ops, s_time_t now)
> +{
> + Â Âstruct list_head *runq = RUNQ(ops);
> + Â Âstruct list_head *iter;
> + Â Âstruct list_head *tmp;
> + Â Âstruct rt_vcpu *svc = NULL;
> +
> + Â Âs_time_t diff;
> + Â Âlong count;
> +
> + Â Âlist_for_each_safe(iter, tmp, runq)
> + Â Â{
> + Â Â Â Âsvc = __runq_elem(iter);
> +
> + Â Â Â Âdiff = now - svc->cur_deadline;
> + Â Â Â Âif ( diff > 0 )
> + Â Â Â Â{
> + Â Â Â Â Â Âcount = (diff/MICROSECS(svc->period)) + 1;
> + Â Â Â Â Â Âsvc->cur_deadline += count * MICROSECS(svc->period);
> + Â Â Â Â Â Âsvc->cur_budget = svc->budget * 1000;
> + Â Â Â Â Â Â__runq_remove(svc);
> + Â Â Â Â Â Â__runq_insert(ops, svc);
> + Â Â Â Â}
> + Â Â}
> +}
> +
> +/*
> + * schedule function for rt scheduler.
> + * The lock is already grabbed in schedule.c, no need to lock here
> + */
> +static struct task_slice
> +rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_scheduled)
> +{
> + Â Âconst int cpu = smp_processor_id();
> + Â Âstruct rt_private * prv = RT_PRIV(ops);
> + Â Âstruct rt_vcpu * const scurr = RT_VCPU(current);
> + Â Âstruct rt_vcpu * snext = NULL;
> + Â Âstruct task_slice ret = { .migrated = 0 };
> +
> + Â Â/* clear ticked bit now that we've been scheduled */
> + Â Âif ( cpumask_test_cpu(cpu, &prv->tickled) )
> + Â Â Â Âcpumask_clear_cpu(cpu, &prv->tickled);
> +
> + Â Â/* burn_budget would return for IDLE VCPU */
> + Â Âburn_budgets(ops, scurr, now);
> +
> + Â Â__repl_update(ops, now);
> +
> + Â Âif ( tasklet_work_scheduled )
> + Â Â{
> + Â Â Â Âsnext = RT_VCPU(idle_vcpu[cpu]);
> + Â Â}
> + Â Âelse
> + Â Â{
> + Â Â Â Âcpumask_t cur_cpu;
> + Â Â Â Âcpumask_clear(&cur_cpu);
> + Â Â Â Âcpumask_set_cpu(cpu, &cur_cpu);
> + Â Â Â Âsnext = __runq_pick(ops, cur_cpu);
> + Â Â Â Âif ( snext == NULL )
> + Â Â Â Â Â Âsnext = RT_VCPU(idle_vcpu[cpu]);
> +
> + Â Â Â Â/* if scurr has higher priority and budget, still pick scurr */
> + Â Â Â Âif ( !is_idle_vcpu(current) &&
> + Â Â Â Â Â Â vcpu_runnable(current) &&
> + Â Â Â Â Â Â scurr->cur_budget > 0 &&
> + Â Â Â Â Â Â ( is_idle_vcpu(snext->vcpu) ||
> + Â Â Â Â Â Â Â scurr->cur_deadline <= snext->cur_deadline ) )
> + Â Â Â Â Â Âsnext = scurr;
> + Â Â}
> +
> + Â Âif ( snext != scurr &&
> + Â Â Â Â !is_idle_vcpu(current) &&
> + Â Â Â Â vcpu_runnable(current) )
> + Â Â Â Âset_bit(__RT_delayed_runq_add, &scurr->flags);
> +
> +
> + Â Âsnext->last_start = now;
> + Â Âif ( !is_idle_vcpu(snext->vcpu) )
> + Â Â{
> + Â Â Â Âif ( snext != scurr )
> + Â Â Â Â{
> + Â Â Â Â Â Â__runq_remove(snext);
> + Â Â Â Â Â Âset_bit(__RT_scheduled, &snext->flags);
> + Â Â Â Â}
> + Â Â Â Âif ( snext->vcpu->processor != cpu )
> + Â Â Â Â{
> + Â Â Â Â Â Âsnext->vcpu->processor = cpu;
> + Â Â Â Â Â Âret.migrated = 1;
> + Â Â Â Â}
> + Â Â}
> +
> + Â Âret.time = MILLISECS(1); /* sched quantum */
> + Â Âret.task = snext->vcpu;
> +
> + Â Âreturn ret;
> +}
> +
> +/*
> + * Remove VCPU from RunQ
> + * The lock is already grabbed in schedule.c, no need to lock here
> + */
> +static void
> +rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
> +{
> + Â Âstruct rt_vcpu * const svc = RT_VCPU(vc);
> +
> + Â ÂBUG_ON( is_idle_vcpu(vc) );
> +
> + Â Âif ( curr_on_cpu(vc->processor) == vc )
> + Â Â{
> + Â Â Â Âcpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
> + Â Â Â Âreturn;
> + Â Â}
> +
> + Â Âif ( __vcpu_on_runq(svc) )
> + Â Â{
> + Â Â Â Â__runq_remove(svc);
> + Â Â Â Âprintk("%s: vcpu should not on runq in vcpu_sleep()\n", __FUNCTION__);
> + Â Â Â ÂBUG();
> + Â Â}
> +
> + Â Âclear_bit(__RT_delayed_runq_add, &svc->flags);
> +}
> +
> +/*
> + * Pick a vcpu on a cpu to kick out to place the running candidate
> + * Called by wake() and context_saved()
> + * We have a running candidate here, the kick logic is:
> + * Among all the cpus that are within the cpu affinity
> + * 1) if the new->cpu is idle, kick it. This could benefit cache hit
> + * 2) if there are any idle vcpu, kick it.
> + * 3) now all pcpus are busy; among all the running vcpus, pick the one with the lowest priority;
> + *    if snext has higher priority, kick it.
> + *
> + * TODO:
> + * 1) what if these two vcpus belong to the same domain?
> + *    replacing a vcpu belonging to the same domain introduces more overhead
> + *
> + * lock is grabbed before calling this function
> + */
> +static void
> +runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
> +{
> + Â Âstruct rt_private * prv = RT_PRIV(ops);
> + Â Âstruct rt_vcpu * latest_deadline_vcpu = NULL; Â Â/* lowest priority scheduled */
> + Â Âstruct rt_vcpu * iter_svc;
> + Â Âstruct vcpu * iter_vc;
> + Â Âint cpu = 0;
> + Â Âcpumask_t not_tickled;
> + Â Âcpumask_t *online;
> +
> + Â Âif ( new == NULL || is_idle_vcpu(new->vcpu) )
> + Â Â Â Âreturn;
> +
> +    online = cpupool_scheduler_cpumask(new->vcpu->domain->cpupool);
> +    cpumask_and(&not_tickled, online, &prv->cpus);
> + Â Âcpumask_and(&not_tickled, &not_tickled, new->vcpu->cpu_hard_affinity);
> + Â Âcpumask_andnot(&not_tickled, &not_tickled, &prv->tickled);
> +
> + Â Â/* 1) if new's previous cpu is idle, kick it for cache benefit */
> + Â Âif ( is_idle_vcpu(curr_on_cpu(new->vcpu->processor)) )
> + Â Â{
> + Â Â Â Âcpumask_set_cpu(new->vcpu->processor, &prv->tickled);
> + Â Â Â Âcpu_raise_softirq(new->vcpu->processor, SCHEDULE_SOFTIRQ);
> + Â Â Â Âreturn;
> + Â Â}
> +
> + Â Â/* 2) if there are any idle pcpu, kick it */
> + Â Â/* The same loop also find the one with lowest priority */
> + Â Âfor_each_cpu(cpu, &not_tickled)
> + Â Â{
> + Â Â Â Âiter_vc = curr_on_cpu(cpu);
> + Â Â Â Âif ( is_idle_vcpu(iter_vc) )
> + Â Â Â Â{
> + Â Â Â Â Â Âcpumask_set_cpu(cpu, &prv->tickled);
> + Â Â Â Â Â Âcpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
> + Â Â Â Â Â Âreturn;
> + Â Â Â Â}
> + Â Â Â Âiter_svc = RT_VCPU(iter_vc);
> + Â Â Â Âif ( latest_deadline_vcpu == NULL ||
> + Â Â Â Â Â Â iter_svc->cur_deadline > latest_deadline_vcpu->cur_deadline )
> + Â Â Â Â Â Âlatest_deadline_vcpu = iter_svc;
> + Â Â}
> +
> +    /* 3) candidate has higher priority, kick out the lowest priority vcpu */
> + Â Âif ( latest_deadline_vcpu != NULL && new->cur_deadline < latest_deadline_vcpu->cur_deadline )
> + Â Â{
> + Â Â Â Âcpumask_set_cpu(latest_deadline_vcpu->vcpu->processor, &prv->tickled);
> + Â Â Â Âcpu_raise_softirq(latest_deadline_vcpu->vcpu->processor, SCHEDULE_SOFTIRQ);
> + Â Â}
> + Â Âreturn;
> +}
> +
> +/*
> + * Should always wake up runnable vcpu, put it back to RunQ.
> + * Check priority to raise interrupt
> + * The lock is already grabbed in schedule.c, no need to lock here
> + * TODO: what if these two vcpus belong to the same domain?
> + */
> +static void
> +rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
> +{
> + Â Âstruct rt_vcpu * const svc = RT_VCPU(vc);
> + Â Âs_time_t diff;
> + Â Âs_time_t now = NOW();
> + Â Âlong count = 0;
> + Â Âstruct rt_private * prv = RT_PRIV(ops);
> + Â Âstruct rt_vcpu * snext = NULL; Â Â Â Â/* highest priority on RunQ */
> +
> + Â ÂBUG_ON( is_idle_vcpu(vc) );
> +
> + Â Âif ( unlikely(curr_on_cpu(vc->processor) == vc) )
> + Â Â Â Âreturn;
> +
> + Â Â/* on RunQ, just update info is ok */
> + Â Âif ( unlikely(__vcpu_on_runq(svc)) )
> + Â Â Â Âreturn;
> +
> +    /* If context hasn't been saved for this vcpu yet, we can't put it on
> +     * the Runqueue. Instead, we set a flag so that it will be put on the
> +     * Runqueue after the context has been saved. */
> + Â Âif ( unlikely(test_bit(__RT_scheduled, &svc->flags)) )
> + Â Â{
> + Â Â Â Âset_bit(__RT_delayed_runq_add, &svc->flags);
> + Â Â Â Âreturn;
> + Â Â}
> +
> + Â Â/* update deadline info */
> + Â Âdiff = now - svc->cur_deadline;
> + Â Âif ( diff >= 0 )
> + Â Â{
> + Â Â Â Âcount = ( diff/MICROSECS(svc->period) ) + 1;
> + Â Â Â Âsvc->cur_deadline += count * MICROSECS(svc->period);
> + Â Â Â Âsvc->cur_budget = svc->budget * 1000;
> + Â Â}
> +
> + Â Â__runq_insert(ops, svc);
> + Â Â__repl_update(ops, now);
> + Â Âsnext = __runq_pick(ops, prv->cpus); Â Â/* pick snext from ALL valid cpus */
> + Â Ârunq_tickle(ops, snext);
> +
> + Â Âreturn;
> +}
> +
> +/*
> + * scurr has finished context switch, insert it back to the RunQ,
> + * and then pick the highest priority vcpu from runq to run
> + */
> +static void
> +rt_context_saved(const struct scheduler *ops, struct vcpu *vc)
> +{
> + Â Âstruct rt_vcpu * svc = RT_VCPU(vc);
> + Â Âstruct rt_vcpu * snext = NULL;
> + Â Âstruct rt_private * prv = RT_PRIV(ops);
> + Â Âspinlock_t *lock = vcpu_schedule_lock_irq(vc);
> +
> + Â Âclear_bit(__RT_scheduled, &svc->flags);
> + Â Â/* not insert idle vcpu to runq */
> + Â Âif ( is_idle_vcpu(vc) )
> + Â Â Â Âgoto out;
> +
> + Â Âif ( test_and_clear_bit(__RT_delayed_runq_add, &svc->flags) &&
> + Â Â Â Â likely(vcpu_runnable(vc)) )
> + Â Â{
> + Â Â Â Â__runq_insert(ops, svc);
> + Â Â Â Â__repl_update(ops, NOW());
> + Â Â Â Âsnext = __runq_pick(ops, prv->cpus); Â Â/* pick snext from ALL cpus */
> + Â Â Â Ârunq_tickle(ops, snext);
> + Â Â}
> +out:
> + Â Âvcpu_schedule_unlock_irq(lock, vc);
> +}
> +
> +/*
> + * set/get each vcpu info of each domain
> + */
> +static int
> +rt_dom_cntl(
> + Â Âconst struct scheduler *ops,
> + Â Âstruct domain *d,
> + Â Âstruct xen_domctl_scheduler_op *op)
> +{
> + Â Âxen_domctl_sched_rt_params_t *local_sched;
> + Â Âstruct rt_dom * const sdom = RT_DOM(d);
> + Â Âstruct list_head *iter;
> + Â Âint vcpu_index = 0;
> + Â Âint rc = -EINVAL;
> +
> + Â Âswitch ( op->cmd )
> + Â Â{
> + Â Âcase XEN_DOMCTL_SCHEDOP_getnumvcpus:
> + Â Â Â Âop->u.rt.nr_vcpus = 0;
> + Â Â Â Âlist_for_each( iter, &sdom->vcpu )
> + Â Â Â Â Â Âvcpu_index++;
> + Â Â Â Âop->u.rt.nr_vcpus = vcpu_index;
> + Â Â Â Ârc = 0;
> + Â Â Â Âbreak;
> + Â Âcase XEN_DOMCTL_SCHEDOP_getinfo:
> +        /* for debug use: whenever adjusting Dom0 parameters, do a global dump */
> + Â Â Â Âif ( d->domain_id == 0 )
> + Â Â Â Â Â Ârt_dump(ops);
> +
> + Â Â Â Âop->u.rt.nr_vcpus = 0;
> + Â Â Â Âlist_for_each( iter, &sdom->vcpu )
> + Â Â Â Â Â Âvcpu_index++;
> + Â Â Â Âop->u.rt.nr_vcpus = vcpu_index;
> + Â Â Â Âlocal_sched = xzalloc_array(xen_domctl_sched_rt_params_t, vcpu_index);
> + Â Â Â Âvcpu_index = 0;
> + Â Â Â Âlist_for_each( iter, &sdom->vcpu )
> + Â Â Â Â{
> + Â Â Â Â Â Âstruct rt_vcpu * svc = list_entry(iter, struct rt_vcpu, sdom_elem);
> +
> + Â Â Â Â Â Âlocal_sched[vcpu_index].budget = svc->budget;
> + Â Â Â Â Â Âlocal_sched[vcpu_index].period = svc->period;
> + Â Â Â Â Â Âlocal_sched[vcpu_index].index = vcpu_index;
> + Â Â Â Â Â Âvcpu_index++;
> + Â Â Â Â}
> + Â Â Â Âcopy_to_guest(op->u.rt.vcpu, local_sched, vcpu_index);

This will clobber guest heap if vcpu_index is greater than the allocated
space.

This is a good point! I will pass the size of the array to the kernel and check that the number of the array's elements is not smaller than the number of vcpus.

You also unconditionally leak local_sched, but there is no need
for an allocation in the first place.

I will add the xfree() after copy_to_guest().
I have a question: how can I avoid allocating the local_sched?
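
One way to avoid the allocation entirely is to copy each element straight to the guest buffer with copy_to_guest_offset(), bounding the loop by the size of the guest-supplied array. A rough sketch (it assumes op->u.rt.nr_vcpus is reused on input as the size of that array, which is an interface change this patch does not make yet):

    case XEN_DOMCTL_SCHEDOP_getinfo:
    {
        uint16_t index = 0;

        rc = 0;
        list_for_each( iter, &sdom->vcpu )
        {
            struct rt_vcpu *svc = list_entry(iter, struct rt_vcpu, sdom_elem);
            xen_domctl_sched_rt_params_t local = {
                .period = svc->period,
                .budget = svc->budget,
                .index  = index,
            };

            /* Stop before overrunning the guest-supplied array. */
            if ( index >= op->u.rt.nr_vcpus )
                break;

            if ( copy_to_guest_offset(op->u.rt.vcpu, index, &local, 1) )
            {
                rc = -EFAULT;
                break;
            }

            index++;
        }

        /* Report how many entries were actually written. */
        op->u.rt.nr_vcpus = index;
        break;
    }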

> + Â Â Â Ârc = 0;
> + Â Â Â Âbreak;
> + Â Âcase XEN_DOMCTL_SCHEDOP_putinfo:
> + Â Â Â Âlist_for_each( iter, &sdom->vcpu )
> + Â Â Â Â{
> + Â Â Â Â Â Âstruct rt_vcpu * svc = list_entry(iter, struct rt_vcpu, sdom_elem);
> +
> + Â Â Â Â Â Â/* adjust per VCPU parameter */
> + Â Â Â Â Â Âif ( op->u.rt.vcpu_index == svc->vcpu->vcpu_id )
> + Â Â Â Â Â Â{
> + Â Â Â Â Â Â Â Âvcpu_index = op->u.rt.vcpu_index;
> +
> + Â Â Â Â Â Â Â Âif ( vcpu_index < 0 )
> + Â Â Â Â Â Â Â Â Â Âprintk("XEN_DOMCTL_SCHEDOP_putinfo: vcpu_index=%d\n",
> + Â Â Â Â Â Â Â Â Â Â Â Â Â Âvcpu_index);
> + Â Â Â Â Â Â Â Âelse
> + Â Â Â Â Â Â Â Â Â Âprintk("XEN_DOMCTL_SCHEDOP_putinfo: "
> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â"vcpu_index=%d, period=%"PRId64", budget=%"PRId64"\n",
> + Â Â Â Â Â Â Â Â Â Â Â Â Â Âvcpu_index, op->u.rt.period, op->u.rt.budget);
> +
> + Â Â Â Â Â Â Â Âsvc->period = op->u.rt.period;
> + Â Â Â Â Â Â Â Âsvc->budget = op->u.rt.budget;
> +
> + Â Â Â Â Â Â Â Âbreak;
> + Â Â Â Â Â Â}
> + Â Â Â Â}
> + Â Â Â Ârc = 0;
> + Â Â Â Âbreak;
> + Â Â}
> +
> + Â Âreturn rc;
> +}
> +
> +static struct rt_private _rt_priv;
> +
> +const struct scheduler sched_rt_def = {
> +  Â.name      = "SMP RT Scheduler",
> +  Â.opt_name    = "rt",

Should these names reflect RT_DS as opposed to simply RT?

DS (Deferrable Server) is just one kind of server mechanism for global Earliest Deadline First scheduling. We can add other server mechanisms in the same file sched_rt.c to extend this real-time scheduler. But we don't want to change/affect the user interface when we add more server mechanisms.

The .opt_name will affect the user interface when the user chooses the rt scheduler. If we change it to rt_ds, we will have to change it back to rt once more server mechanisms are implemented, and users would then have to change their configuration (i.e., the command line value sched=) to the new name rt. Because this could potentially affect the users' interface, I think it's better to use rt here. What do you think?


> +  Â.sched_id    = XEN_SCHEDULER_RT_DS,
> +  Â.sched_data   = &_rt_priv,
> +
> + Â Â.dump_cpu_state = rt_dump_pcpu,
> + Â Â.dump_settings Â= rt_dump,
> +  Â.init      = rt_init,
> +  Â.deinit     = rt_deinit,
> +  Â.alloc_pdata  Â= rt_alloc_pdata,
> +  Â.free_pdata   = rt_free_pdata,
> + Â Â.alloc_domdata Â= rt_alloc_domdata,
> +  Â.free_domdata  = rt_free_domdata,
> +  Â.init_domain  Â= rt_dom_init,
> + Â Â.destroy_domain = rt_dom_destroy,
> +  Â.alloc_vdata  Â= rt_alloc_vdata,
> +  Â.free_vdata   = rt_free_vdata,
> +  Â.insert_vcpu  Â= rt_vcpu_insert,
> +  Â.remove_vcpu  Â= rt_vcpu_remove,
> +
> +  Â.adjust     = rt_dom_cntl,
> +
> +  Â.pick_cpu    = rt_cpu_pick,
> +  Â.do_schedule  Â= rt_schedule,
> +  Â.sleep     Â= rt_vcpu_sleep,
> +  Â.wake      = rt_vcpu_wake,
> + Â Â.context_saved Â= rt_context_saved,
> +};
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index e9eb0bc..f2df400 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -68,6 +68,7 @@ static const struct scheduler *schedulers[] = {
> Â Â Â&sched_sedf_def,
> Â Â Â&sched_credit_def,
> Â Â Â&sched_credit2_def,
> + Â Â&sched_rt_def,
> Â Â Â&sched_arinc653_def,
> Â};
>
> @@ -1092,7 +1093,8 @@ long sched_adjust(struct domain *d, struct xen_domctl_scheduler_op *op)
>
> Â Â Âif ( (op->sched_id != DOM2OP(d)->sched_id) ||
> Â Â Â Â Â ((op->cmd != XEN_DOMCTL_SCHEDOP_putinfo) &&
> - Â Â Â Â Â(op->cmd != XEN_DOMCTL_SCHEDOP_getinfo)) )
> + Â Â Â Â Â(op->cmd != XEN_DOMCTL_SCHEDOP_getinfo) &&
> + Â Â Â Â Â(op->cmd != XEN_DOMCTL_SCHEDOP_getnumvcpus)) )
> Â Â Â Â Âreturn -EINVAL;
>
> Â Â Â/* NB: the pluggable scheduler code needs to take care
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index 5b11bbf..8d4b973 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -339,6 +339,18 @@ struct xen_domctl_max_vcpus {
> Âtypedef struct xen_domctl_max_vcpus xen_domctl_max_vcpus_t;
> ÂDEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
>
> +/*
> + * This structure is used to pass per-vcpu parameters for the rt scheduler
> + * from a privileged domain to Xen
> + */
> +struct xen_domctl_sched_rt_params {
> +    /* get vcpus' info */
> +    int64_t period; /* s_time_t type */
> +    int64_t budget;
> +    int     index;

Index is clearly an unsigned quantity.  For alignment and compatibility,
uint64_t would make the most sense.  Alternatively, uint32_t and an
explicit uint32_t pad field.

Agree. I have changed it to uint16_t because the vcpu_index is uint16_t in the tool stack.
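
With a 16-bit index, an explicit pad keeps the structure the same size and layout for 32-bit and 64-bit toolstacks; roughly:

struct xen_domctl_sched_rt_params {
    int64_t  period;   /* s_time_t type */
    int64_t  budget;
    uint16_t index;    /* vcpu index within the domain */
    uint16_t pad[3];   /* explicit padding up to the next 8-byte boundary */
};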

Thank you very much for your comments and suggestions! Looking forward to your reply! ;-)

Best,

Meng

--


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

