
[Xen-devel] [PATCH] scheduler rate controller



As one of the topics presented at Xen Summit 2011 in SC, we proposed a method, 
the scheduler rate controller (SRC), to limit excessively frequent scheduling 
under some conditions. You can find the slides at 
http://www.slideshare.net/xen_com_mgr/9-hui-lvtacklingthemanagementchallengesofserverconsolidationonmulticoresystems

Since then, we have tested it on a 2-socket multi-core system over many rounds 
and got positive results: performance improved greatly both with the 
consolidation workload SPECvirt_sc2010 and with smaller workloads such as 
sysbench and SPECjbb. So I am posting it here for review.

In Xen's scheduling mechanism, the hypervisor kicks the related VCPUs by 
raising the schedule softirq while processing external interrupts. Therefore, 
if the number of IRQs is very large, scheduling happens more frequently. 
Frequent scheduling will 
1) bring more overhead for the hypervisor and 
2) increase the cache miss rate. 

In our consolidation workload, SPECvirt_sc2010, SR-IOV and iSCSI solutions are 
adopted to bypass software emulation, but they bring heavy network traffic. 
Correspondingly, about 15k scheduling events happened per second on each 
physical core, which means the average running time is very short, only about 
60us. We propose SRC in Xen to mitigate this problem. 
The performance benefit brought by this patch is very large at peak throughput, 
with no influence when system load is low.
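
To make the numbers above concrete, here is a small stand-alone C sketch (not 
part of the patch) that relates a per-core scheduling rate to the average run 
time and to the SRC threshold. The function names are made up for 
illustration; the 10 ms window and the threshold of 50 are the defaults used 
in the patch. At 15k scheduling events per second, plain division gives an 
average run time of roughly 67us, the same order as the 60us quoted above, and 
about 150 events per 10 ms window, well above the default threshold.

```c
#include <assert.h>
#include <stdint.h>

/* Average run time between scheduling events, in microseconds, for a given
 * number of scheduling events per second on one physical core. */
static double avg_slice_us(uint32_t schedules_per_sec)
{
    assert(schedules_per_sec > 0);
    return 1e6 / (double)schedules_per_sec;
}

/* Number of scheduling events expected in one sampling window of
 * interval_ms milliseconds at a steady rate. */
static uint32_t events_per_interval(uint32_t schedules_per_sec,
                                    uint32_t interval_ms)
{
    return (uint32_t)(((uint64_t)schedules_per_sec * interval_ms) / 1000);
}

/* Nonzero when the per-window count reaches the threshold. This uses the
 * same ">=" comparison as src_controller() in the patch. */
static int src_would_engage(uint32_t schedules_per_sec,
                            uint32_t interval_ms, uint32_t threshold)
{
    return events_per_interval(schedules_per_sec, interval_ms) >= threshold;
}
```

With the defaults, the threshold of 50 events per 10 ms corresponds to 5000 
scheduling events per second, i.e. an average run time of 200us; SRC only 
engages when the rate is at or above that.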

SRC improved SPECvirt performance by 14%. 
1) It reduced CPU utilization, which allows more load to be added.
2) Response time (QoS) became better at the same CPU %.
3) The better response time allowed us to push the CPU % at peak performance to 
an even higher level (CPU was not saturated in SPECvirt).
SRC reduced the context switch rate significantly, which resulted in 
1) Smaller path length
2) Fewer cache misses and thus lower CPI
3) Better performance on both the guest and hypervisor sides.

With this patch, according to our SPECvirt_sc2010 results, the performance of 
Xen catches up with the other open-source hypervisor. 


Signed-off-by: Hui Lv hui.lv@xxxxxxxxx


diff -ruNp xen.org/common/schedule.c xen/common/schedule.c
--- xen.org/common/schedule.c   2011-10-20 03:29:44.000000000 -0400
+++ xen/common/schedule.c       2011-10-23 21:41:14.000000000 -0400
@@ -98,6 +98,31 @@ static inline void trace_runstate_change
 
     __trace_var(event, 1/*tsc*/, sizeof(d), &d);
 }
+/*
+ * opt_sched_rate_control: parameter to turn the scheduler rate controller (SRC) on/off
+ * opt_sched_rate_high: scheduling frequency threshold, default value is 50.
+ *
+ * It is suggested to set opt_sched_rate_high to a value larger than 50.
+ * If the number of scheduling events counted during SCHED_SRC_INTERVAL (default 10 milliseconds) reaches or exceeds opt_sched_rate_high, SRC takes effect.
+ */
+bool_t opt_sched_rate_control = 0;
+unsigned int opt_sched_rate_high = 50;           
+boolean_param("sched_rate_control", opt_sched_rate_control);
+integer_param("sched_rate_high", opt_sched_rate_high);
+
+
+/* The following function is the scheduling rate controller (SRC). It is triggered when
+ * the frequency of scheduling is excessively high (at or above opt_sched_rate_high).
+ *
+ * Rules to control the scheduling frequency:
+ * 1) If the frequency of scheduling (sd->s_csnum), counted during the period of SCHED_SRC_INTERVAL,
+ * reaches the threshold opt_sched_rate_high, SRC is enabled by setting sd->s_src_control = 1.
+ * 2) If SRC is enabled, it keeps running the previous vcpu if that vcpu is still runnable and is not the idle vcpu.
+ * This decreases the frequency of scheduling when the scheduling frequency is excessive.
+ */
+
+void src_controller(struct schedule_data *sd, struct vcpu *prev, s_time_t now);
+
 
 static inline void trace_continue_running(struct vcpu *v)
 {
@@ -1033,6 +1058,29 @@ static void vcpu_periodic_timer_work(str
     set_timer(&v->periodic_timer, periodic_next_event);
 }
 
+void src_controller(struct schedule_data *sd, struct vcpu *prev, s_time_t now)
+{
+    sd->s_csnum++;
+    if ( (now - sd->s_src_loop_begin) >= MILLISECS(SCHED_SRC_INTERVAL) )
+    {
+        if ( sd->s_csnum >= opt_sched_rate_high )
+            sd->s_src_control = 1;
+        else
+            sd->s_src_control = 0;
+        sd->s_src_loop_begin = now;
+        sd->s_csnum = 0;
+    }
+    if ( sd->s_src_control )
+    {
+        if ( !is_idle_vcpu(prev) && vcpu_runnable(prev) )
+        {
+            perfc_incr(sched_src);
+            return continue_running(prev);
+        }
+        perfc_incr(sched_nosrc);
+    }
+}
+
 /* 
  * The main function
  * - deschedule the current domain (scheduler independent).
@@ -1054,6 +1102,8 @@ static void schedule(void)
 
     sd = &this_cpu(schedule_data);
 
+    if ( opt_sched_rate_control )
+        src_controller(sd, prev, now);
     /* Update tasklet scheduling status. */
     switch ( *tasklet_work )
     {
@@ -1197,6 +1247,9 @@ static int cpu_schedule_up(unsigned int 
     sd->curr = idle_vcpu[cpu];
     init_timer(&sd->s_timer, s_timer_fn, NULL, cpu);
     atomic_set(&sd->urgent_count, 0);
+    sd->s_csnum = 0;
+    sd->s_src_loop_begin = NOW();
+    sd->s_src_control = 0;
 
     /* Boot CPU is dealt with later in schedule_init(). */
     if ( cpu == 0 )
diff -ruNp xen.org/include/xen/perfc_defn.h xen/include/xen/perfc_defn.h
--- xen.org/include/xen/perfc_defn.h    2011-10-20 03:29:44.000000000 -0400
+++ xen/include/xen/perfc_defn.h        2011-10-23 21:08:28.000000000 -0400
@@ -15,6 +15,8 @@ PERFCOUNTER(ipis,                   "#IP
 PERFCOUNTER(sched_irq,              "sched: timer")
 PERFCOUNTER(sched_run,              "sched: runs through scheduler")
 PERFCOUNTER(sched_ctx,              "sched: context switches")
+PERFCOUNTER(sched_src,              "sched: src triggered")
+PERFCOUNTER(sched_nosrc,            "sched: src not triggered")
 
 PERFCOUNTER(vcpu_check,             "csched: vcpu_check")
 PERFCOUNTER(schedule,               "csched: schedule")
diff -ruNp xen.org/include/xen/sched-if.h xen/include/xen/sched-if.h
--- xen.org/include/xen/sched-if.h      2011-10-20 03:29:44.000000000 -0400
+++ xen/include/xen/sched-if.h  2011-10-23 21:20:57.000000000 -0400
@@ -15,6 +15,11 @@ extern struct cpupool *cpupool0;
 
 /* cpus currently in no cpupool */
 extern cpumask_t cpupool_free_cpus;
+/* SRC decides whether to trigger the scheduling controller by comparing the
+ * scheduling frequency, counted during SCHED_SRC_INTERVAL, with the threshold opt_sched_rate_high.
+ * It is suggested to set SCHED_SRC_INTERVAL to 10 (milliseconds).
+ */
+#define SCHED_SRC_INTERVAL      10
 
 /*
  * In order to allow a scheduler to remap the lock->cpu mapping,
@@ -32,6 +37,9 @@ struct schedule_data {
     struct vcpu        *curr;           /* current task                    */
     void               *sched_priv;
     struct timer        s_timer;        /* scheduling timer                */
+    int                 s_csnum;          /* scheduling count in the current period */
+    s_time_t            s_src_loop_begin; /* SRC counting start point               */
+    bool_t              s_src_control;    /* whether SRC should be triggered        */
     atomic_t            urgent_count;   /* how many urgent vcpus           */
 };

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
