
Re: [Xen-devel] [PATCH] scheduler rate controller



On Mon, Oct 24, 2011 at 4:36 AM, Lv, Hui <hui.lv@xxxxxxxxx> wrote:
>
> As one of the topics presented at Xen Summit 2011 in SC, we proposed a 
> method, the scheduler rate controller (SRC), to control the high frequency 
> of scheduling under some conditions. You can find the slides at
> http://www.slideshare.net/xen_com_mgr/9-hui-lvtacklingthemanagementchallengesofserverconsolidationonmulticoresystems
>
> We have tested it over many rounds on a 2-socket multi-core system and 
> consistently obtained positive results: performance improves greatly, both 
> with the consolidation workload SPECvirt_sc2010 and with smaller workloads 
> such as sysbench and SPECjbb. So I am posting it here for review.
>
> In Xen's scheduling mechanism, the hypervisor kicks the related VCPUs by 
> raising the schedule softirq while processing external interrupts. 
> Therefore, if the number of IRQs is very large, scheduling happens more 
> frequently. Frequent scheduling will
> 1) bring more overhead for the hypervisor and
> 2) increase the cache miss rate.
>
> In our consolidation workload, SPECvirt_sc2010, SR-IOV and iSCSI are 
> adopted to bypass software emulation, but they bring heavy network traffic. 
> Correspondingly, about 15k schedules happened per second on each physical 
> core, which means the average running time is very short, only 60us. We 
> proposed SRC in Xen to mitigate this problem.
> The performance benefit brought by this patch is very large at peak 
> throughput, with no influence when system load is low.
>
> SRC improved SPECvirt performance by 14%.
> 1) It reduced CPU utilization, which allows more load to be added.
> 2) Response time (QoS) became better at the same CPU %.
> 3) The better response time allowed us to push the CPU % at peak performance 
> to an even higher level (CPU was not saturated in SPECvirt).
> SRC reduced the context switch rate significantly, which resulted in
> 1) a smaller path length,
> 2) fewer cache misses and thus a lower CPI, and
> 3) better performance on both the guest and hypervisor sides.
>
> With this patch, our SPECvirt_sc2010 results show that the performance of 
> Xen catches up with the other open-source hypervisor.

Hui,

Thanks for the patch, and the work you've done testing it.  There are
a couple of things to discuss.

* I'm not sure I like the idea of doing this at the generic level rather
than at the specific scheduler level -- e.g., inside of credit1.  For
better or for worse, all aspects of scheduling work together, and even
small changes tend to have a significant effect on the emergent
behavior.  I understand why you'd want this in the generic scheduling
code; but it seems like it would be better for each scheduler to
implement rate control independently.

* The actual algorithm you use here isn't described.  It seems to be
as follows (please correct me if I've made a mistake
reverse-engineering the algorithm):

Every 10ms, check to see if there have been more than 50 schedules.
If so, disable preemption entirely for 10ms, allowing processes to
run without being interrupted (unless they yield).
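If my reading is right, the behavior could be sketched roughly as below.
All names and constants here are my reconstruction from the patch, not
taken from it -- treat them as illustrative only:

```c
#include <stdbool.h>
#include <stdint.h>

#define SRC_WINDOW_US  10000  /* hypothetical 10ms observation window */
#define SRC_THRESHOLD  50     /* hypothetical schedules-per-window limit */

struct src_state {
    uint64_t window_start;     /* start of the current window (us) */
    unsigned int sched_count;  /* schedules seen in this window */
    bool preempt_disabled;     /* preemption suppressed this window? */
};

/* Called on each scheduling request; returns true if the schedule
 * should be suppressed, i.e. the current vcpu keeps running. */
static bool src_should_suppress(struct src_state *s, uint64_t now_us)
{
    if (now_us - s->window_start >= SRC_WINDOW_US) {
        /* New window: decide based on the previous window's rate. */
        s->preempt_disabled = (s->sched_count > SRC_THRESHOLD);
        s->window_start = now_us;
        s->sched_count = 0;
    }
    s->sched_count++;
    return s->preempt_disabled;
}
```

Note how the decision is made once per window from the *previous*
window's count, which is exactly what produces the flip-flopping
described below.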

It seems like we should be able to do better.  For one, it means in
the general case you will flip back and forth between really frequent
schedules and less frequent schedules.  For two, turning off
preemption entirely will mean that whatever vcpu happens to be running
could, if it wished, run for the full 10ms; and which one got elected
to do that would be really random.  This may work well for SPECvirt,
but it's the kind of algorithm that is likely to have some workloads
on which it works very poorly.  Finally, there's the chance that this
algorithm could be "gamed" -- i.e., if a rogue VM knew that most other
VMs yielded frequently, it might be able to arrange that there would
always be more than 50 context switches per window, while it runs
without preemption and takes up more than its fair share.

Have you tried just making it give each vcpu a minimum amount of
scheduling time, say, 500us or 1ms?
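To be concrete, a minimum-runtime check might look something like the
sketch below.  This is only one possible shape for the idea, with an
assumed 1ms floor; the names are made up:

```c
#include <stdbool.h>
#include <stdint.h>

#define MIN_RUNTIME_US 1000  /* assumed 1ms minimum slice per vcpu */

/* Return true if a preemption request should be deferred because the
 * currently running vcpu has not yet had its minimum runtime.
 * vcpu_start_us is when the current vcpu was scheduled in. */
static bool defer_preemption(uint64_t vcpu_start_us, uint64_t now_us)
{
    return (now_us - vcpu_start_us) < MIN_RUNTIME_US;
}
```

Unlike the window-based approach, this bounds how long any single vcpu
can monopolize the cpu, and it can't be gamed by inflating the global
context-switch count.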

Now a couple of stylistic comments:
* src tends to make me think of "source".  I think sched_rate[_*]
would fit the existing naming convention better.
* src_controller() shouldn't call continue_running() directly.
Instead, scheduler() should call src_controller(); and only call
sched->do_schedule() if src_controller() returns false (or something
like that).
* Whatever the algorithm is should have comments describing what it
does and how it's supposed to work.
* Your patch is malformed; you need to have it apply at the top level,
not from within the xen/ subdirectory.  The easiest way to get a patch
is to use either mercurial queues, or "hg diff".  There are some good
suggestions for making and posting patches here:
http://wiki.xensource.com/xenwiki/SubmittingXenPatches
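
To illustrate the src_controller()/scheduler() split suggested above,
the control flow might look like this.  The function bodies are stubs
and the names are only illustrative, not from the Xen tree:

```c
#include <stdbool.h>

static bool src_controller_active;  /* stands in for the real rate state */

/* Would inspect the per-cpu schedule rate; stubbed here. */
static bool src_controller(void)
{
    return src_controller_active;
}

static int do_schedule_calls;  /* counts decisions, for illustration */

/* Stand-in for sched->do_schedule(): the scheduler would pick a vcpu. */
static void do_schedule(void)
{
    do_schedule_calls++;
}

static void schedule(void)
{
    if (src_controller())
        return;  /* rate limit hit: let the current vcpu keep running */
    do_schedule();
}
```

The point is that the rate controller only makes a yes/no decision;
continuing to run stays the caller's responsibility rather than being
short-circuited from inside src_controller().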

Thanks again for all your work on this -- we definitely want Xen to
beat the other open-source hypervisor. :-)

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

