[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] RE: The calculation of the credit in credit_scheduler


  • To: "Zhang, Xiantao" <xiantao.zhang@xxxxxxxxx>, "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
  • From: Keir Fraser <keir@xxxxxxx>
  • Date: Fri, 05 Nov 2010 08:07:04 +0000
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, "Dong, Eddie" <eddie.dong@xxxxxxxxx>
  • Delivery-date: Fri, 05 Nov 2010 01:08:23 -0700
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>
  • Thread-index: Act8t/OXZ6bLDSZ9SFmoF+2ChyNA1QAATszAAAHP70k=
  • Thread-topic: [Xen-devel] RE: The calculation of the credit in credit_scheduler

On 05/11/2010 07:26, "Zhang, Xiantao" <xiantao.zhang@xxxxxxxxx> wrote:

> Maybe idlers shouldn't earn credits at the calculation points.  I did an
> experiment earlier, and not letting idlers earn credit can reduce the
> unfairness.
> 
> Besides this issue, I also have some findings I want to share with you to
> get more input about the credit scheduler.
> 
> 1. Interrupt delivery for assigned devices is done in a tasklet, and the
> tasklet runs in the idle vcpu's context, but the scheduler's behavior when
> scheduling the idle vcpu looks very strange. Ideally, when we switch to the
> idle vcpu to execute the tasklet, the previous vcpu should be switched back
> in once the tasklet is done, but the current policy is to choose another
> vcpu from the runqueue.  That is to say, when an interrupt happens on one
> CPU, that CPU may do a real task switch; this may not be acceptable when
> the interrupt frequency is high, and it also introduces some performance
> bugs according to our experiments.  Even if we could switch back to the
> previous vcpu after executing the tasklet, how to determine its timeslice
> for its next run is also a key issue, and this is not addressed. If it is
> still given 30ms for its restarted run, that may trigger some fairness
> issues, I think.
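
As a rough illustration of the deferral pattern described above (not the actual
passthrough code): the hard IRQ handler hands guest injection off to a tasklet,
which in this Xen version runs in the idle vcpu's context. Only tasklet_init()
and tasklet_schedule() are the existing Xen primitives; the handler name, the
helper it calls, and the include paths are assumptions.

/*
 * Sketch only: defer assigned-device interrupt delivery to a tasklet,
 * which (in this Xen version) is executed in the idle vcpu's context.
 */
#include <xen/tasklet.h>
#include <xen/sched.h>

void hvm_deliver_pending_dirq(struct domain *d);   /* hypothetical helper */

static struct tasklet dirq_tasklet;

static void dirq_tasklet_fn(unsigned long arg)
{
    struct domain *d = (struct domain *)arg;

    /* Inject the pending interrupt(s) into the guest here. */
    hvm_deliver_pending_dirq(d);
}

void dirq_tasklet_setup(struct domain *d)
{
    tasklet_init(&dirq_tasklet, dirq_tasklet_fn, (unsigned long)d);
}

/*
 * Called from the physical interrupt handler: defer the heavy work.  The
 * scheduler then runs the idle vcpu to execute the tasklet, and afterwards
 * picks the next vcpu from the runqueue rather than resuming the vcpu that
 * was preempted -- the behaviour questioned above.
 */
void dirq_defer(void)
{
    tasklet_schedule(&dirq_tasklet);
}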

Interrupt delivery is a victim of us switching the tasklet implementation to
work in idle VCPU context instead of softirq context. It might be
sensible to make use of softirqs directly from the interrupt-delivery logic,
or introduce a second type of tasklet (built on softirqs), or perhaps we
can think of a way to structure interrupt delivery that doesn't need softirq
context at all -- that would be nice! What did we need softirq context for
in the first place?

 -- Keir
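
A hedged sketch of the first option above (raising a softirq straight from the
interrupt-delivery logic), assuming the existing open_softirq()/raise_softirq()
interface. DIRQ_SOFTIRQ is hypothetical and would need to be added to the
softirq number enum in xen/include/xen/softirq.h; the handler body is a
placeholder.

/*
 * Sketch only: deliver assigned-device interrupts from a dedicated
 * softirq instead of a tasklet.
 */
#include <xen/softirq.h>

void hvm_deliver_pending_dirq_all(void);    /* hypothetical helper */

static void dirq_softirq_fn(void)
{
    /* Inject whatever interrupts the hard IRQ handler latched. */
    hvm_deliver_pending_dirq_all();
}

void dirq_softirq_setup(void)
{
    /* DIRQ_SOFTIRQ: hypothetical new entry in the softirq enum. */
    open_softirq(DIRQ_SOFTIRQ, dirq_softirq_fn);
}

/* From the physical interrupt handler: */
void dirq_defer_to_softirq(void)
{
    raise_softirq(DIRQ_SOFTIRQ);
}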

> 2.  Another issue was found during our experiments, and it is a very
> interesting one (likely to be a bug).  In the experiment, we first pinned
> three guests (two CPU-intensive and one IO-intensive) to two logical
> processors; each guest is configured with two virtual CPUs, and the CPU
> utilization share is ~90% for each CPU-intensive guest and ~20% for the
> IO-intensive guest.  But something odd happens after we introduce an
> additional idle guest which does no real workload and just idles.  The CPU
> utilization share changes: ~50% for each CPU-intensive guest and ~100% for
> the IO-intensive guest.  After analyzing the scheduling data, we found the
> change comes from virtual timer interrupt delivery to the idle guest.
> Although the guest is idle, there are still 1000 timer interrupts per vcpu
> per second. The current credit scheduler will boost the idle guest's vcpu
> as it wakes from the blocked state and trigger 1000 schedule events on the
> target physical processor, and the IO-intensive guest may benefit from the
> frequent schedule events and get more CPU utilization share.  Even more
> strangely, after 'xm pause' and 'xm unpause' of the idle guest, each of the
> three guests is allocated ~66% CPU share.
> This finding tells us some facts:  (1) The current credit scheduler is not
> fair to IO-intensive guests. (2) IO-intensive guests have the ability to
> acquire a fair CPU share when competing with CPU-intensive guests. (3) The
> current timeslice (30ms) is meaningless, since the average timeslice is far
> smaller than 1ms under real workloads (this may bring performance issues).
> (4) The boost mechanism is too aggressive, and an idle guest shouldn't be
> boosted when it is woken from the halt state (see the sketch below).
> (5) There is no policy in credit to determine how long the boosted vcpu may
> run, or how to handle the preempted vcpu.
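
To make points (4) and (5) concrete, here is a simplified, illustrative sketch
of the boost-on-wake path. It is modelled loosely on the credit scheduler's
csched_vcpu_wake(); the types, constants and helpers below are stand-ins, not
the real Xen code.

#define PRI_TS_BOOST   0
#define PRI_TS_UNDER  -1
#define PRI_TS_OVER   -2

struct sched_vcpu {
    int pri;        /* one of the PRI_TS_* values         */
    int credit;     /* remaining credit, may go negative  */
};

void runq_insert(unsigned int cpu, struct sched_vcpu *svc);  /* stand-in */
void runq_tickle(unsigned int cpu, struct sched_vcpu *svc);  /* stand-in */

void credit_vcpu_wake(struct sched_vcpu *svc, unsigned int cpu)
{
    /*
     * Any vcpu that wakes while not over its credit is boosted -- which
     * includes an idle guest woken 1000 times a second by its virtual
     * timer.  Nothing here limits how long the boosted vcpu then runs,
     * nor says anything about the vcpu it preempts.
     */
    if ( svc->pri == PRI_TS_UNDER )
        svc->pri = PRI_TS_BOOST;

    runq_insert(cpu, svc);      /* queue it ...                    */
    runq_tickle(cpu, svc);      /* ... and kick cpu to reschedule  */
}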
> 
> 3.  Credit is not really used for determining key scheduling policies. For
> example, when choosing a candidate task, credit is not well used to
> evaluate tasks' priorities, and this may not be fair to IO-intensive
> guests. Additionally, a task's priority is not recalculated in time and is
> only updated every 30ms. In this case, even if one task's credit is
> negative, its priority may still be TS_UNDER or TS_BOOST due to the delayed
> update, so perhaps when the vcpu is scheduled out, its priority should be
> updated after the credit change.  In addition, when a boosted vCPU is
> scheduled out, its priority is always set to TS_UNDER, and credit is not
> considered there either. If the credit has become negative, it may be
> better to set the priority to TS_OVER.
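
A minimal sketch of the change point 3 suggests, reusing the simplified
struct sched_vcpu and PRI_TS_* stand-ins from the sketch above; this is a
hypothetical helper, not existing Xen code.

/*
 * Recompute priority from the vcpu's remaining credit when it is
 * scheduled out, instead of waiting for the 30ms accounting update.
 */
void credit_update_pri_on_desched(struct sched_vcpu *svc)
{
    if ( svc->credit < 0 )
        svc->pri = PRI_TS_OVER;     /* it has burned through its credit */
    else
        svc->pri = PRI_TS_UNDER;    /* it still has credit left         */
}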
> 
> Any comments?
> 
> Xiantao
> 
> 
> Jiang, Yunhong wrote:
>> When reading the credit scheduler code and doing experiments, I noticed
>> something interesting in the current credit scheduler. For example, in
>> the following situation:
>> 
>> Hardware:
>> A powerful system with 64 CPUs.
>> 
>> Xen Environment:
>> Dom0 with 8 vCPUs bound to CPUs (0, 16~24)
>> 
>> 3 HVM domains, all with 2 vCPUs, all bound as vcpu0->pcpu1,
>> vcpu1->pcpu2. Among them, 2 are CPU-intensive while 1 is
>> I/O-intensive.
>> 
>> The result shows that the I/O-intensive domain will occupy more than
>> 100% CPU, while the two CPU-intensive domains each occupy 50%.
>> 
>> IMHO it should be 66% for all domains.
>> 
>> The reason is how the credit is calculated. Although the 3 HVM domains
>> are pinned to 2 pCPUs and share those 2 CPUs, they will each still get
>> 2*300 credits when credit is accounted. That means the I/O-intensive HVM
>> domain will never be under credit, so it will preempt the CPU-intensive
>> domains whenever it is boosted (i.e. after an I/O access to QEMU); it is
>> set to TS_UNDER only at the tick time, and then boosted again.
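
A back-of-envelope illustration of the accounting arithmetic described above,
using the numbers from this mail (300 credits per pCPU per accounting period).
It is purely illustrative and is not the real csched_acct() code.

#include <stdio.h>

int main(void)
{
    const int credits_per_pcpu_per_acct = 300;
    const int vcpus_per_domain = 2;
    const int domains = 3;
    const int shared_pcpus = 2;

    /* Credit granted to each domain per period, as described: it scales
     * with the domain's vcpu count, ignoring that all three domains are
     * pinned to the same 2 pCPUs. */
    int granted = vcpus_per_domain * credits_per_pcpu_per_acct;            /* 600 */

    /* CPU time actually available per period on the 2 shared pCPUs is
     * only 2 * 300 credits' worth for all three domains together. */
    int fair_share = shared_pcpus * credits_per_pcpu_per_acct / domains;   /* 200 */

    /* A domain that runs only briefly (the I/O-intensive one) can never
     * burn its 600 granted credits, so it never goes over credit and is
     * boosted on every wake-up. */
    printf("granted per domain: %d, fair share per domain: %d\n",
           granted, fair_share);
    return 0;
}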
>> 
>> I'm not sure if this is a meaningful usage model that needs a fix, but I
>> think it is helpful to show it to the list.
>> 
>> I didn't try credit2, so I have no idea whether this also happens with credit2.
>> 
>> Thanks
>> --jyh
> 
> 



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

