On 05/11/2010 07:26, "Zhang, Xiantao" <xiantao.zhang@xxxxxxxxx> wrote:
> Maybe idlers shouldn't earn credits at the accounting points. I did an
> experiment before; it can reduce the unfairness if idlers do not earn credit.
> Besides this issue, I also have some findings I want to share with you guys
> to get more input about the credit scheduler.
> 1. Interrupt delivery for assigned devices is done in a tasklet, and the
> tasklet runs in the idle vcpu's context, but the scheduler's behavior when
> scheduling the idle vcpu looks very strange. Ideally, when we switch to the
> idle vcpu to execute the tasklet, the previous vcpu should be switched back
> after the tasklet is done, but the current policy is to choose another vcpu
> from the runq. That is to say, when an interrupt happens on one CPU, the CPU
> may do a real task switch; this may not be acceptable when the interrupt
> frequency is high, and it also introduces some performance bugs according to
> our experiments. Even if we could switch back to the previous vcpu after
> executing the tasklet, how to determine the timeslice for its next run is
> also a key issue, and this is not addressed. If we still give 30ms for its
> restarted run, it may trigger some fairness issues, I think.
Interrupt delivery is a victim of us switching tasklet implementation to
work in idle VCPU context instead of in softirq context. It might be
sensible to make use of softirqs directly from the interrupt-delivery logic,
or introduce a second type of tasklets (built on softirqs), or perhaps we
can think of a way to structure interrupt delivery that doesn't need softirq
context at all -- that would be nice! What did we need softirq context for
in the first place?
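For concreteness, here is roughly the shape the "second type of tasklets built on softirqs" option could take. This is purely an illustrative sketch, not Xen code: the pending mask, handler table, and softirq number are all made up. The point is that the deferred work runs in the interrupted CPU's own context on softirq exit, so no switch to the idle vCPU (and no follow-on pick from the runqueue) is needed.

```c
/* Illustrative sketch only -- not Xen code. */
#include <assert.h>
#include <stdint.h>

#define NR_SOFTIRQS 8

static void (*softirq_handlers[NR_SOFTIRQS])(void);
static uint32_t softirq_pending;   /* one bit per softirq */

static void open_softirq(int nr, void (*fn)(void))
{
    softirq_handlers[nr] = fn;
}

/* Called from the interrupt handler: cheap, just sets a bit. */
static void raise_softirq(int nr)
{
    softirq_pending |= 1u << nr;
}

/* Run pending work in the *current* context, e.g. on return from the
 * interrupt, instead of waking the idle vCPU to run a tasklet. */
static void do_softirq(void)
{
    while (softirq_pending) {
        int nr = __builtin_ctz(softirq_pending);
        softirq_pending &= ~(1u << nr);
        softirq_handlers[nr]();
    }
}

/* Example payload: deliver a (pretend) interrupt to a guest. */
static int delivered;
static void deliver_assigned_irq(void) { delivered++; }
```

Note that two raises before the next do_softirq() coalesce into one handler run, which is exactly the behaviour you'd want for high-frequency interrupt delivery.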
> 2. Another issue was found during our experiments, and it is a very
> interesting one (likely a bug). In the experiment, we first pinned three
> guests (two CPU-intensive and one IO-intensive) to two logical processors;
> each guest is configured with two virtual CPUs, and the CPU utilization
> share is ~90% for each CPU-intensive guest and ~20% for the IO-intensive
> guest. But the magic thing happens after we introduce an additional idle
> guest, which does no real workload and just idles. The CPU utilization
> share changes to ~50% for each CPU-intensive guest and ~100% for the
> IO-intensive guest. After analyzing the scheduling data, we found the
> change comes from virtual timer interrupt delivery to the idle guest.
> Although the guest is idle, there are still 1000 timer interrupts per vcpu
> per second. The current credit scheduler will boost the idle vcpu out of
> the blocked state and trigger 1000 schedule events on the target physical
> processor, and the IO-intensive guest may benefit from the frequent
> schedule events and get a larger CPU utilization share. The even more
> magic thing is that after 'xm pause' and 'xm unpause' of the idle guest,
> each of the three guests is allocated ~66% CPU share.
> This finding tells us some facts: (1) the current credit scheduler is not
> fair to IO-intensive guests. (2) IO-intensive guests have the ability to
> acquire a fair CPU share when competing with CPU-intensive guests. (3) The
> current timeslice (30ms) is meaningless, since the average timeslice is far
> smaller than 1ms under real workloads (this may bring performance issues).
> (4) The boost mechanism is too aggressive, and an idle guest shouldn't be
> boosted when it is woken from the halt state. (5) There is no policy in
> credit to determine how long a boosted vcpu may run, or how to handle the
> preempted vcpu.
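Fact (4) can be made concrete with a toy model of the wake path. This is loosely modelled on credit1's behaviour but heavily simplified, and the names and tick handling are my assumptions, not the real code: every wake-up of a vCPU that still holds UNDER priority promotes it to BOOST, with no check for how often it wakes or how little work it does, so an "idle" guest taking 1000 timer ticks per second generates 1000 boosted wake-ups, i.e. 1000 preemption opportunities, per second.

```c
/* Toy model of the credit1 wake path -- names approximate, not Xen code. */
#include <assert.h>

enum pri { TS_BOOST = 0, TS_UNDER = 1, TS_OVER = 2 };

struct vcpu { enum pri pri; int credit; };

static int schedule_events;  /* tickles sent to the target pCPU */

static void vcpu_wake(struct vcpu *v)
{
    /* Any waking vCPU still at UNDER priority is boosted unconditionally. */
    if (v->pri == TS_UNDER)
        v->pri = TS_BOOST;
    schedule_events++;       /* stands in for raising SCHEDULE_SOFTIRQ */
}

static void vcpu_block(struct vcpu *v)
{
    if (v->pri == TS_BOOST)
        v->pri = TS_UNDER;   /* boost is dropped once it runs and blocks */
}

/* One second of a fully idle guest vCPU receiving periodic timer ticks:
 * wake on each tick, handle it, halt again. */
static int idle_guest_second(struct vcpu *v, int hz)
{
    schedule_events = 0;
    for (int tick = 0; tick < hz; tick++) {
        vcpu_wake(v);   /* virtual timer interrupt delivery */
        vcpu_block(v);  /* guest services the tick and halts */
    }
    return schedule_events;
}
```

So even a guest doing no work at all forces a schedule event per timer tick on the target pCPU, which matches the 1000 schedule events per second observed above.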
> 3. Credit is not really used for determining key scheduling policies. For
> example, when choosing a candidate task, credit is not well used to evaluate
> tasks' priority, and this may be unfair to IO-intensive guests. Additionally,
> a task's priority is not recalculated in time; it is only updated every 30ms.
> In this case, even if a task's credit is negative, its priority may still be
> TS_UNDER or TS_BOOST due to the delayed update, so perhaps when a vcpu is
> scheduled out, its priority should be updated after the credit changes. In
> addition, when a boosted vcpu is scheduled out, its priority is always set to
> TS_UNDER, and credit is not considered there either. If the credit has become
> negative, might it be better to set the priority to TS_OVER?
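The change suggested in point 3 could look something like the sketch below. This is a proposal sketch only, not existing Xen code: the enum values follow credit1's naming, but the helper function is hypothetical. On deschedule, the next priority is derived from the vCPU's current credit instead of unconditionally demoting BOOST to TS_UNDER.

```c
/* Sketch of the suggested deschedule-time priority update -- hypothetical. */
#include <assert.h>

enum pri { TS_BOOST = 0, TS_UNDER = 1, TS_OVER = 2 };

struct vcpu { enum pri pri; int credit; };

static void deschedule_update_pri(struct vcpu *v)
{
    if (v->credit < 0)
        v->pri = TS_OVER;    /* ran past its allowance: go OVER immediately,
                              * don't wait for the 30ms accounting tick */
    else
        v->pri = TS_UNDER;   /* still has credit: ordinary priority */
}
```

With something like this, a vCPU that burned through its credit while boosted would stop preempting others as soon as it is scheduled out, rather than up to 30ms later.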
> Any comments ?
> Jiang, Yunhong wrote:
>> When reading the credit scheduler code and doing experiments, I noticed
>> something interesting in the current credit scheduler. For example, in the
>> following situation:
>> A powerful system with 64 CPUs.
>> Xen Environment:
>> Dom0 with 8 vCPUs bound to CPUs (0, 16~24)
>> 3 HVM domains, each with 2 vCPUs, all bound as vcpu0->pcpu1,
>> vcpu1->pcpu2. Among them, 2 are CPU-intensive while 1 is I/O-intensive.
>> The results show that the I/O-intensive domain will occupy more than
>> 100% CPU, while the two CPU-intensive domains each occupy 50%.
>> IMHO it should be 66% for all domains.
>> The reason is how the credit is calculated. Although the 3 HVM domains
>> are pinned to 2 pCPUs and share those 2 CPUs, they will each still get
>> 2 * 300 credits at credit accounting. That means the I/O-intensive HVM
>> domain will never run out of credit, so it will preempt the CPU-intensive
>> domains whenever it is boosted (i.e. after an I/O access to QEMU); it is
>> only set back to TS_UNDER at tick time, and then boosted again.
>> I'm not sure whether this is a meaningful usage model that needs a fix,
>> but I think it is helpful to show it to the list.
>> I didn't try credit2, so I have no idea whether this also happens there.
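Yunhong's arithmetic can be made explicit with a back-of-the-envelope model. The numbers follow the figures in his mail (300 credits per pCPU per accounting period); the helper functions are made up for illustration and are not Xen code. Credit is granted per vCPU by weight with no regard for the pinning mask, so each of the 3 pinned domains is granted 2 * 300 = 600 credits per period, while the 2 shared pCPUs can only burn 600 in total, roughly 200 per domain -- so no domain's credit ever goes negative.

```c
/* Toy model of the accounting problem -- illustrative helpers, not Xen code. */
#include <assert.h>

#define CREDITS_PER_PCPU_PER_ACCT 300

/* Credit granted to a domain per accounting period: a full per-vCPU
 * share, as if each vCPU had a whole pCPU; the pinning mask is ignored. */
static int credit_granted(int vcpus_in_dom)
{
    return vcpus_in_dom * CREDITS_PER_PCPU_PER_ACCT;
}

/* Credit the pinned pCPU set can actually supply, split evenly over the
 * domains sharing it. */
static int credit_burnable(int pinned_pcpus, int sharing_domains)
{
    return pinned_pcpus * CREDITS_PER_PCPU_PER_ACCT / sharing_domains;
}
```

Since each domain is granted 600 credits but can only burn about 200, all three stay permanently under credit, which is why the boosted I/O-intensive domain can preempt at will and is only demoted at tick time.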
Xen-devel mailing list