[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] RE: The calculation of the credit in credit_scheduler


  • To: "Zhang, Xiantao" <xiantao.zhang@xxxxxxxxx>, "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
  • From: Keir Fraser <keir@xxxxxxx>
  • Date: Fri, 05 Nov 2010 08:07:04 +0000
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, "Dong, Eddie" <eddie.dong@xxxxxxxxx>
  • Delivery-date: Fri, 05 Nov 2010 01:08:23 -0700
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>
  • Thread-index: Act8t/OXZ6bLDSZ9SFmoF+2ChyNA1QAATszAAAHP70k=
  • Thread-topic: [Xen-devel] RE: The calculation of the credit in credit_scheduler

On 05/11/2010 07:26, "Zhang, Xiantao" <xiantao.zhang@xxxxxxxxx> wrote:

> Maybe idlers shouldn't earn credits at the calculation points.  I did an
> experiment earlier, and not letting idlers earn credit can reduce the
> unfairness.
> 
> Besides this issue, I also have some findings I want to share with you to
> get more input about the credit scheduler.
> 
> 1. Interrupt delivery for assigned devices is done in a tasklet, and the
> tasklet runs in the idle vcpu's context, but the scheduler's behavior when
> scheduling the idle vcpu looks very strange. Ideally, when we switch to the
> idle vcpu to execute the tasklet, the previous vcpu should be switched back
> in once the tasklet is done, but the current policy is to choose another
> vcpu from the runqueue.  That is to say, when an interrupt happens on one
> CPU, that CPU may do a real task switch; this may not be acceptable when
> the interrupt frequency is high, and it also introduces some performance
> bugs according to our experiments.  Even if we could switch back to the
> previous vcpu after executing the tasklet, how to determine its timeslice
> for its next run is also a key issue, and this is not addressed. If it is
> still given 30ms for its restarted run, that may trigger some fairness
> issues, I think.
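
As a rough illustration of the deferral pattern described above (not the actual
passthrough code): the hard IRQ handler hands guest injection off to a tasklet,
which in this Xen version runs in the idle vcpu's context. Only tasklet_init()
and tasklet_schedule() are the existing Xen primitives; the handler name, the
helper it calls, and the include paths are assumptions.

/*
 * Sketch only: defer assigned-device interrupt delivery to a tasklet,
 * which (in this Xen version) is executed in the idle vcpu's context.
 */
#include <xen/tasklet.h>
#include <xen/sched.h>

void hvm_deliver_pending_dirq(struct domain *d);   /* hypothetical helper */

static struct tasklet dirq_tasklet;

static void dirq_tasklet_fn(unsigned long arg)
{
    struct domain *d = (struct domain *)arg;

    /* Inject the pending interrupt(s) into the guest here. */
    hvm_deliver_pending_dirq(d);
}

void dirq_tasklet_setup(struct domain *d)
{
    tasklet_init(&dirq_tasklet, dirq_tasklet_fn, (unsigned long)d);
}

/*
 * Called from the physical interrupt handler: defer the heavy work.  The
 * scheduler then runs the idle vcpu to execute the tasklet, and afterwards
 * picks the next vcpu from the runqueue rather than resuming the vcpu that
 * was preempted -- the behaviour questioned above.
 */
void dirq_defer(void)
{
    tasklet_schedule(&dirq_tasklet);
}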

Interrupt delivery is a victim of us switching the tasklet implementation to
work in idle VCPU context instead of softirq context. It might be
sensible to make use of softirqs directly from the interrupt-delivery logic,
or introduce a second type of tasklet (built on softirqs), or perhaps we
can think of a way to structure interrupt delivery that doesn't need softirq
context at all -- that would be nice! What did we need softirq context for
in the first place?

 -- Keir
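
A hedged sketch of the first option above (raising a softirq straight from the
interrupt-delivery logic), assuming the existing open_softirq()/raise_softirq()
interface. DIRQ_SOFTIRQ is hypothetical and would need to be added to the
softirq number enum in xen/include/xen/softirq.h; the handler body is a
placeholder.

/*
 * Sketch only: deliver assigned-device interrupts from a dedicated
 * softirq instead of a tasklet.
 */
#include <xen/softirq.h>

void hvm_deliver_pending_dirq_all(void);    /* hypothetical helper */

static void dirq_softirq_fn(void)
{
    /* Inject whatever interrupts the hard IRQ handler latched. */
    hvm_deliver_pending_dirq_all();
}

void dirq_softirq_setup(void)
{
    /* DIRQ_SOFTIRQ: hypothetical new entry in the softirq enum. */
    open_softirq(DIRQ_SOFTIRQ, dirq_softirq_fn);
}

/* From the physical interrupt handler: */
void dirq_defer_to_softirq(void)
{
    raise_softirq(DIRQ_SOFTIRQ);
}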

> 2.  Another issue was found during our experiments, and it is a very
> interesting one (likely to be a bug).  In the experiment, we first pinned
> three guests (two CPU-intensive and one IO-intensive) to two logical
> processors; each guest is configured with two virtual CPUs, and the CPU
> utilization share is ~90% for each CPU-intensive guest and ~20% for the
> IO-intensive guest.  But something odd happens after we introduce an
> additional idle guest which does no real workload and just idles.  The CPU
> utilization share changes: ~50% for each CPU-intensive guest and ~100% for
> the IO-intensive guest.  After analyzing the scheduling data, we found the
> change comes from virtual timer interrupt delivery to the idle guest.
> Although the guest is idle, there are still 1000 timer interrupts per vcpu
> per second. The current credit scheduler will boost the idle guest's vcpu
> as it wakes from the blocked state and trigger 1000 schedule events on the
> target physical processor, and the IO-intensive guest may benefit from the
> frequent schedule events and get more CPU utilization share.  Even more
> strangely, after 'xm pause' and 'xm unpause' of the idle guest, each of the
> three guests is allocated ~66% CPU share.
> This finding tells us some facts:  (1) The current credit scheduler is not
> fair to IO-intensive guests. (2) IO-intensive guests have the ability to
> acquire a fair CPU share when competing with CPU-intensive guests. (3) The
> current timeslice (30ms) is meaningless, since the average timeslice is far
> smaller than 1ms under real workloads (this may bring performance issues).
> (4) The boost mechanism is too aggressive, and an idle guest shouldn't be
> boosted when it is woken from the halt state (see the sketch below).
> (5) There is no policy in credit to determine how long the boosted vcpu may
> run, or how to handle the preempted vcpu.
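
To make points (4) and (5) concrete, here is a simplified, illustrative sketch
of the boost-on-wake path. It is modelled loosely on the credit scheduler's
csched_vcpu_wake(); the types, constants and helpers below are stand-ins, not
the real Xen code.

#define PRI_TS_BOOST   0
#define PRI_TS_UNDER  -1
#define PRI_TS_OVER   -2

struct sched_vcpu {
    int pri;        /* one of the PRI_TS_* values         */
    int credit;     /* remaining credit, may go negative  */
};

void runq_insert(unsigned int cpu, struct sched_vcpu *svc);  /* stand-in */
void runq_tickle(unsigned int cpu, struct sched_vcpu *svc);  /* stand-in */

void credit_vcpu_wake(struct sched_vcpu *svc, unsigned int cpu)
{
    /*
     * Any vcpu that wakes while not over its credit is boosted -- which
     * includes an idle guest woken 1000 times a second by its virtual
     * timer.  Nothing here limits how long the boosted vcpu then runs,
     * nor says anything about the vcpu it preempts.
     */
    if ( svc->pri == PRI_TS_UNDER )
        svc->pri = PRI_TS_BOOST;

    runq_insert(cpu, svc);      /* queue it ...                    */
    runq_tickle(cpu, svc);      /* ... and kick cpu to reschedule  */
}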
> 
> 3.  Credit is not really used for determining key scheduling policies. For
> example, when choosing a candidate task, credit is not well used to
> evaluate tasks' priorities, and this may not be fair to IO-intensive
> guests. Additionally, a task's priority is not recalculated in time and is
> only updated every 30ms. In this case, even if one task's credit is
> negative, its priority may still be TS_UNDER or TS_BOOST due to the delayed
> update, so perhaps when the vcpu is scheduled out, its priority should be
> updated after the credit change.  In addition, when a boosted vCPU is
> scheduled out, its priority is always set to TS_UNDER, and credit is not
> considered there either. If the credit has become negative, it may be
> better to set the priority to TS_OVER.
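
A minimal sketch of the change point 3 suggests, reusing the simplified
struct sched_vcpu and PRI_TS_* stand-ins from the sketch above; this is a
hypothetical helper, not existing Xen code.

/*
 * Recompute priority from the vcpu's remaining credit when it is
 * scheduled out, instead of waiting for the 30ms accounting update.
 */
void credit_update_pri_on_desched(struct sched_vcpu *svc)
{
    if ( svc->credit < 0 )
        svc->pri = PRI_TS_OVER;     /* it has burned through its credit */
    else
        svc->pri = PRI_TS_UNDER;    /* it still has credit left         */
}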
> 
> Any comments?
> 
> Xiantao
> 
> 
> Jiang, Yunhong wrote:
>> When reading the credit scheduler code and doing experiments, I noticed
>> something interesting in the current credit scheduler. For example, in
>> the following situation:
>> 
>> Hardware:
>> A powerful system with 64 CPUs.
>> 
>> Xen Environment:
>> Dom0 with 8 vCPUs bound to CPUs (0, 16~24)
>> 
>> 3 HVM domains, all with 2 vCPUs, all bound as vcpu0->pcpu1,
>> vcpu1->pcpu2. Among them, 2 are CPU-intensive while 1 is
>> I/O-intensive.
>> 
>> The result shows that the I/O-intensive domain will occupy more than
>> 100% CPU, while the two CPU-intensive domains each occupy 50%.
>> 
>> IMHO it should be 66% for all domains.
>> 
>> The reason is how the credit is calculated. Although the 3 HVM domains
>> are pinned to 2 pCPUs and share those 2 CPUs, they will each still get
>> 2*300 credits when credit is accounted. That means the I/O-intensive HVM
>> domain will never be under credit, so it will preempt the CPU-intensive
>> domains whenever it is boosted (i.e. after an I/O access to QEMU); it is
>> set to TS_UNDER only at the tick time, and then boosted again.
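
A back-of-envelope illustration of the accounting arithmetic described above,
using the numbers from this mail (300 credits per pCPU per accounting period).
It is purely illustrative and is not the real csched_acct() code.

#include <stdio.h>

int main(void)
{
    const int credits_per_pcpu_per_acct = 300;
    const int vcpus_per_domain = 2;
    const int domains = 3;
    const int shared_pcpus = 2;

    /* Credit granted to each domain per period, as described: it scales
     * with the domain's vcpu count, ignoring that all three domains are
     * pinned to the same 2 pCPUs. */
    int granted = vcpus_per_domain * credits_per_pcpu_per_acct;            /* 600 */

    /* CPU time actually available per period on the 2 shared pCPUs is
     * only 2 * 300 credits' worth for all three domains together. */
    int fair_share = shared_pcpus * credits_per_pcpu_per_acct / domains;   /* 200 */

    /* A domain that runs only briefly (the I/O-intensive one) can never
     * burn its 600 granted credits, so it never goes over credit and is
     * boosted on every wake-up. */
    printf("granted per domain: %d, fair share per domain: %d\n",
           granted, fair_share);
    return 0;
}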
>> 
>> I'm not sure if this is a meaningful usage model that needs a fix, but I
>> think it is helpful to show it to the list.
>> 
>> I didn't try credit2, so I have no idea whether this also happens with credit2.
>> 
>> Thanks
>> --jyh
> 
> 



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

