
Re: [Xen-devel] A question on the credit scheduler



On Mon, Dec 19, 2011 at 4:37 AM, George Dunlap <George.Dunlap@xxxxxxxxxxxxx> wrote:
2011/12/17 gavin <gbtux@xxxxxxx>:
>
> At 2011-12-16 23:58:26,"George Dunlap" <George.Dunlap@xxxxxxxxxxxxx> wrote:
>>2011/12/16 gavin <gbtux@xxxxxxx>:
>>> At 2011-12-16 19:04:19,"George Dunlap" <George.Dunlap@xxxxxxxxxxxxx> wrote:
>>>
>>>>2011/12/16 zhikai <gbtux@xxxxxxx>:
>>>>> Hi All,
>>>>>
>>>>> In the credit scheduler, the scheduling decision function csched_schedule()
>>>>> is called in the schedule function in scheduler.c, such as the following.
>>>>> next_slice = sched->do_schedule(sched, now, tasklet_work_scheduled);
>>>>>
>>>>> But how often is csched_schedule() called and run? Does this
>>>>> frequency have something to do with the credit scheduler's 30ms
>>>>> timeslice?
>>>>
>>>>The scheduler runs whenever the SCHEDULE_SOFTIRQ is raised.  If you
>>>>grep through the source code for that string, you can find all the
>>>>places where it's raised.
>>>>
>>>>Some examples include:
>>>>* When the 30ms timeslice is finished
>>>>* When a sleeping vcpu of higher priority than what's currently running wakes up
>>>>* When a vcpu blocks
>>>>* When a vcpu is migrated from one cpu to another
>>>>
>>>>30ms is actually a pretty long time; in typical workloads, vcpus block
>>>>or are preempted by other waking vcpus without using up their full
>>>>timeslice.
>>>
>>> Thank you very much for your reply.
>>>
>>> So, the vcpu is very likely to be preempted whenever the SCHEDULE_SOFTIRQ is
>>> raised.
>>
>>It depends; if you have a cpu-burning vcpu running on a cpu all by
>>itself, then after its 30ms timeslice, Xen will have no one else to
>>run, and so will let it run again.
>>
>>But yes, if there are other vcpus on the runqueue, or the host is
>>moderately busy, it's likely that SCHEDULE_SOFTIRQ will cause a
>>context-switch.
>>
>>> And we cannot find a small timeslice, say a (ms), such that the
>>> time any vcpu spends in the running phase is always k*a (ms) for
>>> some integer k. There is no such small timeslice. Is that right?
>>
>>I'm sorry, I don't really understand your question.  Perhaps if you
>>told me what you're trying to accomplish?
>
> I will try to describe my idea clearly below, but I really don't know
> if it will work. Please give me some advice if possible.
>
> In the Xen credit scheduler, a vCPU can run for a 30ms timeslice when
> it is scheduled on a physical CPU, and a vCPU with BOOST priority will
> preempt the running one and run for an additional 10ms. My idea is to
> sample each physical CPU every 10ms and record which vCPU (if any) it
> is mapped to; a sample where the pCPU is not mapped to any vCPU counts
> as unmapped. We can then estimate CPU usage as the proportion of
> mapped samples to the total number of samples over the monitoring
> period.
>
> For example, if we sample the physical CPUs every 10ms, we get 100
> (pCPU_id, vCPU_id) pairs per second. If 60 of those samples show the
> pCPU mapped to a valid vCPU, and in the other 40 we cannot find any
> valid vCPU mapped to the pCPU, then the estimated usage of the
> physical CPU is 60%.
>
> Here we sample the physical CPUs every 10ms, but we could also use a
> shorter interval, such as 1ms. Whatever interval we choose, we must
> make sure that no context switch happens within an interval, or that
> context switches always occur at the edge of an interval. Only under
> that condition does this idea work.
>
> So I am not sure whether we can find a time interval that meets this
> condition; in other words, whether there is an interval such that all
> context switches occur at interval boundaries.

You still haven't described exactly what it is you're trying to
accomplish: what is your end goal?  It seems to be related somehow to
measuring how busy the system is (i.e., the number of active pcpus and
idle pcpus); but as I don't know what you want to do with that
information, I can't tell you the best way to get it.

Regarding a map of pcpus to vcpus, that already exists.  The
scheduling code will keep track of the currently running vcpu here:
 per_cpu(schedule_data, pcpu_id).curr

You can see examples of the above structure used in
xen/common/sched_credit2.c.  If "is_idle(per_cpu(schedule_data,
pcpu).curr)" is false, then the cpu is running a vcpu; if it is true,
then the pcpu is idle (although it may be running a tasklet).

Additionally, if all you want is the number of non-idle cpus, the
credit1 scheduler keeps track of the idle and non-idle cpus in
prv->idlers.  You could easily use "cpumask_weight(&prv->idlers)" to
find out how many idle cpus there are at any given time.  If you know
how many online cpus there are, that will give you the busy-ness of
the system.

So now that you have this instantaneous percentage, what do you want
to do with it?


A tangential question:
 When you pin a vcpu to a pcpu (e.g. xm vcpu-pin 0 0 0), are the soft irqs
for that cpu still raised? (Let's assume for the sake of simplicity that
there are 2 cpus in the system and 2 domains - a dom0 and a domU, each
pinned to one CPU.)
 Do the vcpu pauses (and subsequent resumes with no context switch, etc.)
still happen due to the irqs or the scheduler code? Or will the scheduler
be effectively disabled in this scenario?

shriram


 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
