[Xen-devel] Power aware credit scheduler

The existing credit scheduler is not power aware. To achieve better
power savings with negligible performance impact, the following
areas could be tweaked; they are listed here for comments first.

The goal is not to save power blindly at the expense of performance,
e.g. we don't want to prevent migration when there are free cpus
while some runqueues still have pending work. But when the available
computing power exceeds the current demand, a power aware policy can
steer the scheduler toward the less power-intrusive decision. Of
course, even in the latter case this is controllable with a scheduler
parameter like csched_private.power, exposed to the user.


a) when there are more idle cpus than required

a.1) csched_cpu_pick
        The existing policy is to pick a cpu with more idle neighbours,
to avoid shared resource contention among cores or threads.
However, from a power P.O.V., a package C-state saves much more
power than a per-core C-state. From this angle, it might be better
to keep an idle package continuously idle and instead pick idle
cores/threads whose neighbours are already busy, if
csched_private.power is set. The performance/watt ratio improves,
though absolute performance takes a small hit.
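
To make the a.1 policy concrete, here is a minimal standalone sketch
in C. It is not the real csched_cpu_pick: struct toy_cpu,
toy_count_neighbours() and toy_cpu_pick() are hypothetical names, the
topology is a flat array, and the power argument stands in for
csched_private.power. The only point is the inverted preference when
power is set.

```c
#include <stddef.h>

/* Hypothetical, simplified view of one physical cpu (not a Xen type). */
struct toy_cpu {
    int package;   /* id of the package (socket) this cpu belongs to */
    int idle;      /* nonzero if the cpu is currently idle */
};

/*
 * Count the cpus sharing a package with cpus[self], excluding self,
 * that are idle (want_idle != 0) or busy (want_idle == 0).
 */
int toy_count_neighbours(const struct toy_cpu *cpus, size_t n,
                         size_t self, int want_idle)
{
    int count = 0;

    for (size_t i = 0; i < n; i++) {
        if (i == self || cpus[i].package != cpus[self].package)
            continue;
        if (!!cpus[i].idle == !!want_idle)
            count++;
    }
    return count;
}

/*
 * Pick an idle cpu and return its index, or -1 if none is idle.
 * power == 0: prefer the idle cpu with the most idle neighbours
 *             (less contention on shared resources).
 * power != 0: prefer the idle cpu with the most busy neighbours,
 *             so that fully idle packages are left undisturbed.
 * Ties go to the lowest cpu index.
 */
int toy_cpu_pick(const struct toy_cpu *cpus, size_t n, int power)
{
    int best = -1, best_score = -1;

    for (size_t i = 0; i < n; i++) {
        int score;

        if (!cpus[i].idle)
            continue;
        score = toy_count_neighbours(cpus, n, i, !power);
        if (score > best_score) {
            best = (int)i;
            best_score = score;
        }
    }
    return best;
}
```

With two packages of two cpus each, where only cpu0 is busy, the
default mode picks a cpu in the fully idle package, while the power
mode picks cpu1 next to the busy cpu0.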

a.2) csched_vcpu_wake
        Similar to the above: instead of blindly kicking all idle
cpus at once, a more selective kick can be chosen with the power
factor taken into account.
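
One possible shape for such a selective kick, as a rough sketch
only. Everything here is an assumption made for illustration: the
names toy_package_mask() and toy_wake_mask(), the fixed topology of 8
cpus in packages of 4, the plain 32-bit mask, and the "needed"
argument (how many cpus the caller wants to wake). The real
csched_vcpu_wake works on Xen cpumasks.

```c
#include <stdint.h>

#define TOY_NR_CPUS      8u   /* assumed number of cpus */
#define TOY_CPUS_PER_PKG 4u   /* assumed cpus per package */

/* Mask of all cpus in the same package as the given cpu. */
uint32_t toy_package_mask(unsigned cpu)
{
    unsigned first = cpu - cpu % TOY_CPUS_PER_PKG;

    return ((1u << TOY_CPUS_PER_PKG) - 1u) << first;
}

/*
 * Return the mask of idle cpus to kick.
 * power == 0: kick every idle cpu, as in the "blind" policy.
 * power != 0: kick at most `needed` cpus, taking idle cpus in
 *             packages that already have a busy cpu first, and
 *             touching fully idle packages only as a last resort.
 */
uint32_t toy_wake_mask(uint32_t idle_mask, unsigned needed, int power)
{
    uint32_t kick = 0;

    if (!power)
        return idle_mask;

    /* pass 0: partially busy packages only; pass 1: everything else */
    for (int pass = 0; pass < 2 && needed > 0; pass++) {
        for (unsigned cpu = 0; cpu < TOY_NR_CPUS && needed > 0; cpu++) {
            uint32_t bit = 1u << cpu;
            uint32_t pkg = toy_package_mask(cpu);
            int pkg_has_busy = (idle_mask & pkg) != pkg;

            if (!(idle_mask & bit) || (kick & bit))
                continue;           /* not idle, or already chosen */
            if (pass == 0 && !pkg_has_busy)
                continue;           /* leave idle packages for pass 1 */
            kick |= bit;
            needed--;
        }
    }
    return kick;
}
```

For example, with cpus 0 and 3 busy (idle_mask 0xF6), asking for one
cpu in power mode kicks only cpu1; asking for three kicks cpu1 and
cpu2 and only then reaches into the idle package for cpu4.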


b) when a physical cpu resides in an idle C-state
        Avoid unnecessary work, to keep C-state residency longer.
For example, the accounting process (the tick timer, more
specifically) can be stopped before C-state entry and resumed after
wakeup. The point is that no accounting is required while the
current cpu is idle, and any runqueue change triggered from other
cpus sends an IPI to this cpu, which effectively brings it back to
C0 state with accounting resumed. Since the residency period may
be longer than the accounting period (30ms), csched_tick should be
aware of the resume event to adjust elapsed credits.
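
A small sketch of the stop/resume bookkeeping, under stated
assumptions: struct toy_tick and the toy_tick_* functions are
hypothetical, time is a plain millisecond counter passed in by the
caller, and the 30ms period is the figure quoted above. How the
missed periods should translate into credit adjustments is left to
the caller, since that is the open question here.

```c
#include <stdint.h>

#define TOY_ACCT_PERIOD_MS 30u   /* accounting period quoted in the text */

/* Hypothetical per-cpu tick state (not a Xen structure). */
struct toy_tick {
    uint64_t last_acct_ms;   /* time of the last accounting run */
    int stopped;             /* nonzero while the tick is stopped */
};

/* Called right before the cpu enters an idle C-state. */
void toy_tick_stop(struct toy_tick *t)
{
    t->stopped = 1;
}

/*
 * Called after wakeup (e.g. on an IPI from another cpu).
 * Returns how many whole accounting periods elapsed while the tick
 * was stopped, and advances last_acct_ms by that many periods so the
 * next regular tick lines up again.
 */
unsigned toy_tick_resume(struct toy_tick *t, uint64_t now_ms)
{
    uint64_t missed;

    if (!t->stopped)
        return 0;

    missed = (now_ms - t->last_acct_ms) / TOY_ACCT_PERIOD_MS;
    t->last_acct_ms += missed * TOY_ACCT_PERIOD_MS;
    t->stopped = 0;
    return (unsigned)missed;
}
```

If the last accounting ran at t=100ms and the cpu wakes at t=200ms,
three whole periods were skipped and the accounting base moves to
t=190ms.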


c) when a cpu's frequency is scaled dynamically
        When cpufreq/Px is enabled, a cpu's frequency is adjusted
to different operating points, driven by an on-demand governor. So
csched_acct may need to take frequency differences among cpus into
consideration, and total available credits won't be a simple 300 *
number of online cpus.
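
As a rough illustration of what frequency-aware accounting could
look like, the sketch below scales each cpu's 300 credits by its
current/maximum frequency ratio. The linear scaling is an assumption
for illustration, not something csched_acct does today, and
struct toy_freq and toy_total_credits() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_CREDITS_PER_CPU 300u   /* per-cpu credits quoted in the text */

/* Hypothetical per-cpu frequency info (not a Xen structure). */
struct toy_freq {
    uint32_t cur_khz;   /* current operating point */
    uint32_t max_khz;   /* highest operating point */
};

/*
 * Total credits for one accounting period over n online cpus, with
 * each cpu contributing 300 * cur_khz / max_khz (integer division,
 * rounded down). Cpus with an unknown max frequency are skipped.
 */
uint32_t toy_total_credits(const struct toy_freq *cpus, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        if (cpus[i].max_khz == 0)
            continue;
        total += (uint64_t)TOY_CREDITS_PER_CPU * cpus[i].cur_khz
                 / cpus[i].max_khz;
    }
    return (uint32_t)total;
}
```

Two online cpus, one at full speed and one at half speed, would then
provide 300 + 150 = 450 credits instead of the fixed 600.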


Of course there are plenty of research areas for adding more power
factors into scheduler policy. But the above is the fundamental
stuff which we believe would help the scheduler understand power
requirements without hurting performance/watt in the first place.

Comments are appreciated.

