Re: [Xen-devel] [Design RFC] Towards work-conserving RTDS scheduler



On Thu, 2016-08-04 at 01:15 -0400, Meng Xu wrote:
> Hi Dario,
> 
Hi,

> I'm thinking about changing the current RTDS scheduler to
> work-conserving version as we briefly discussed before.
> Below is a design of the work-conserving RTDS.
> I'm hoping to get your feedback about the design ideas first before I
> start writing it in code.
> 
Here I am, sorry for the delay.

> I think the code change should not be a lot as long as we don't
> provide the functionality of switching between work-conserving and
> non-work-conserving. Because the following design will keep the
> real-time property of the current RTDS scheduler, I don't see the
> reason why we should let users switch to non-work-conserving version.
> :-)
> 
Oh, but there's a big one: _money_! :-O

If you're a service/cloud provider, you may or may not want a customer
who pays for a 40% utilization VM to be able to use more than that. In
particular, you may want to charge them more in order to enable that
possibility! :-P

Anyway, I don't think --with this design of yours-- that it is such a
big deal to make it possible to switch work-conserving*ness on and off
(see below). Actually, I think it's even possible to do that on a
per-vcpu basis, which I think would be quite cool!

> --- Below is the design ---
> 
> [...]
>
> *** Requirement of the work-conserving RTDS scheduler ***
> 1) The new RTDS scheduler should be work-conserving, of course.
> 2) The new RTDS scheduler should not break any real-time guarantee
> provided by the current RTDS scheduler.
> 
> *** Design of  Work-Conserving RTDS Scheduler ***
> VCPU model
> 1) (Period, Budget): Guaranteed <Budget> time for each <Period>
> 2) Priority index: It indicates the current  priority level of the
> VCPU. When a VCPU’s budget is depleted in the current period, its
> priority index will increase by 1 and its budget will be replenished.
> 3) A VCPU’s budget and priority index will be reset at the beginning
> of each period
> 
Ok, I think I see what you mean, and it seems to make sense to me.

Just one question/observation. As you know, I come from a CBS mindset.
CBS postpones a task/vcpu's deadline when it runs out of budget, and it
can, natively, work in work conserving or non-work conserving mode
(just by either continuing to consider the vcpu runnable, with the
later deadline, which means demoted priority, or blocking it until the
next period, respectively).
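
To make the two modes concrete, here is a minimal Python sketch of what
happens on budget depletion in a CBS-like scheme. It is only an
illustration: the `CbsVcpu` class, its fields, and
`on_budget_depleted()` are hypothetical names, not Xen code and not a
complete CBS implementation.

```python
from dataclasses import dataclass


@dataclass
class CbsVcpu:
    period: int           # reservation period (time units)
    budget: int           # budget granted per period
    deadline: int         # current absolute deadline
    remaining: int        # budget left before depletion
    runnable: bool = True


def on_budget_depleted(v: CbsVcpu, work_conserving: bool) -> None:
    """Hypothetical handler, called when v.remaining reaches 0."""
    if work_conserving:
        # Postpone the deadline by one period and refill the budget.
        # The vcpu stays runnable, but under EDF the later deadline
        # means a demoted priority.
        v.deadline += v.period
        v.remaining = v.budget
    else:
        # Non-work conserving: the vcpu is not eligible to run again
        # until its next period begins.
        v.runnable = False
```

The only difference between the two modes is the branch taken at
depletion time, which is why the same analysis can cover both.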

The nice thing about this is that the scheduling analysis that has been
developed works for both modes. Of course, what it says is that you can
only guarantee to each vcpu the reserved utilization, and you should
not rely on the additional capacity that you may be getting because
you're in work conserving mode and the system happened to be idle for
some time in this or that period (so, very similar to what you're
proposing). _HOWEVER_, there are evolutions of CBS (called GRUB and
SHRUB, I'm sure you'll be able to find the papers), where the 'unused
bandwidth' (i.e., the otherwise idle time that you're making use of iff
you're in work conserving mode) is distributed in a precise way
(according to some weights, IIRC) to the various vcpus, hence making
scheduling analysis both possible and useful again.

Now, I'm not at all saying that we (you! :-D) should convert RTDS to
use CBS(ish) or anything like that. I'm just thinking out loud and
wondering:
 - could it be useful to have a scheduling analysis in place for the 
   scheduler in work conserving mode (one, of course, that takes into 
   account and gives guarantees on the otherwise idle bandwidth... I 
   know that the existing one holds! :-P) ?
 - if yes, do you already have one --or do you think it will be 
   possible to develop one-- for your priority-index based model?

Note that I'm not saying you should, and I'd be perfectly fine with a
"no analysis, but let's keep things simple for now"... This just came
to my mind, and I'm just pointing it out, to make sure we consider and
think about it, and make a conscious decision.

> Scheduling policy: modified gEDF
> 1) Priority comparison:
>    a) VCPUs with lower priority index has higher priority than VCPUs
> with higher priority index
>    b) VCPUs with same priority index uses gEDF policy to decide the
> priority order
> 2) Scheduling point
>    a) VCPU’s budget is depleted for the current priority index
>    b) VCPU starts a new period
>    c) VCPU is blocked or waked up
> 3) Scheduling decision is made when scheduler is invoked
>     a) Always pick the current M highest-priority VCPUs to run on the
> M cores.
> 
So, still about the analysis point above, and just off the top of my
head (and without being used to doing these things any longer!!), it
looks like it's possible to come up with some analysis for this.

In fact, since:
 - vcpus with different priority indexes are totally disjoint sets,
 - there's a strict ordering between priority indexes,
 - vcpus sort of use their scheduling parameters at each priority index

This looks to me like vcpus are subject to a "hierarchy" of RTDS
schedulers, the one at level x+1 running in the idle time of the one at
level x... And I think there's scope for writing down some maths
formulas that model this situation. :-)
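
A minimal Python sketch of why it looks like a hierarchy, assuming the
comparison rule from the quoted design (lower priority index first,
then gEDF within the same index). The `Vcpu` class, its fields, and
`pick_to_run()` are hypothetical names used only for illustration:

```python
from dataclasses import dataclass


@dataclass
class Vcpu:
    name: str
    prio_index: int   # 0 while within the guaranteed budget
    deadline: int     # absolute deadline of the current period


def pick_to_run(runnable, m):
    """Return the m highest-priority vcpus for the m cores.

    Sorting on the (prio_index, deadline) tuple means that a vcpu at
    level x+1 is only picked when fewer than m vcpus are runnable at
    levels <= x, i.e., it only gets otherwise idle time.
    """
    return sorted(runnable, key=lambda v: (v.prio_index, v.deadline))[:m]
```

For example, with vcpus a (index 0, deadline 30), b (index 1, deadline
5) and c (index 0, deadline 10) on 2 cores, c and a run, and b has to
wait despite having the earliest deadline.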

Actually, it's quite likely that you either have already noticed this
and done the analysis, or that someone else in the literature has done
something similar --maybe with other schedulers-- before.

Anyway, the idea itself looks fair enough to me. I'd like to hear, if
that's fine with you, how you plan to actually implement it, as there
of course are multiple different ways to do it, and there are, IMO, a
couple of things that should be kept in mind.

Finally, about the work-conserving*ness on-off switch, what added
difficulty or increase in code complexity would there be if we were
to, instead of this:

"2) Priority index: It indicates the current  priority level of the
    VCPU. When a VCPU’s budget is depleted in the current period, its
    priority index will increase by 1 and its budget will be
    replenished."

do something like this:

"2) Priority index: It indicates the current  priority level of the
    VCPU. When a VCPU's budget is depleted in the current period:
     2a) if the VCPU has the work conserving flag set, its priority
         index will be increased by 1, and its budget replenished;
     2b) if the VCPU has the work conserving flag clear, it's blocked
         until next period."

?
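
Just to show how small the difference is, here is a minimal Python
sketch of the 2a)/2b) variant. All the names (`RtdsVcpu`,
`work_conserving`, `on_budget_depleted()`, `on_new_period()`) are
hypothetical and only for illustration; this is not the existing RTDS
code:

```python
from dataclasses import dataclass


@dataclass
class RtdsVcpu:
    budget: int                  # guaranteed budget per period
    work_conserving: bool        # per-vcpu flag (assumed, not in RTDS)
    prio_index: int = 0
    remaining: int = 0
    blocked: bool = False


def on_budget_depleted(v: RtdsVcpu) -> None:
    if v.work_conserving:
        # 2a) demote by one priority level and replenish the budget.
        v.prio_index += 1
        v.remaining = v.budget
    else:
        # 2b) keep the vcpu off the cpus until its next period.
        v.blocked = True


def on_new_period(v: RtdsVcpu) -> None:
    # Budget and priority index are reset at the start of each period.
    v.prio_index = 0
    v.remaining = v.budget
    v.blocked = False
```

With the flag stored in the vcpu, both behaviors can coexist in the
same scheduler instance.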

Thanks and Regards,
Dario
--- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
