
Re: [Xen-devel] [PATCH 1/4] xen: credit2: implement utilization cap



On Tue, 2017-06-13 at 17:07 +0100, Anshul Makkar wrote:
> On 12/06/2017 14:19, Dario Faggioli wrote:
> > > > @@ -92,6 +92,82 @@
> > > >   */
> > > > 
> > > >  /*
> > > > + * Utilization cap:
> > > > + *
> > > > + * Setting a pCPU utilization cap for a domain means the following:
> > > > + *
> > > > + * - a domain can have a cap, expressed in terms of % of physical
> > > > + * For implementing this, we use the following approach:
> > > > + *
> > > > + * - each domain is given a 'budget', and each domain has a timer, which
> > > > + *   replenishes the domain's budget periodically. The budget is the amount
> > > > + *   of time the vCPUs of the domain can use every 'period';
> > > > + *
> > > > + * - the period is CSCHED2_BDGT_REPL_PERIOD, and is the same for all domains
> > > > + *   (but each domain has its own timer; so they all are periodic with the
> > > > + *   same period, but replenishment of the budgets of the various domains,
> > > > + *   at period boundaries, is not synchronous);
> > > > + *
> > > > + * - when vCPUs run, they consume budget. When they don't run, they don't
> > > > + *   consume budget. If there is no budget left for the domain, no vCPU of
> > > > + *   that domain can run. If a vCPU tries to run and finds that there is no
> > > > + *   budget, it blocks.
> > > > + *   Budget never expires, so at whatever time a vCPU wants to run, it can
> > > > + *   check the domain's budget, and if there is some, it can use it.
> > > > + *
> > > > + * - budget is replenished to the top of the capacity for the domain once
> > > > + *   per period. Even if there was some leftover budget from the previous
> > > > + *   period, though, the budget after a replenishment will always be at
> > > > + *   most equal to the total capacity of the domain ('tot_budget');
> > > > + *
> > > 
> > > Budget is replenished but credits are not available?
> > > 
> > 
> > Still not getting this.
> 
> What I want to ask is: if the budget of the domain is replenished,
> but credit for the vcpus of that domain is not available, what
> happens then?
>
Yes, but the point is that budget can be available or not, while
credits are always available. There's no such thing as credit not being
available at all.

The amount of credits each vcpu has decides which vcpu will run, in the
sense that it will be the one that has the highest amount of credits.
The others will indeed wait, but because they've got less credit than
the one that runs, not because they don't have credits available.

> I believe vcpus won't be scheduled (even if they have budget_quota)
> until they get their credit replenished.
>
Credits are not exhausted or replenished.

If you want to know what happens when there are two vcpus, both with
budget but with different amounts of credit (and only 1 pcpu on which
to run them), the answer is: the one with more credits runs.
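To make the distinction concrete, here is a toy C model (not Xen code; struct toy_cand and pick_next() are made-up names for illustration). Budget only decides whether a vcpu is a candidate at all; credit only decides the order among the remaining candidates.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not Xen code: names and fields are hypothetical. */
struct toy_cand {
    int  id;
    int  credit;   /* only orders candidates; it never "runs out" */
    bool parked;   /* set when the domain has no budget left */
};

/*
 * Return the index of the vcpu that should run next on a single pcpu,
 * or -1 if none can run. Parked vcpus (no budget) are skipped; among
 * the others, the one with the most credit wins.
 */
static int pick_next(const struct toy_cand *v, size_t n)
{
    int best = -1;

    for ( size_t i = 0; i < n; i++ )
    {
        if ( v[i].parked )
            continue;
        if ( best < 0 || v[i].credit > v[best].credit )
            best = (int)i;
    }
    return best;
}
```

With two unparked vcpus holding 300 and 500 credits, the second one is picked; once it is parked, the first one runs even though it has fewer credits.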

> > 
> > > Budget is finished but the vcpu has not reached the rate limit
> > > boundary?
> > > 
> > 
> > Budget takes precedence over ratelimiting. This is important to keep
> > cap working "regularly", rather than in some kind of permanent
> > "trying-to-keep-up-with-overruns-in-previous-periods" state.
> > 
> > And, ideally, a vcpu cap and ratelimiting should be set in such a way
> > that they don't step on each other's toes (or do that only rarely). I
> > can see about trying to print a warning when I detect potentially
> > tricky values (but it's not easy, considering budget is per-domain, so
> > I can't be sure about how much each vcpu will actually get, and
> > whether or not
> 
> Why can't you be sure? The scheduler knows the domain budget and the
> number of vcpus per domain, so we can calculate the budget_quota and
> translate it into a cpu slot duration.
>
Sure. So, let's say you give a domain 200%, which means 200ms of budget
every 100ms. It has 4 vcpus, which means each vcpu will get 50ms.

At time t, vcpu1 starts running, executes for 10ms, and then stops.
Still at time t, the other three vcpus (vcpu2, vcpu3 and vcpu4) start
running; they run for 50ms, which means they exhaust the quota you
assigned to them. But what if they would like to continue to run?
What do you do?
There's still 40ms worth of budget available, for this period, in the
domain.

If you don't let (any of) them run, and use that budget, then you're
limiting the domain to 160%.
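The arithmetic of this example, as a small C sketch. It assumes each vcpu is strictly held to tot_budget/nr_vcpus; the macros and the used_with_strict_quota() / effective_cap_percent() helpers are illustrative names only.

```c
/* Figures from the example: a 200% cap on a 100ms period, 4 vcpus. */
#define PERIOD_MS     100
#define TOT_BUDGET_MS 200   /* 200% of one pcpu, per period */
#define NR_VCPUS      4

/*
 * Budget the domain actually consumes in one period if every vcpu is
 * strictly held to its per-vcpu quota (TOT_BUDGET_MS / NR_VCPUS).
 * wanted_ms[i] is how long vcpu i would like to run in the period.
 */
static int used_with_strict_quota(const int *wanted_ms, int n)
{
    const int quota_ms = TOT_BUDGET_MS / NR_VCPUS;   /* 50ms */
    int used_ms = 0;

    for ( int i = 0; i < n; i++ )
        used_ms += wanted_ms[i] < quota_ms ? wanted_ms[i] : quota_ms;
    return used_ms;
}

/* Utilization the domain effectively gets, in % of one pcpu. */
static int effective_cap_percent(const int *wanted_ms, int n)
{
    return used_with_strict_quota(wanted_ms, n) * 100 / PERIOD_MS;
}
```

With vcpu1 wanting 10ms and the other three wanting the whole period, this gives 10 + 3 * 50 = 160ms of use, i.e. 160%, leaving 40ms of the 200ms budget unused.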

If you do let (maybe some of) them run, then they are using more than
the quota you calculated for each of them. That is fine from the cap
point of view (and, in fact, it's what happens with this series), but it
means that you can't know for sure how much budget each vcpu will
actually use, and hence you can't...

> Similarly, the value of the rate limit is also known. We can compare
> them and give a warning to the user if the budget_quota is less than
> the rate limit.
> 
...compare that with the ratelimit value (or at least, you can sort of
guess and try to come up with a sensible warning, but you can't be
sure).
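A sketch of the kind of heuristic warning being discussed, under stated assumptions: cap_may_clash_with_ratelimit() is hypothetical, all times are in microseconds, and min_timer_us stands in for CSCHED2_MIN_TIMER. It can only flag a *possible* clash, because the real per-vcpu consumption depends on load and is not known in advance.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical heuristic, not part of Xen: return true if the static
 * per-vcpu budget chunk is shorter than the ratelimit, which *might*
 * lead to frequent context switches. It says nothing about how much
 * budget each vcpu will really consume in a period.
 */
static bool cap_may_clash_with_ratelimit(uint64_t tot_budget_us,
                                         unsigned int nr_vcpus,
                                         uint64_t min_timer_us,
                                         uint64_t ratelimit_us)
{
    uint64_t quota_us;

    if ( nr_vcpus == 0 || ratelimit_us == 0 )
        return false;          /* nothing meaningful to compare */

    /* Same shape as the budget_quota computation quoted in this thread. */
    quota_us = tot_budget_us / nr_vcpus;
    if ( quota_us < min_timer_us )
        quota_us = min_timer_us;

    return quota_us < ratelimit_us;
}
```

For the 200% / 4 vcpus example (200000us per period, 50000us chunks) and a 1000us ratelimit nothing is flagged; a much smaller cap would be.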

> This is very important for the user to know: if wrongly chosen, it
> can adversely affect the system's performance with frequent context
> switches (a problem we are aware of).
> 
I know. I'll think about how to better prevent (or warn about) values
that are too small, but there's no such thing as crystal balls or magic
wands :-(

> > > I checked the implementation below and I believe we can allow for
> > > this type of dynamic budget_quota allocation per vcpu. Not for the
> > > initial version, but we can certainly consider it for future versions.
> > > 
> > 
> > But... it's already totally dynamic.
> 
> csched2_dom_cntl()
> {
> svc->budget_quota = max(sdom->tot_budget / sdom->nr_vcpus,
>                                          CSCHED2_MIN_TIMER);
> }
> If domain->tot_budget = 200 and
> nr_vcpus is 4, then each vcpu gets 50%.
> How is this dynamic allocation? We are not considering the utilization
> of the other vcpus of the domain before allocating budget_quota to a
> vcpu.
> 
Right. Well, what this means is that each vcpu will get budget in
chunks of tot_budget/nr_vcpus. But how much budget each vcpu will
actually be able to get and consume in each period is impossible to
know in advance, as it will depend on overall system load, and on the
behavior of the various vcpus of the domain.
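A toy C model of drawing budget from a per-domain pool in chunks. struct toy_dom, toy_get_budget() and toy_replenish() are hypothetical names, not the series' actual functions, and units are arbitrary. It shows how a busy vcpu can end up consuming more than tot_budget/nr_vcpus in one period, and how replenishment tops the pool up without ever exceeding tot_budget.

```c
#include <stdint.h>

/* Toy model of a per-domain budget pool; names are hypothetical. */
struct toy_dom {
    int64_t budget;      /* budget left in this period */
    int64_t tot_budget;  /* budget granted per period */
};

/*
 * A vcpu asks the domain for one chunk of budget. It gets up to
 * 'quota', or whatever is left if that is less, or nothing if the pool
 * is empty (in which case the caller would park the vcpu).
 */
static int64_t toy_get_budget(struct toy_dom *d, int64_t quota)
{
    int64_t chunk = d->budget < quota ? d->budget : quota;

    if ( chunk <= 0 )
        return 0;
    d->budget -= chunk;
    return chunk;
}

/* Periodic replenishment: top up, but never above tot_budget. */
static void toy_replenish(struct toy_dom *d)
{
    d->budget += d->tot_budget;
    if ( d->budget > d->tot_budget )
        d->budget = d->tot_budget;
}
```

With tot_budget = 200 and chunks of 50, if vcpu1 never asks for budget, vcpu2 can draw two chunks (100) in the same period while vcpu3 and vcpu4 get one each; after that the pool is empty until the next replenishment.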

> > > In runq_candidate() we have this code:
> > > /*
> > >  * Return the current vcpu if it has executed for less than ratelimit.
> > >  * Adjustment for the selected vcpu's credit and decision
> > >  * for how long it will run will be taken in csched2_runtime.
> > >  *
> > >  * Note that, if scurr is yielding, we don't let rate limiting kick in.
> > >  * In fact, it may be the case that scurr is about to spin, and there's
> > >  * no point forcing it to do so until rate limiting expires.
> > >  */
> > > if ( !yield && prv->ratelimit_us && !is_idle_vcpu(scurr->vcpu) &&
> > >      vcpu_runnable(scurr->vcpu) &&
> > >      (now - scurr->vcpu->runstate.state_entry_time) <
> > >        MICROSECS(prv->ratelimit_us) )
> > > In this code block we return scurr. Here there is no check for
> > > vcpu->budget.
> > > 
> > > Even if the scurr vcpu has executed for less than the rate limit and
> > > scurr is not yielding, we need to check its budget before returning
> > > scurr.
> > > 
> > 
> > But we check vcpu_runnable(scurr). And we've already called, in
> > csched2_schedule(), vcpu_try_to_get_budget(scurr). And if scurr
> > could
> > not get any budget, we called park_vcpu(scurr), which sets scurr up
> > in
> > such a way that vcpu_runnable(scurr) is false.
> 
> Yes, got your point, but then the call to vcpu_try_to_get_budget should
> move to the code block in runq_candidate that returns scurr; otherwise
> the condition looks incomplete and makes the logic ambiguous.
> 
I don't think so. I've used a new pause flag for parking vcpus
_exactly_ to take advantage of the fact that vcpu_runnable() will then
do the right thing automatically, so that I don't have to spread budget
checks all around the code.
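A simplified C model of the parking approach. The TOY_* flags and toy_* functions are hypothetical, and the real vcpu_runnable() also looks at pause counts; here being out of budget is just one more pause flag, so a ratelimit check shaped like the runq_candidate() excerpt quoted above needs no separate budget test.

```c
#include <stdbool.h>

/*
 * Simplified model, not the real Xen definitions: a vcpu is runnable
 * only if no pause flag is set. Blocking on I/O and parking for lack
 * of budget both just set a bit here.
 */
#define TOY_VPF_blocked  (1u << 0)
#define TOY_VPF_parked   (1u << 1)

struct toy_vcpu {
    unsigned int pause_flags;
};

static bool toy_vcpu_runnable(const struct toy_vcpu *v)
{
    return v->pause_flags == 0;
}

/* Called when the vcpu could not get any budget. */
static void toy_park(struct toy_vcpu *v)
{
    v->pause_flags |= TOY_VPF_parked;
}

/* Called when a replenishment makes budget available again. */
static void toy_unpark(struct toy_vcpu *v)
{
    v->pause_flags &= ~TOY_VPF_parked;
}

/*
 * Ratelimit check: keep running the current vcpu if it is not
 * yielding, ratelimiting is enabled, it is runnable, and it has run
 * for less than the ratelimit. A parked vcpu fails the runnable test,
 * so budget wins over ratelimiting without any extra code.
 */
static bool toy_keep_current(const struct toy_vcpu *curr, bool yield,
                             unsigned int ran_us, unsigned int ratelimit_us)
{
    return !yield && ratelimit_us && toy_vcpu_runnable(curr) &&
           ran_us < ratelimit_us;
}
```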

For instance, something similar happens in context_saved(). There it's
the opposite, i.e., if a vcpu had been parked, but a replenishment
arrived, clearing the _VPF_parked flag, then the vcpu_runnable() check
already present in context_saved() will do the right thing and add the
vcpu back to the runqueue.
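The same idea applied to the context_saved() case, again as a toy model with hypothetical names (struct toy_svc, toy_context_saved()): once a replenishment clears the parked flag, the ordinary runnable check is all that is needed to requeue the vcpu.

```c
#include <stdbool.h>

/* Hypothetical model of the requeue decision; not Xen code. */
#define TOY_VPF_parked (1u << 1)

struct toy_svc {
    unsigned int pause_flags;
    bool on_runq;
};

static bool toy_runnable(const struct toy_svc *s)
{
    return s->pause_flags == 0;
}

/*
 * Runs after a vcpu has been switched out. If a replenishment cleared
 * the parked flag in the meantime, the plain runnable check puts the
 * vcpu back on the runqueue; no budget-specific test is involved.
 */
static void toy_context_saved(struct toy_svc *s)
{
    if ( toy_runnable(s) )
        s->on_runq = true;
}
```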

It's a distinctive characteristic of this implementation, as opposed,
for instance, to Credit1's, which uses vcpu_pause() and vcpu_unpause()
for the same purpose (something I totally dislike), and I don't see why
we shouldn't take advantage of it.

> We call runq_candidate to find the next runnable candidate. If we want
> to return scurr as the current runnable candidate, then it should have
> gone through all the checks, including budget_quota, and all these
> checks should be in one place.
>
Exactly! And in fact, they all are exactly there, being taken care of
by vcpu_runnable() (in the same exact way as it takes care of checking
whether the vcpu has blocked on some I/O, or has been explicitly
paused, or ...).

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

