[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4] xen: sched: convert RTDS from time to event driven model





On 2/5/2016 9:39 AM, Dario Faggioli wrote:
On Wed, 2016-02-03 at 21:23 -0500, Tianyang Chen wrote:

On 2/3/2016 7:39 AM, Dario Faggioli wrote:
On Tue, 2016-02-02 at 13:09 -0500, Tianyang Chen wrote:

So what do we do, we raise the *_delayed_runq_add flag and continue
and
leave the vcpu alone. This happens, for instance, when there is a
race
between scheduling out a vcpu that is going to sleep, and the vcpu
in
question being woken back up already! It's not frequent, but we
need to
take care of that. At some point (sooner rather tan later, most
likely)
the context switch will actually happen, and the vcpu that is
waking up
is put in the runqueue.


Yeah I agree with this situation. But I meant to say when a vcpu is
waking up while it's still on a queue. See below.

So, if a vcpu wakes up while it is running on a pcpu already, or while
it is already woken and has already been put in the runqueue, these are
SPURIOUS WAKEUP, which should be dealt with by just doing nothing, as
every scheduler is doing.... This is what I'm tring to say since a
couple of email, let's see if I've said it clear enough this time. :-D


I see. So I'm just curious what can cause spurious wakeup? Does it only happen to a running vcpu (currently on pcpu), and a vcpu that is on either runq or depletedq?

static void
rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
{
      struct rt_vcpu * const svc = rt_vcpu(vc);
      s_time_t now = NOW();
      struct rt_private *prv = rt_priv(ops);

      BUG_ON( is_idle_vcpu(vc) );

      if ( unlikely(curr_on_cpu(vc->processor) == vc) )
      {
          SCHED_STAT_CRANK(vcpu_wake_running);
          return;
      }


For this situation, a vcpu is waking up on a pcpu. It could be taken
off from the replq inside the timer handler at the end. So, if we
decide
to not replenish any sleeping/un-runnable vcpus any further, we
should
probably add it back to replq here right?

I'm sorry, I don't understand what you're saying here. The vcpu woke up
a few time ago. We went through here. At the time it was not on any
pcpu and it was not in the runqueue, so we did plan a replenishment
event for it in the replenishment queue, at it's next deadline.

Now we're here again, for some reason. We figure out that the vcpu is
running already and, most likely (feel free to add an ASSERT() inside
the if to that effect), it will also still have its replenishment event
queued.

So we realized that this is just a spurious wakeup, and get back to
what we were doing.

What's wrong with this I just said?


The reason why I wanted to add a vcpu back to replq is because it could be taken off from the timer handler. Now, because of the spurious wakeup, I think the timer shouldn't take any vcpus off of replq, in which case wake() should be doing nothing just like other schedulers when it's a spurious wakeup. Also, only sleep() should remove events from replq.

      /* on RunQ/DepletedQ, just update info is ok */
      if ( unlikely(__vcpu_on_q(svc)) )
      {
          SCHED_STAT_CRANK(vcpu_wake_onrunq);
          return;
      }


For this situation, a vcpu could be going through the following
phases:

on runq --> sleep --> (for a long time) --> wake up

Well, but then we're not here, because when it went to sleep, it's been
taken off the runqueue, hasn't it?

Actually, I do think that you need to be running to go to sleep, but
anyway, even if a vcpu could go to sleep or block from the runqueue,
it's not going to stay in the runqueue, and the first wakeup that
occurs does not find it on the runqueue, and hence is not covered by
this case... Is it?


uh, I missed the __q_remove() from sleep. It will not end up here in the case I mentioned.

When it wakes up, it could stay in the front of the runq because
cur_deadline hasn't been updated. It will, inherently have some high
priority-ness (because of EDF) and be picked over other vcpus right?
Do
we want to update it's deadline and budget here and re-insert it
accordingly?

When it wakes up and it is found not to be either running or in the
runqueue already, yes, we certainly want to do that!

But if some other (again, spurious) wakeup occurs, and find it already
in the runqueue (i.e., this case) we actually *don't* want to update
its deadline again.


Agree.

      if ( likely(vcpu_runnable(vc)) )
          SCHED_STAT_CRANK(vcpu_wake_runnable);
      else
          SCHED_STAT_CRANK(vcpu_wake_not_runnable);


      if ( now >= svc->cur_deadline)
      {
          rt_update_deadline(now, svc);
          __replq_remove(svc);
      }

      __replq_insert(svc);


For here, consider this scenario:

on pcpu (originally on replq) --> sleep --> context_saved() but
still
asleep --> wake()

Timer handler isn't triggered before it awakes. Then this vcpu will
be
added to replq again while it's already there. I'm not 100% positive
if
this situation is possible though. But judging from
rt_context_saved(),
it checks if a vcpu is runnable before adding back to the runq.

I'm also having a bit of an hard time understanding this... especially
what context_saved() has to do with it.


I guess I shouldn't have put /* */ around the removal of a vcpu on replq in sleep(). So the original patch won't actually take a vcpu off if it goes to sleep. Like Meng said, it should. And I see what you mean by spurious wakeup now. I will put a summary for wake() somewhere below.

Are you saying that it is possible for a vcpu to start running with
deadline td, and hence a replenishment is programmed to happen at td,
and then to go to sleep at ta<td and wakeup at tb that is also <td?

Well, in that case, yes, whether to update the deadline or not, and
whether to reuse the already existing replenishment or not, it's up to
you guys, as it's part of the algorithm. What does the algorithm say in
this case?

In this implementation, you are removing replenishment when a vcpu
sleep, so it looks to me that you don't have much choice than re-
instating it at wakeup (we've gone through this already, remember?).

OTOH, if you wouldn't stop the timer on sleep, then yes, we ma need
some logic here, to understand whether the replenishment happened
during sleeping/blocking time, or if it's still pending.

If that was not what you were saying... then, sorry, but I just did not
get it...

Thoughts?


So just to wrap up my thoughts for budget replenishment inside of
wake(), a vcpu can be in different states(on pcpu, on queue etc.)
when
it wakes up.

Yes, and only one is usually interesting: when it's not in any runqueue
nor on any pcpu. In out case, the case where it's in _delayed_runq add
needs some care as well (unlike, e.g., in Credit2).


See below.

1)wake up on a pcpu:
Do nothing as it may have some remaining budget and we just ignore
it's
cur_deadline when it wakes up. I can't find a corresponding pattern
in
DS but this can be treated like a server being pushed back because
of
the nap.

Well, I'd say: do nothing because it's a spurious wakeup.


Yes.

2)wake up on a queue:
Update the deadline(refill budget) of it and re-insert it into runq
for
the reason stated above.

Do nothing because it's a spurious wakeup.


Yes.

3)wake up in the middle of context switching:
Update the deadline(refill budget) of it so it can be re-inserted
into
runq in rt_context_saved().

Ok.

4)wake up on nowhere ( sleep on pcpu--> context_saved() --> wake()):
Replenish it and re-insert it back to runq.

Here it is the one we care about. And yes, I agree on what we should do
in this case.

As for replenishment queue manipulation, we should always try and add
it
back to replq because we don't know if the timer handler has taken
it
off or not.

Mmm... no, I think we should know/figure out whether a replenishment is
pending already and we are ok with it, or if we need a new one.


Just to make sure, spurious sleep/wakeup are for a vcpu that is on pcpu or any queue right?


If a real sleep happens, it should be taken off from replq. However, in spurious wakeup (which I assume corresponds to a spurious sleep?), it shouldn't be taken off from replq. Adding back to replq should happen for those vcpus which were taken off because of "real sleep".

So here is a summary to make sure everyone's on the same page :)
This is basically your code with a slight modification before replq_insert().

static void
rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
{
    struct rt_vcpu * const svc = rt_vcpu(vc);
    s_time_t now = NOW();

    BUG_ON( is_idle_vcpu(vc) );

    if ( unlikely(curr_on_cpu(vc->processor) == vc) )
    {
        /*
         * spurious wakeup so do nothing as it is
         * not removed from replq
         */
        SCHED_STAT_CRANK(vcpu_wake_running);
        return;
    }

    /* on RunQ/DepletedQ, just update info is ok */
    if ( unlikely(__vcpu_on_q(svc)) )
    {
        /*
         * spurious wakeup so do nothing as it is
         * not removed from replq
         */
        SCHED_STAT_CRANK(vcpu_wake_onrunq);
        return;
    }

    if ( likely(vcpu_runnable(vc)) )
        SCHED_STAT_CRANK(vcpu_wake_runnable);
    else
        SCHED_STAT_CRANK(vcpu_wake_not_runnable);


    /*
     * sleep() might have taken it off from replq before
     * conext_saved(). Three cases:
     * 1) sleep on pcpu--> context_saved() --> wake()
     * 2) sleep on pcpu--> wake() --> context_saved()
     * 3) sleep before context_saved() -->wake().
     * The first two cases sleep() doesn't remove this vcpu
     * so add a check before inserting.
     */
    if ( now >= svc->cur_deadline)
    {
        rt_update_deadline(now, svc);
        __replq_remove(svc);
    }

    if( !__vcpu_on_replq(svc) )
        __replq_insert(svc);

    /* If context hasn't been saved for this vcpu yet, we can't put it on
     * the Runqueue/DepletedQ. Instead, we set a flag so that it will be
     * put on the Runqueue/DepletedQ after the context has been saved.
     */
    if ( unlikely(svc->flags & RTDS_scheduled) )
    {
        set_bit(__RTDS_delayed_runq_add, &svc->flags);

        return;
    }

    /* insert svc to runq/depletedq because svc is not in queue now */
    __runq_insert(ops, svc);
    runq_tickle(ops, snext);

    return;
}


static void
rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
{
    struct rt_vcpu * const svc = rt_vcpu(vc);

    BUG_ON( is_idle_vcpu(vc) );
    SCHED_STAT_CRANK(vcpu_sleep);

    /*
     * If a vcpu is not sleeping on a pcpu or a queue,
     * it should be taken off
     * from the replenishment queue
     * Case 1: sleep on a pcpu
     * Case 2: sleep on a queue
     * Case 3: sleep before context switch is finished
     */
    if ( curr_on_cpu(vc->processor) == vc )
        cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
    else if ( __vcpu_on_q(svc) )
        __q_remove(svc);
    else if ( svc->flags & RTDS_delayed_runq_add )
    {
        clear_bit(__RTDS_delayed_runq_add, &svc->flags);

        if( __vcpu_on_replq(svc) )
           __replq_remove(svc);
    }
}



Oh I might be misunderstanding what should be put in
rt_schedule().
Was
it specifically for handling a vcpu that shouldn't be taken off a
core?
Why are you referring this piece of code as budget enforcement?

Isn't it necessary to modify the runq here after budget
replenishment
has happened? If runq goes out of order here, will rt_schedule()
be
sorting it?

Ops, yes, you are right, this is not budget enforcement. So, budget
enforcement looks still to be missing from this patch (and that is
why
I asked whether it is actually working).

And this chunk of code (and perhaps this whole function) is
probably
ok, it's just a bit hard to understand what it does...


So, the burn_budget() is still there right? I'm not sure what you
are
referring to as budget enforcement. Correct me if I'm wrong, budget
enforcement means that each vcpu uses its assigned/remaining budget.
Is
there any specific situation I'm missing?

That's correct, and I saw that burn_budget() is still there. What I
failed to see was logic, within rt_schedule() for stopping a vcpu that
has exhausted its budget until its next replenishment.

Before, this was done in __repl_update(), which is (and that's a good
thing) no longer there.

But maybe it's me that missed something...


__repl_update() only scans runq and depletedq for replenishment right? And right after burn_budget(), if the current vcpu runs out of budget,
snext will be picked here because of:

/* if scurr has higher priority and budget, still pick scurr */
if ( !is_idle_vcpu(current) && vcpu_runnable(current) &&
      scurr->cur_budget > 0 && ( is_idle_vcpu(snext->vcpu) ||
    scurr->cur_deadline <= snext->cur_deadline ) )
    snext = scurr;

if scurr->cur_budget == 0, snext will be picked immediately.
Is that what you looking for?

So, do you mind if we take a step back again, and analyze the
situation. When the timer fires, and this handler is executed and a
replenishment event taken off the replenishment queue, the
following
situations are possible:

   1) the replenishment is for a running vcpu
   2) the replenishment is for a runnable vcpu in the runq
   3) the replenishment is for a runnable vcpu in the depletedq
   4) the replenishment is for a sleeping vcpu

So, 4) is never going to happen, right?, as we're stopping the
timer
when a vcpu blocks. If we go for this, let's put an ASSERT() and a
comment as a guard for that. If we do the optimization in this very
patch. If we leave it for another patch, we need to handle this
case, I
guess.

In all 1), 2) and 3) cases, we need to remove the replenishment
event
from the replenishment queue, so just (in the code) do it upfront.


I don't think we need to take them of the replenishment queue? See
the
last couple paragraphs.

I meant removing it for reinserting it again with a different
replenishment time, due to the fact that, as a consequence of the
replenishment that is just happening, the vcpu is being given a new
deadline.


Ok so we are talking about the same thing.

What other actions need to happen in each case, 1), 2) and 3)? Can
you
summarize then in here, so we can decide how to handle each
situation?


Here we go.
1) the replenishment is for a running vcpu
Replenish the vcpu. Keep it on the replenishment queue. Tickle it as
it
may have a later deadline.

What do you mean keep it in the repl queue? This particular
replenishment just happened, which means the time at which it was
scheduler passed, there is no reason for keeping it in the queue like
this. Do you mean inserting a new replenishment event in the queue, for
this same vcpu, with a new replenishment time corresponding to its
deadline? If yes, then yes, I agree with that.


Exactly. It's the way I expressed it. When I said keep it on, it includes remove() + insert(). lol

2) the replenishment is for a runnable vcpu in the runq
Replenish the vcpu. Keep it on the replenishment queue.

Same as above, I guess?


Yes.

(re-)insert it
into runq. Tickle it.

Ok.

3) the replenishment is for a runnable vcpu in the depletedq
Same as 2)

Same as 2), but you insert it in the runq (not again in the depletedq),
right?


Right. _q_insert() checks if a vcpu has budget. If so it's put on the runq. Otherwise on the depletedq.

4) the replenishment is for a sleeping vcpu
Take it off from the replenishment queue and don't do replenishment.
As
a matter of fact, this patch does the replenishment first and then
take
it off.

Ah, so can this happen? Aren't you removing the queued replenishment
event when a vcpu goes to sleep?


Now, if we all agree on removing a vcpu in sleep(), this is not going to happen. So, there are three functions manipulating the replq now.

I'd say that, in both case 2) and case 3), the vcpu needs to be
taken
out of the queue where they are, and (re-)inserted in the runqueue,
and
then we need to tickle... Is that correct?


correct^.

Ok. :-)

I'd also say that, in case 1), we also need to tickle because the
deadline may now be have become bigger than some other vcpu that is
in
the runqueue?


correct. I missed this case^.

Ok. :-) :-)

Also, under what conditions we need, as soon as replenishment for
vcpu
X happened, to queue another replenishment for the same vcpu, at
its
next deadline? Always, I would say (unless the vcpu is sleeping,
because of the optimization you introduced)?

So this is kinda confusing. If in case 1) 2) and 3) they are never
taken
off from the replenishment queue, a series of replenishment events
are
automatically updated after the replenishment.

This is mostly about the both of us saying the same thing differently.

I'm saying that you should remove the vcpu from the replenishment runqueue and 
then insert it in there again with an updated replenishment time, computed out 
of the vcpu's new deadline.

You are saying that the replenishment events are updated.

Are we (talking about the same thing)? :-)


Yeah.

  There is no need, as
least one less reason, to keep track of when to queue the next vcpus
for
replenishment.

I don't understand what you mean here.

The side effect of this is that all vcpus could be going to sleep
right
after the timer was programmed at the end of the handler. If they
don't
wake up before the timer fires next time, the handler is called to
replenish nobody. If that's the case, timer will not be programmed
again
in the handler. Wake() kicks in and programs it instead. And the
downside of this is that we always need to check if an awaking vcpu
is
on replq or not. It not add it back(hence the notion of starting to
track it's replenishment events again). When adding back, check if
there
is a need to reprogram the timer.

This all sounds fine to me.

Great. I will put together v5 soon. If we all agree on the rt_wake() and rt_sleep().

Thanks,
Tianyang








_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.