
Re: [Xen-devel] [PATCH 5/5] xen: RCU: avoid busy waiting until the end of grace period.



On Tue, 2017-08-01 at 09:54 +0100, Julien Grall wrote:
> Hi Dario,
> 
> On 27/07/2017 09:01, Dario Faggioli wrote:
> > Instead of having the CPU where a callback is queued, busy
> > looping on rcu_pending(), use a timer.
> > 
> > In fact, we let the CPU go idle, but we program a timer
> > that will periodically wake it up, for checking whether the
> > grace period has actually ended.
> > 
> > It is kind of similar to introducing a periodic tick, but
> > with a much more limited scope, and a lot less overhead. In
> > fact, this timer is:
> > - only active for the CPU(s) that have callbacks queued,
> >   waiting for the end of a grace period;
> > - only active when those CPU(s) are idle (and stopped as
> >   soon as they resume execution).
> 
> If I read this correctly, it means on ARM the idling will now get
> interrupted periodically. This is a bit unfortunate, given that if you
> have a CPU doing nothing, you would still interrupt it intermittently.
> 
Not really periodically, or at least not always. What this really means
is that a CPU that is idle, *but* has pending RCU callbacks, will be
interrupted periodically to check whether the grace period has ended, so
that it can invoke the callbacks.

As soon as the callbacks have been invoked, we won't interrupt it any
longer.

And idle CPUs _without_ queued RCU callbacks won't be interrupted at
all.
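
To make the above more concrete, here is a minimal sketch of the policy.
It is not the code from the patch: the struct, the field names and the
sketch_*() functions are all made up for illustration, and the real
thing programs an actual timer rather than flipping a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state; these names are illustrative, not Xen's. */
struct cpu_rcu_state {
    bool has_pending_callbacks; /* callbacks queued, waiting for the grace period */
    bool idle_timer_armed;      /* the wake-up timer is currently programmed */
};

/* The CPU is about to go idle. */
void sketch_idle_enter(struct cpu_rcu_state *s)
{
    assert(s != NULL);
    /* Only CPUs with queued callbacks get the periodic wake-up. */
    s->idle_timer_armed = s->has_pending_callbacks;
}

/* The CPU resumes execution: the timer is stopped right away. */
void sketch_idle_exit(struct cpu_rcu_state *s)
{
    assert(s != NULL);
    s->idle_timer_armed = false;
}

/*
 * The timer fired while the CPU was idle. Returns true if the callbacks
 * were invoked, in which case the CPU is not interrupted any longer.
 */
bool sketch_timer_fired(struct cpu_rcu_state *s, bool grace_period_ended)
{
    assert(s != NULL);
    if (!grace_period_ended)
        return false;           /* timer stays armed, check again later */
    s->has_pending_callbacks = false;
    s->idle_timer_armed = false;
    return true;
}
```

So a CPU going idle with nothing queued never arms the timer, and one
with queued callbacks keeps it armed only until the grace period ends.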

> I was expecting that we could remove the CPU from RCU whilst it is
> idle. Is there any reason for not doing that?
> 
I'm not sure I understand what you mean here. I tried to explain as
well as I could how this works, and why I think it can't work any other
way, in this reply to Stefano: <1501548445.30551.5.camel@xxxxxxxxxx>

Every CPU that participates in the grace period, and has already
quiesced, is "removed from RCU" and hence, when it becomes idle, it is
never interrupted (by this timer). The only exception is the CPU(s)
that have queued callbacks.

We simply can't forget about these CPUs, even if they go idle. If we
do, the callbacks will never be invoked (or will only be invoked when
the CPU becomes active again, which may happen really, really late;
that is part of the reason why we're seeing the bug we're seeing).

Linux does exactly the same thing (well, actually, in a totally
different way from an implementation point of view, but the effect is
indeed exactly the same):

See here:

http://elixir.free-electrons.com/linux/v2.6.21/source/kernel/time/tick-sched.c#L198

We're in tick_nohz_stop_sched_tick(), i.e., the CPU is going idle, and
the periodic timer tick is being stopped (no interruptions). But

        if (rcu_needs_cpu(cpu))
                delta_jiffies = 1;

where rcu_needs_cpu() means:

/*
 * Check to see if any future RCU-related work will need to be done
 * by the current CPU, even if none need be done immediately, returning
 * 1 if so.
 */
int rcu_needs_cpu(int cpu)

Then, again in tick_nohz_stop_sched_tick():

        /*
         * Do not stop the tick, if we are only one off
         * or if the cpu is required for rcu
         */
        if (!ts->tick_stopped && delta_jiffies == 1)
                goto out;
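
Putting the two snippets together, the decision can be modelled roughly
like this. This is only a simplified sketch of the two checks quoted
above, not the kernel function: sketch_may_stop_tick() and its
parameters are names I made up, and everything else that
tick_nohz_stop_sched_tick() does is left out.

```c
#include <stdbool.h>

/*
 * Simplified model of the two checks quoted above.
 *
 * tick_stopped:     whether the periodic tick is already stopped
 * next_event_delta: jiffies until the next timer event (hypothetical input)
 * rcu_needs_cpu:    result of rcu_needs_cpu() for this CPU
 *
 * Returns false when the early "goto out" is taken, i.e. the tick is
 * left running; true when the function carries on past that check.
 */
bool sketch_may_stop_tick(bool tick_stopped,
                          unsigned long next_event_delta,
                          bool rcu_needs_cpu)
{
    unsigned long delta_jiffies = next_event_delta;

    /* RCU still needs this CPU: pretend the next event is one tick away. */
    if (rcu_needs_cpu)
        delta_jiffies = 1;

    /* Do not stop the tick if we are only one off or RCU needs the CPU. */
    if (!tick_stopped && delta_jiffies == 1)
        return false;

    return true;
}
```

In other words, as long as rcu_needs_cpu() is true, a CPU whose tick is
still running keeps getting ticks even while idle, no matter how far
away the next timer event is.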

And in fact, in my testing, without patches 3 and 5 applied, the bug is
still there.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
