
Re: [Xen-devel] [PATCH 3/5] xen: RCU/x86/ARM: discount CPUs that were idle when grace period started.



On Wed, 2017-08-09 at 12:38 +0100, Tim Deegan wrote:
> Hi,
> 
Hey! :-)

> At 10:01 +0200 on 27 Jul (1501149684), Dario Faggioli wrote:
> > In Xen, that is impossible, and that's particularly problematic
> > when the system is idle (or lightly loaded), as CPUs that
> > are idle may never have the chance to tell RCU about their
> > quiescence, and grace periods could extend indefinitely!
> 
> [...]
> > The first step for fixing this situation is for RCU to record,
> > at the beginning of a grace period, which CPUs are already idle.
> > In fact, being idle, they can't be in the middle of any read-side
> > critical section, and we don't have to wait for them to declare
> > a grace period finished.
> 
> AIUI this patch fixes a bug where:
>  - a CPU is idle/asleep;
>  - it is added to the cpumask of a new RCU grace period; and
>  - because the CPU is asleep, the grace period never ends. 
> Have I understood?
> 
Yes indeed.

Basically, without this patch, all the CPUs are always added to, and
considered part of, every grace period. And if some of them are
idle/asleep, we need to wait for them to wake up before declaring the
grace period ended.

So the duration of the grace period itself is out of our control (in
fact, it happens to stay reasonable on x86, even on a fully idle
system, but not so much on ARM). It can even be virtually
infinite... :-O

> I think we might be left with a race condition:
>  - CPU A is about to idle.  It runs the RCU softirq and
>    clears itself from the current grace period.
>  - CPU B ends the grace period and starts a new one.
>  - CPU A calls rcu_idle_enter() and sleeps.
>  - The new grace period never ends.
> 
Yes, I think this is a thing.

I've given it some thought, and will post something about it in another
email.

> Is that fixed by your later rcu_idle_timer patch?  AIUI that's only
> invoked when the calling CPU has pending RCU callbacks.
> 
The timer is meant to deal with CPUs that have pending callbacks, yes.
I don't think we can solve this issue with the timer, even if we extend
its scope to all CPUs with any kind of pending RCU processing... we'd
still have problems with grace periods that start (on CPU B) after the
check for whether or not we need the timer has happened.

> Or can it be fixed here by something like this in rcu_idle_enter?
>  - lock rcp->lock
>  - add ourselves to the idle mask
>  - if we are in rcp->cpumask:
>      - cpu_quiet()
>  - unlock rcp->lock
> 
If rcu_idle_enter() lives inside an IRQ-disabled section (e.g., in
mwait_idle(), as it does right now), I can't take rcp->lock (because of
spinlock IRQ safety).

If I move it outside of that, then yes, the above may work. Or,
actually, we may not even need it... But as I said, more on this in
another message.
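FWIW, spelled out in code, your suggestion would look something like
this. Again just a sketch: it assumes the same idle_cpumask field as
in the patch, and, as said, it only works if rcu_idle_enter() is
called from a context where taking rcp->lock is fine.

void rcu_idle_enter(unsigned int cpu)
{
    struct rcu_ctrlblk *rcp = &rcu_ctrlblk;

    spin_lock(&rcp->lock);

    /* Record that we're idle, so new grace periods won't wait for us. */
    cpumask_set_cpu(cpu, &rcp->idle_cpumask);

    /* If the current grace period is still waiting for us, quiesce now. */
    if ( cpumask_test_cpu(cpu, &rcp->cpumask) )
        cpu_quiet(cpu, rcp);

    spin_unlock(&rcp->lock);
}

If the locking can be sorted, that would indeed close the window you
describe above.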

> There's also the code at the top of rcu_check_quiescent_state() that
> requres _two_ idle states per batch.  I don't know what race that's
> protecting against so I don't know whether we need to worry about it
> here as well. :)
> 
Not sure I'm getting this part. It's not a race; it is that, literally,
the first time (after someone has started a new batch) a CPU executes
rcu_check_quiescent_state(), it realizes we're in a new grace period
(I'll paste a sketch of the relevant code after the traces):

   6.681421066 -.-.-.|.--x-|--- d0v7 rcu_call fn=0xffff82d080207c4c, rcp_cur=-300, rdp_quiescbatch=-300
   6.681422601 -.-.-.|.--x-|--- d0v7 do_softirq, pending = 0x00000010
   6.681422889 -.-.-.|.--x-|--- d0v7 rcu_pending? yes (ret=2): no pending entries but new entries
   6.681423214 -.-.-.|.--x-|--- d0v7 raise_softirq nr 3
   6.681423494 -.-.-.|.--x-|--- d0v7 softirq_handler nr 3
   6.681423494 -.-.-.|.--x-|--- d0v7 rcu_process_callbacks, rcp_completed=-300, rdp_batch=0, rdp_curlist: null, rdp_nxtlist: yes, rdp_donelist: null
   6.681423494 -.-.-.|.--x-|--- d0v7 rcu_next_batch, rcp_cur=-300, rdp_batch=-299, rcp_next_pending? no
   6.681423494 -.-.-.|.--x-|--- d0v7 rcu_start_batch, rcp_cur=-299, rcp_cpumask=0x00001442
1->6.681423494 -.-.-.|.--x-|--- d0v7 rcu_check_quiesc_state, rcp_cur=-299, rdp_quiescbatch=-300, rdp_qs_pending: no
   6.681423494 -.-.-.|.--x-|--- d0v7 rcu_grace_start, rcp_cur=-299

The second time, it realizes that the grace period has ended:

   6.681425277 -.-.-.|.--x-|--- d0v7 do_softirq, pending = 0x00000010
   6.681425647 -.-.-.|.--x-|--- d0v7 rcu_pending? yes (ret=5): waiting for CPU to quiesce (rdp_qs_pending=1)
   6.681425872 -.-.-.|.--x-|--- d0v7 raise_softirq nr 3
   6.681426117 -.-.-.|.--x-|--- d0v7 softirq_handler nr 3
   6.681426117 -.-.-.|.--x-|--- d0v7 rcu_process_callbacks, rcp_completed=-300, rdp_batch=-299, rdp_curlist: yes, rdp_nxtlist: null, rdp_donelist: null
2->6.681426117 -.-.-.|.--x-|--- d0v7 rcu_check_quiesc_state, rcp_cur=-299, rdp_quiescbatch=-299, rdp_qs_pending: yes
   6.681426117 -.-.-.|.--x-|--- d0v7 rcu_grace_done, rcp_cur=-299, rcp_completed=-300, rdp_quiescbatch=-299
   6.681426117 -.-.-.|.--x-|--- d0v7 rcu_cpu_quiet, rcp_cur=-299, rcp_cpumask=0x00001042

The trace above is for the CPU which actually started the grace
period, and which has the callback queued. Here's the same for another
one:

   6.682213282 -.-.-.x.----|--- d0v2 vcpu_block d0v2
   6.682213549 -.-.-.x.----|--- d0v2 raise_softirq nr 1
   6.682213999 -.-.-.x.----|--- d0v2 do_softirq, pending = 0x00000002
   6.682214397 -.-.-.x.----|--- d0v2 rcu_pending? yes (ret=4): waiting for CPU to quiesce (rdp_qs_pending=0)
   6.682214667 -.-.-.x.----|--- d0v2 raise_softirq nr 3
   [...]
   6.682224404 -.-.-.x.----|--- d32767v6 softirq_handler nr 3
   6.682224404 -.-.-.x.----|--- d32767v6 rcu_process_callbacks, rcp_completed=-300, rdp_batch=0, rdp_curlist: null, rdp_nxtlist: null, rdp_donelist: null
1->6.682224404 -.-.-.x.----|--- d32767v6 rcu_check_quiesc_state, rcp_cur=-299, rdp_quiescbatch=-300, rdp_qs_pending: no
   6.682224404 -.-.-.x.----|--- d32767v6 rcu_grace_start, rcp_cur=-299
   6.682225310 -.-.-.x.----|--- d32767v6 do_softirq, pending = 0x00000000
   6.682225570 -.-.-.x.----|--- d32767v6 rcu_pending? yes (ret=5): waiting for CPU to quiesce (rdp_qs_pending=1)
   6.682225790 -.-.-.x.----|--- d32767v6 raise_softirq nr 3
   6.682226032 -.-.-.x.----|--- d32767v6 softirq_handler nr 3
   6.682226032 -.-.-.x.----|--- d32767v6 rcu_process_callbacks, rcp_completed=-300, rdp_batch=0, rdp_curlist: null, rdp_nxtlist: null, rdp_donelist: null
2->6.682226032 -.-.-.x.----|--- d32767v6 rcu_check_quiesc_state, rcp_cur=-299, rdp_quiescbatch=-299, rdp_qs_pending: yes
   6.682226032 -.-.-.x.----|--- d32767v6 rcu_grace_done, rcp_cur=-299, rcp_completed=-300, rdp_quiescbatch=-299
   6.682226032 -.-.-.x.----|--- d32767v6 rcu_cpu_quiet, rcp_cur=-299, rcp_cpumask=0x00000000
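For completeness, the two-pass behaviour at the "1->" and "2->" marks
above comes from logic along these lines. This is a paraphrase of
rcu_check_quiescent_state() from memory, so take the exact details
with a grain of salt:

static void rcu_check_quiescent_state(struct rcu_ctrlblk *rcp,
                                      struct rcu_data *rdp)
{
    /* First pass ("1->" above): notice the new grace period. */
    if ( rdp->quiescbatch != rcp->cur )
    {
        rdp->qs_pending = 1;
        rdp->quiescbatch = rcp->cur;
        return;
    }

    /* Nothing to do if we already reported for this batch. */
    if ( !rdp->qs_pending )
        return;

    /* Second pass ("2->" above): report our quiescence for it. */
    rdp->qs_pending = 0;

    spin_lock(&rcp->lock);
    if ( likely(rdp->quiescbatch == rcp->cur) )
        cpu_quiet(rdp->cpu, rcp);
    spin_unlock(&rcp->lock);
}

So the "two idle states per batch" are really just "notice the new
grace period" on the first pass, and "report quiescence for it" on the
second.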

But maybe I've not understood properly what you meant...

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



 

