[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 0/4] xen/rcu: let rcu work better with core scheduling



On 02/03/2020 14:03, Jürgen Groß wrote:
> On 02.03.20 14:25, Igor Druzhinin wrote:
>> On 28/02/2020 07:10, Jürgen Groß wrote:
>>>
>>> I think you are just narrowing the window of the race:
>>>
>>> It is still possible to have two cpus entering rcu_barrier() and to
>>> make it into the if ( !initial ) clause.
>>>
>>> Instead of introducing another atomic I believe the following patch
>>> instead of yours should do it:
>>>
>>> diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
>>> index e6add0b120..0d5469a326 100644
>>> --- a/xen/common/rcupdate.c
>>> +++ b/xen/common/rcupdate.c
>>> @@ -180,23 +180,17 @@ static void rcu_barrier_action(void)
>>>
>>>   void rcu_barrier(void)
>>>   {
>>> -    int initial = atomic_read(&cpu_count);
>>> -
>>>       while ( !get_cpu_maps() )
>>>       {
>>>           process_pending_softirqs();
>>> -        if ( initial && !atomic_read(&cpu_count) )
>>> +        if ( !atomic_read(&cpu_count) )
>>>               return;
>>>
>>>           cpu_relax();
>>> -        initial = atomic_read(&cpu_count);
>>>       }
>>>
>>> -    if ( !initial )
>>> -    {
>>> -        atomic_set(&cpu_count, num_online_cpus());
>>> +    if ( atomic_cmpxchg(&cpu_count, 0, num_online_cpus()) == 0 )
>>>           cpumask_raise_softirq(&cpu_online_map, RCU_SOFTIRQ);
>>> -    }
>>>
>>>       while ( atomic_read(&cpu_count) )
>>>       {
>>>
>>> Could you give that a try, please?
>>
>> With this patch I cannot disable SMT at all.
>>
>> The problem that my diff solved was a race between 2 consecutive
>> rcu_barrier operations on CPU0 (the pattern specific to SMT-on/off
>> operation) where some CPUs didn't exit the cpu_count checking loop
>> completely but cpu_count is already reinitialized on CPU0 - this
>> results in some CPUs being stuck in the loop.
> 
> Ah, okay, then I believe a combination of the two patches is needed.
> 
> Something like the attached version?

I apologies - my previous test result was from machine booted in core mode.
I'm now testing it properly and the original patch seems to do the trick but
I still don't understand how you can avoid the race with only 1 counter - 
it's always possible that CPU1 is still in cpu_count checking loop (even if
cpu_count is currently 0) when cpu_count is reinitialized.

I'm looking at your current version now. Was the removal of get_cpu_maps()
and recursion protection intentional? I suspect it would only work on the
latest master so I need to keep those for 4.13 testing.

Igor


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.