Xen project Mailing List

Re: [Xen-devel] [PATCH v2 0/4] xen/rcu: let rcu work better with core scheduling

To: Jürgen Groß <jgross@xxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>

From: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>

Date: Mon, 2 Mar 2020 14:23:24 +0000

Cc: Kevin Tian <kevin.tian@xxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Wei Liu <wl@xxxxxxx>, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Ian Jackson <ian.jackson@xxxxxxxxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>

Delivery-date: Mon, 02 Mar 2020 14:23:51 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 02/03/2020 14:03, Jürgen Groß wrote:
> On 02.03.20 14:25, Igor Druzhinin wrote:
>> On 28/02/2020 07:10, Jürgen Groß wrote:
>>>
>>> I think you are just narrowing the window of the race:
>>>
>>> It is still possible to have two cpus entering rcu_barrier() and to
>>> make it into the if ( !initial ) clause.
>>>
>>> Instead of introducing another atomic I believe the following patch
>>> instead of yours should do it:
>>>
>>> diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
>>> index e6add0b120..0d5469a326 100644
>>> --- a/xen/common/rcupdate.c
>>> +++ b/xen/common/rcupdate.c
>>> @@ -180,23 +180,17 @@ static void rcu_barrier_action(void)
>>>
>>>  void rcu_barrier(void)
>>>  {
>>> -    int initial = atomic_read(&cpu_count);
>>> -
>>>      while ( !get_cpu_maps() )
>>>      {
>>>          process_pending_softirqs();
>>> -        if ( initial && !atomic_read(&cpu_count) )
>>> +        if ( !atomic_read(&cpu_count) )
>>>              return;
>>>
>>>          cpu_relax();
>>> -        initial = atomic_read(&cpu_count);
>>>      }
>>>
>>> -    if ( !initial )
>>> -    {
>>> -        atomic_set(&cpu_count, num_online_cpus());
>>> +    if ( atomic_cmpxchg(&cpu_count, 0, num_online_cpus()) == 0 )
>>>          cpumask_raise_softirq(&cpu_online_map, RCU_SOFTIRQ);
>>> -    }
>>>
>>>      while ( atomic_read(&cpu_count) )
>>>      {
>>>
>>> Could you give that a try, please?
>>
>> With this patch I cannot disable SMT at all.
>>
>> The problem that my diff solved was a race between 2 consecutive
>> rcu_barrier operations on CPU0 (the pattern specific to SMT-on/off
>> operation) where some CPUs didn't exit the cpu_count checking loop
>> completely but cpu_count is already reinitialized on CPU0 - this
>> results in some CPUs being stuck in the loop.
>
> Ah, okay, then I believe a combination of the two patches is needed.
>
> Something like the attached version?

I apologize: my previous test result was from a machine booted in core mode.
I'm now testing it properly, and the original patch seems to do the trick,
but I still don't understand how you can avoid the race with only one
counter: it's always possible that CPU1 is still in the cpu_count checking
loop (even if cpu_count is currently 0) when cpu_count is reinitialized.

I'm looking at your current version now. Was the removal of get_cpu_maps()
and of the recursion protection intentional? I suspect it would only work
on the latest master, so I need to keep those for 4.13 testing.

Igor

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
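The two races discussed in this thread can be replayed step by step in a small user-space model. The sketch below is a hypothetical illustration built on C11 atomics, not Xen code: NR_CPUS stands in for num_online_cpus(), `count` for cpu_count, and the names barrier_try_start(), barrier_arrive() and replay() are invented for the example. barrier_try_start() mirrors the atomic_cmpxchg() election in the quoted patch, so a racing second caller cannot reinitialize the counter. The replay then reproduces the back-to-back case Igor describes: all CPUs check in, CPU0 immediately starts the next barrier, and only afterwards does slow CPU1 re-evaluate its exit condition. The extra `generation` counter is a standard sense-reversal-style technique swapped in for comparison; it is an assumption, not necessarily what either patch here does.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model, not Xen code: NR_CPUS stands in for num_online_cpus(). */
#define NR_CPUS 4

struct barrier_state {
    atomic_int count;        /* CPUs that still have to check in (cpu_count) */
    atomic_uint generation;  /* assumed extra counter, bumped by the last CPU */
};

/* Only the caller that moves count from 0 to NR_CPUS starts the barrier,
 * in the spirit of atomic_cmpxchg(&cpu_count, 0, num_online_cpus()) == 0. */
static bool barrier_try_start(struct barrier_state *b)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&b->count, &expected, NR_CPUS);
}

/* A CPU checks in and returns the generation it must see advance. The load
 * happens before this CPU's own decrement, so count cannot reach 0 (and the
 * generation cannot advance) in between. */
static unsigned int barrier_arrive(struct barrier_state *b)
{
    unsigned int gen = atomic_load(&b->generation);

    if ( atomic_fetch_sub(&b->count, 1) == 1 )
        atomic_fetch_add(&b->generation, 1);
    return gen;
}

/* Exit test of the single-counter scheme: leave once count is 0. */
static bool single_counter_done(struct barrier_state *b)
{
    return atomic_load(&b->count) == 0;
}

/* Exit test of the generation scheme: leave once the generation moved on. */
static bool generation_done(struct barrier_state *b, unsigned int gen)
{
    return atomic_load(&b->generation) != gen;
}

struct replay_result {
    bool first_start_won;     /* CPU0 started barrier 1 */
    bool second_start_won;    /* a racing caller also started it (must not) */
    bool single_counter_exit; /* would CPU1 leave with the single counter? */
    bool generation_exit;     /* would CPU1 leave with the generation check? */
};

/* Replays the interleaving from the thread, one step at a time. */
static struct replay_result replay(void)
{
    struct barrier_state b;
    unsigned int gen[NR_CPUS];
    struct replay_result r;

    atomic_init(&b.count, 0);
    atomic_init(&b.generation, 0);

    r.first_start_won = barrier_try_start(&b);   /* CPU0 starts barrier 1 */
    r.second_start_won = barrier_try_start(&b);  /* racing caller loses */

    for ( int cpu = 0; cpu < NR_CPUS; cpu++ )
        gen[cpu] = barrier_arrive(&b);           /* every CPU checks in */

    /* CPU0 sees count == 0 and immediately starts barrier 2 ... */
    (void)barrier_try_start(&b);

    /* ... before slow CPU1 re-evaluates its exit condition. */
    r.single_counter_exit = single_counter_done(&b);
    r.generation_exit = generation_done(&b, gen[1]);
    return r;
}
```

In this model replay() reports that the second start attempt fails, that the single-counter test tells CPU1 to keep waiting for a barrier it has already passed (count is back at NR_CPUS), and that the generation test lets it leave. A real barrier would spin on the exit test and process softirqs while waiting; the replay evaluates the test only once, at the critical moment.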
