Xen project Mailing List

Re: [PATCH v2 3/3] xen/sched: fix latent races accessing vcpu->dirty_cpu

Date: Thu, 14 May 2020 15:58:40 +0200

Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Wei Liu <wl@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Ian Jackson <ian.jackson@xxxxxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Roger Pau Monné <roger.pau@xxxxxxxxxx>

Delivery-date: Thu, 14 May 2020 13:58:56 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 14.05.2020 11:29, Jürgen Groß wrote:
> On 14.05.20 11:24, Jan Beulich wrote:
>> On 14.05.2020 10:50, Jürgen Groß wrote:
>>> On 14.05.20 09:10, Jan Beulich wrote:
>>>> On 11.05.2020 13:28, Juergen Gross wrote:
>>>>> @@ -1956,13 +1958,17 @@ void sync_local_execstate(void)
>>>>>  void sync_vcpu_execstate(struct vcpu *v)
>>>>>  {
>>>>> -    if ( v->dirty_cpu == smp_processor_id() )
>>>>> +    unsigned int dirty_cpu = read_atomic(&v->dirty_cpu);
>>>>> +
>>>>> +    if ( dirty_cpu == smp_processor_id() )
>>>>>          sync_local_execstate();
>>>>> -    else if ( vcpu_cpu_dirty(v) )
>>>>> +    else if ( is_vcpu_dirty_cpu(dirty_cpu) )
>>>>>      {
>>>>>          /* Remote CPU calls __sync_local_execstate() from flush IPI handler. */
>>>>> -        flush_mask(cpumask_of(v->dirty_cpu), FLUSH_VCPU_STATE);
>>>>> +        flush_mask(cpumask_of(dirty_cpu), FLUSH_VCPU_STATE);
>>>>>      }
>>>>> +    ASSERT(!is_vcpu_dirty_cpu(dirty_cpu) ||
>>>>> +           read_atomic(&v->dirty_cpu) != dirty_cpu);
>>>>
>>>> Repeating my v1.1 comments:
>>>>
>>>> "However, having stared at it for a while now - is this race
>>>> free? I can see this being fine in the (initial) case of
>>>> dirty_cpu == smp_processor_id(), but if this is for a foreign
>>>> CPU, can't the vCPU have gone back to that same CPU again in
>>>> the meantime?"
>>>>
>>>> and later
>>>>
>>>> "There is a time window from late in flush_mask() to the assertion
>>>> you add. All sorts of things can happen during this window on
>>>> other CPUs. IOW what guarantees the vCPU not getting unpaused or
>>>> its affinity getting changed yet another time?"
>>>>
>>>> You did reply that by what is now patch 2 this race can be
>>>> eliminated, but I have to admit I don't see why this would be.
>>>> Hence at the very least I'd expect justification in either the
>>>> description or a code comment as to why there's no race left
>>>> (and also no race to be expected to be re-introduced by code
>>>> changes elsewhere - very unlikely races are, by their nature,
>>>> unlikely to be hit during code development and the associated
>>>> testing, hence I'd like there to be sufficiently close to a
>>>> guarantee here).
>>>>
>>>> My reservations here may in part be due to not following the
>>>> reasoning for patch 2, which therefore I'll have to rely on the
>>>> scheduler maintainers to judge on.
>>>
>>> sync_vcpu_execstate() isn't called for a running or runnable vcpu any
>>> longer. I can add an ASSERT() and a comment explaining it if you like
>>> that better.
>>
>> This would help (hopefully people adding new uses of the function
>> would run into this assertion/comment), but for example the uses
>> in mapcache_current_vcpu() or do_tasklet_work() look to be pretty
>> hard to prove they can't happen for a runnable vCPU.
>
> Those call sync_local_execstate(), not sync_vcpu_execstate().

Ouch, as said on the other sub-thread - I'm sorry for mixing those up.

Jan
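
For readers following the quoted hunk, here is a rough standalone sketch of the "read dirty_cpu once, then act only on the snapshot" pattern it introduces. This is not Xen code: C11 atomic_load() stands in for read_atomic(), and struct toy_vcpu, CPU_CLEAN, is_dirty_cpu() and pick_sync_action() are hypothetical names made up for illustration. The sentinel value is likewise an assumption. The sketch covers only the single-read part of the patch; it does nothing about the window between flush_mask() and the ASSERT() questioned above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sentinel: vCPU state is not held on any physical CPU. */
#define CPU_CLEAN (~0u)

/* Hypothetical stand-in for the one field of struct vcpu that matters here. */
struct toy_vcpu {
    _Atomic unsigned int dirty_cpu;
};

enum sync_action { SYNC_NONE, SYNC_LOCAL, SYNC_REMOTE };

/* Loosely mirrors is_vcpu_dirty_cpu() from the hunk (assumed semantics). */
static bool is_dirty_cpu(unsigned int cpu)
{
    return cpu != CPU_CLEAN;
}

/*
 * Take exactly one snapshot of dirty_cpu and base every decision on it.
 * Reading the field a second time (as the pre-patch code did for the
 * flush target) could observe a different value than the one tested.
 */
static enum sync_action pick_sync_action(struct toy_vcpu *v,
                                         unsigned int this_cpu,
                                         unsigned int *target)
{
    unsigned int dirty_cpu = atomic_load(&v->dirty_cpu);

    if ( dirty_cpu == this_cpu )
        return SYNC_LOCAL;          /* would call sync_local_execstate() */

    if ( is_dirty_cpu(dirty_cpu) )
    {
        *target = dirty_cpu;        /* would be the flush IPI target */
        return SYNC_REMOTE;
    }

    return SYNC_NONE;               /* nothing to sync */
}
```

A caller passes its own CPU number and, on SYNC_REMOTE, sends the flush to *target. Because the target comes from the same snapshot that was tested, the check and the flush cannot disagree about which CPU is meant, whatever happens to the field concurrently.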
