Re: [PATCH v2 3/3] xen/sched: fix latent races accessing vcpu->dirty_cpu
On 14.05.2020 10:50, Jürgen Groß wrote:
> On 14.05.20 09:10, Jan Beulich wrote:
>> On 11.05.2020 13:28, Juergen Gross wrote:
>>> @@ -1956,13 +1958,17 @@ void sync_local_execstate(void)
>>>  void sync_vcpu_execstate(struct vcpu *v)
>>>  {
>>> -    if ( v->dirty_cpu == smp_processor_id() )
>>> +    unsigned int dirty_cpu = read_atomic(&v->dirty_cpu);
>>> +
>>> +    if ( dirty_cpu == smp_processor_id() )
>>>          sync_local_execstate();
>>> -    else if ( vcpu_cpu_dirty(v) )
>>> +    else if ( is_vcpu_dirty_cpu(dirty_cpu) )
>>>      {
>>>          /* Remote CPU calls __sync_local_execstate() from flush IPI handler. */
>>> -        flush_mask(cpumask_of(v->dirty_cpu), FLUSH_VCPU_STATE);
>>> +        flush_mask(cpumask_of(dirty_cpu), FLUSH_VCPU_STATE);
>>>      }
>>> +    ASSERT(!is_vcpu_dirty_cpu(dirty_cpu) ||
>>> +           read_atomic(&v->dirty_cpu) != dirty_cpu);
>>
>> Repeating my v1.1 comments:
>>
>> "However, having stared at it for a while now - is this race
>> free? I can see this being fine in the (initial) case of
>> dirty_cpu == smp_processor_id(), but if this is for a foreign
>> CPU, can't the vCPU have gone back to that same CPU again in
>> the meantime?"
>>
>> and later
>>
>> "There is a time window from late in flush_mask() to the assertion
>> you add. All sorts of things can happen during this window on
>> other CPUs. IOW what guarantees the vCPU not getting unpaused or
>> its affinity getting changed yet another time?"
>>
>> You did reply that by what is now patch 2 this race can be
>> eliminated, but I have to admit I don't see why this would be.
>> Hence at the very least I'd expect justification in either the
>> description or a code comment as to why there's no race left
>> (and also no race to be expected to be re-introduced by code
>> changes elsewhere - very unlikely races are, by their nature,
>> unlikely to be hit during code development and the associated
>> testing, hence I'd like there to be sufficiently close to a
>> guarantee here).
>>
>> My reservations here may in part be due to not following the
>> reasoning for patch 2, which therefore I'll have to rely on the
>> scheduler maintainers to judge on.
>
> sync_vcpu_execstate() isn't called for a running or runnable vcpu any
> longer. I can add an ASSERT() and a comment explaining it if you like
> that better.

This would help (hopefully people adding new uses of the function
would run into this assertion/comment), but for example the uses in
mapcache_current_vcpu() or do_tasklet_work() look to be pretty hard
to prove they can't happen for a runnable vCPU.

Jan
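[Editorial note: a minimal sketch of the kind of assertion and comment
Jürgen proposes above, assuming the function is only ever reached for a
vCPU that can no longer be scheduled. The use of vcpu_runnable() and
v->is_running here is illustrative only and not taken from the eventual
follow-up patch.]

    void sync_vcpu_execstate(struct vcpu *v)
    {
        unsigned int dirty_cpu = read_atomic(&v->dirty_cpu);

        /*
         * Illustration only: the vCPU must be neither running nor runnable
         * here, or it could be scheduled onto a CPU again after flush_mask()
         * returns, making dirty_cpu valid once more and defeating the
         * assertion at the end of the function.
         */
        ASSERT(!v->is_running && !vcpu_runnable(v));

        if ( dirty_cpu == smp_processor_id() )
            sync_local_execstate();
        else if ( is_vcpu_dirty_cpu(dirty_cpu) )
        {
            /* Remote CPU calls __sync_local_execstate() from flush IPI handler. */
            flush_mask(cpumask_of(dirty_cpu), FLUSH_VCPU_STATE);
        }

        ASSERT(!is_vcpu_dirty_cpu(dirty_cpu) ||
               read_atomic(&v->dirty_cpu) != dirty_cpu);
    }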