Re: [PATCH v1.1 2/3] xen/sched: fix theoretical races accessing vcpu->dirty_cpu
On 04.05.20 14:48, Jan Beulich wrote:
> On 04.05.2020 14:41, Jürgen Groß wrote:
>> On 04.05.20 13:51, Jan Beulich wrote:
>>> On 30.04.2020 17:28, Juergen Gross wrote:
>>>> @@ -1956,13 +1958,17 @@ void sync_local_execstate(void)
>>>>  
>>>>  void sync_vcpu_execstate(struct vcpu *v)
>>>>  {
>>>> -    if ( v->dirty_cpu == smp_processor_id() )
>>>> +    unsigned int dirty_cpu = read_atomic(&v->dirty_cpu);
>>>> +
>>>> +    if ( dirty_cpu == smp_processor_id() )
>>>>          sync_local_execstate();
>>>> -    else if ( vcpu_cpu_dirty(v) )
>>>> +    else if ( is_vcpu_dirty_cpu(dirty_cpu) )
>>>>      {
>>>>          /* Remote CPU calls __sync_local_execstate() from flush IPI handler. */
>>>> -        flush_mask(cpumask_of(v->dirty_cpu), FLUSH_VCPU_STATE);
>>>> +        flush_mask(cpumask_of(dirty_cpu), FLUSH_VCPU_STATE);
>>>>      }
>>>> +    ASSERT(read_atomic(&v->dirty_cpu) != dirty_cpu ||
>>>> +           dirty_cpu == VCPU_CPU_CLEAN);
>>>
>>> !is_vcpu_dirty_cpu(dirty_cpu) again? Also perhaps flip both sides of
>>> the || (to have the cheaper one first), and maybe
>>>
>>>     if ( is_vcpu_dirty_cpu(dirty_cpu) )
>>>         ASSERT(read_atomic(&v->dirty_cpu) != dirty_cpu);
>>>
>>> as the longer assertion string literal isn't really of that much
>>> extra value.
>>
>> I can do that, in case we can be sure the compiler will drop the test
>> in case of a non-debug build.
>
> Modern gcc will afaik; no idea about clang though.

I'll go with both tests inside the ASSERT(), as this will hurt debug
build only.

>>> However, having stared at it for a while now - is this race free? I
>>> can see this being fine in the (initial) case of dirty_cpu ==
>>> smp_processor_id(), but if this is for a foreign CPU, can't the vCPU
>>> have gone back to that same CPU again in the meantime?
>>
>> This should never happen. Either the vcpu in question is paused, or it
>> has been forced off the cpu due to not being allowed to run there
>> (e.g. affinity has been changed, or cpu is about to be removed from
>> cpupool). I can add a comment explaining it.
>
> There is a time window from late in flush_mask() to the assertion you
> add. All sorts of things can happen during this window on other CPUs.
> IOW what guarantees the vCPU not getting unpaused or its affinity
> getting changed yet another time?

Hmm, very unlikely, but not impossible (especially in the nested virt
case).

This makes me wonder whether adding the call to sync_vcpu_execstate() to
sched_unit_migrate_finish() was really a good idea. There might be cases
where it is not necessary (e.g. in the case you are mentioning, but with
a much larger time window), or where it is more expensive than doing it
the lazy way.

Dropping this call of sync_vcpu_execstate() would remove the race you
are mentioning completely.


Juergen