[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2 3/3] xen/sched: fix latent races accessing vcpu->dirty_cpu
On 14.05.20 09:10, Jan Beulich wrote: On 11.05.2020 13:28, Juergen Gross wrote:@@ -1956,13 +1958,17 @@ void sync_local_execstate(void)void sync_vcpu_execstate(struct vcpu *v){ - if ( v->dirty_cpu == smp_processor_id() ) + unsigned int dirty_cpu = read_atomic(&v->dirty_cpu); + + if ( dirty_cpu == smp_processor_id() ) sync_local_execstate(); - else if ( vcpu_cpu_dirty(v) ) + else if ( is_vcpu_dirty_cpu(dirty_cpu) ) { /* Remote CPU calls __sync_local_execstate() from flush IPI handler. */ - flush_mask(cpumask_of(v->dirty_cpu), FLUSH_VCPU_STATE); + flush_mask(cpumask_of(dirty_cpu), FLUSH_VCPU_STATE); } + ASSERT(!is_vcpu_dirty_cpu(dirty_cpu) || + read_atomic(&v->dirty_cpu) != dirty_cpu);Repeating my v1.1 comments: "However, having stared at it for a while now - is this race free? I can see this being fine in the (initial) case of dirty_cpu == smp_processor_id(), but if this is for a foreign CPU, can't the vCPU have gone back to that same CPU again in the meantime?" and later "There is a time window from late in flush_mask() to the assertion you add. All sorts of things can happen during this window on other CPUs. IOW what guarantees the vCPU not getting unpaused or its affinity getting changed yet another time?" You did reply that by what is now patch 2 this race can be eliminated, but I have to admit I don't see why this would be. Hence at the very least I'd expect justification in either the description or a code comment as to why there's no race left (and also no race to be expected to be re-introduced by code changes elsewhere - very unlikely races are, by their nature, unlikely to be hit during code development and the associated testing, hence I'd like there to be sufficiently close to a guarantee here). My reservations here may in part be due to not following the reasoning for patch 2, which therefore I'll have to rely on the scheduler maintainers to judge on. sync_vcpu_execstate() isn't called for a running or runnable vcpu any longer. I can add an ASSERT() and a comment explaining it if you like that better. --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -158,7 +158,7 @@ struct vcpu *vcpu_create(struct domain *d, unsigned int vcpu_id)v->domain = d;v->vcpu_id = vcpu_id; - v->dirty_cpu = VCPU_CPU_CLEAN; + write_atomic(&v->dirty_cpu, VCPU_CPU_CLEAN);This is not strictly necessary (the vCPU won't be acted upon by any other entity in the system just yet), and with this I'd like to suggest to drop this change again. Fine with me. Juergen
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |