[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v1.1 2/3] xen/sched: fix theoretical races accessing vcpu->dirty_cpu

On 04.05.20 13:51, Jan Beulich wrote:
On 30.04.2020 17:28, Juergen Gross wrote:
The dirty_cpu field of struct vcpu denotes which cpu still holds data
of a vcpu. All accesses to this field should be atomic in case the
vcpu could just be running, as it is accessed without any lock held
in most cases.

There are some instances where accesses are not atomically done, and
even worse where multiple accesses are done when a single one would
be mandated.

Correct that in order to avoid potential problems.

Beyond the changes you're making, what about the assignment in
startup_cpu_idle_loop()? And while less important, dump_domains()
also has a use that I think would better be converted for

startup_cpu_idle_loop() is not important, as it is set before any
scheduling activity might occur on a cpu. But I can change both

@@ -1844,6 +1845,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
          /* Remote CPU calls __sync_local_execstate() from flush IPI handler. 
          flush_mask(cpumask_of(dirty_cpu), FLUSH_VCPU_STATE);
+        ASSERT(read_atomic(&next->dirty_cpu) == VCPU_CPU_CLEAN);

ASSERT(!is_vcpu_dirty_cpu(read_atomic(&next->dirty_cpu))) ?

Yes, this is better.

@@ -1956,13 +1958,17 @@ void sync_local_execstate(void)
void sync_vcpu_execstate(struct vcpu *v)
-    if ( v->dirty_cpu == smp_processor_id() )
+    unsigned int dirty_cpu = read_atomic(&v->dirty_cpu);
+    if ( dirty_cpu == smp_processor_id() )
-    else if ( vcpu_cpu_dirty(v) )
+    else if ( is_vcpu_dirty_cpu(dirty_cpu) )
          /* Remote CPU calls __sync_local_execstate() from flush IPI handler. 
-        flush_mask(cpumask_of(v->dirty_cpu), FLUSH_VCPU_STATE);
+        flush_mask(cpumask_of(dirty_cpu), FLUSH_VCPU_STATE);
+    ASSERT(read_atomic(&v->dirty_cpu) != dirty_cpu ||
+           dirty_cpu == VCPU_CPU_CLEAN);

!is_vcpu_dirty_cpu(dirty_cpu) again? Also perhaps flip both
sides of the || (to have the cheaper one first), and maybe

     if ( is_vcpu_dirty_cpu(dirty_cpu) )
         ASSERT(read_atomic(&v->dirty_cpu) != dirty_cpu);

as the longer assertion string literal isn't really of that
much extra value.

I can do that, in case we can be sure the compiler will drop the
test in case of a non-debug build.

However, having stared at it for a while now - is this race
free? I can see this being fine in the (initial) case of
dirty_cpu == smp_processor_id(), but if this is for a foreign
CPU, can't the vCPU have gone back to that same CPU again in
the meantime?

This should never happen. Either the vcpu in question is paused,
or it has been forced off the cpu due to not being allowed to run
there (e.g. affinity has been changed, or cpu is about to be
removed from cpupool). I can add a comment explaining it.

--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -844,7 +844,7 @@ static inline bool is_vcpu_dirty_cpu(unsigned int cpu)
static inline bool vcpu_cpu_dirty(const struct vcpu *v)
-    return is_vcpu_dirty_cpu(v->dirty_cpu);
+    return is_vcpu_dirty_cpu(read_atomic((unsigned int *)&v->dirty_cpu));

As per your communication with Julien I understand the cast
will go away again.

Yes, I think so.




Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.