[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 3/3] xen/sched: fix latent races accessing vcpu->dirty_cpu

On 11.05.2020 13:28, Juergen Gross wrote:
> @@ -1956,13 +1958,17 @@ void sync_local_execstate(void)
>  void sync_vcpu_execstate(struct vcpu *v)
>  {
> -    if ( v->dirty_cpu == smp_processor_id() )
> +    unsigned int dirty_cpu = read_atomic(&v->dirty_cpu);
> +
> +    if ( dirty_cpu == smp_processor_id() )
>          sync_local_execstate();
> -    else if ( vcpu_cpu_dirty(v) )
> +    else if ( is_vcpu_dirty_cpu(dirty_cpu) )
>      {
>          /* Remote CPU calls __sync_local_execstate() from flush IPI handler. 
> */
> -        flush_mask(cpumask_of(v->dirty_cpu), FLUSH_VCPU_STATE);
> +        flush_mask(cpumask_of(dirty_cpu), FLUSH_VCPU_STATE);
>      }
> +    ASSERT(!is_vcpu_dirty_cpu(dirty_cpu) ||
> +           read_atomic(&v->dirty_cpu) != dirty_cpu);

Repeating my v1.1 comments:

"However, having stared at it for a while now - is this race
 free? I can see this being fine in the (initial) case of
 dirty_cpu == smp_processor_id(), but if this is for a foreign
 CPU, can't the vCPU have gone back to that same CPU again in
 the meantime?"

and later

"There is a time window from late in flush_mask() to the assertion
 you add. All sorts of things can happen during this window on
 other CPUs. IOW what guarantees the vCPU not getting unpaused or
 its affinity getting changed yet another time?"

You did reply that by what is now patch 2 this race can be
eliminated, but I have to admit I don't see why this would be.
Hence at the very least I'd expect justification in either the
description or a code comment as to why there's no race left
(and also no race to be expected to be re-introduced by code
changes elsewhere - very unlikely races are, by their nature,
unlikely to be hit during code development and the associated
testing, hence I'd like there to be sufficiently close to a
guarantee here).

My reservations here may in part be due to not following the
reasoning for patch 2, which therefore I'll have to rely on the
scheduler maintainers to judge on.

> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -158,7 +158,7 @@ struct vcpu *vcpu_create(struct domain *d, unsigned int 
> vcpu_id)
>      v->domain = d;
>      v->vcpu_id = vcpu_id;
> -    v->dirty_cpu = VCPU_CPU_CLEAN;
> +    write_atomic(&v->dirty_cpu, VCPU_CPU_CLEAN);

This is not strictly necessary (the vCPU won't be acted upon by
any other entity in the system just yet), and with this I'd like
to suggest to drop this change again.




Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.