[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 3/3] xen/sched: fix latent races accessing vcpu->dirty_cpu



On 14.05.2020 11:29, Jürgen Groß wrote:
> On 14.05.20 11:24, Jan Beulich wrote:
>> On 14.05.2020 10:50, Jürgen Groß wrote:
>>> On 14.05.20 09:10, Jan Beulich wrote:
>>>> On 11.05.2020 13:28, Juergen Gross wrote:
>>>>> @@ -1956,13 +1958,17 @@ void sync_local_execstate(void)
>>>>>      void sync_vcpu_execstate(struct vcpu *v)
>>>>>    {
>>>>> -    if ( v->dirty_cpu == smp_processor_id() )
>>>>> +    unsigned int dirty_cpu = read_atomic(&v->dirty_cpu);
>>>>> +
>>>>> +    if ( dirty_cpu == smp_processor_id() )
>>>>>            sync_local_execstate();
>>>>> -    else if ( vcpu_cpu_dirty(v) )
>>>>> +    else if ( is_vcpu_dirty_cpu(dirty_cpu) )
>>>>>        {
>>>>>            /* Remote CPU calls __sync_local_execstate() from flush IPI 
>>>>> handler. */
>>>>> -        flush_mask(cpumask_of(v->dirty_cpu), FLUSH_VCPU_STATE);
>>>>> +        flush_mask(cpumask_of(dirty_cpu), FLUSH_VCPU_STATE);
>>>>>        }
>>>>> +    ASSERT(!is_vcpu_dirty_cpu(dirty_cpu) ||
>>>>> +           read_atomic(&v->dirty_cpu) != dirty_cpu);
>>>>
>>>> Repeating my v1.1 comments:
>>>>
>>>> "However, having stared at it for a while now - is this race
>>>>    free? I can see this being fine in the (initial) case of
>>>>    dirty_cpu == smp_processor_id(), but if this is for a foreign
>>>>    CPU, can't the vCPU have gone back to that same CPU again in
>>>>    the meantime?"
>>>>
>>>> and later
>>>>
>>>> "There is a time window from late in flush_mask() to the assertion
>>>>    you add. All sorts of things can happen during this window on
>>>>    other CPUs. IOW what guarantees the vCPU not getting unpaused or
>>>>    its affinity getting changed yet another time?"
>>>>
>>>> You did reply that by what is now patch 2 this race can be
>>>> eliminated, but I have to admit I don't see why this would be.
>>>> Hence at the very least I'd expect justification in either the
>>>> description or a code comment as to why there's no race left
>>>> (and also no race to be expected to be re-introduced by code
>>>> changes elsewhere - very unlikely races are, by their nature,
>>>> unlikely to be hit during code development and the associated
>>>> testing, hence I'd like there to be sufficiently close to a
>>>> guarantee here).
>>>>
>>>> My reservations here may in part be due to not following the
>>>> reasoning for patch 2, which therefore I'll have to rely on the
>>>> scheduler maintainers to judge on.
>>>
>>> sync_vcpu_execstate() isn't called for a running or runnable vcpu any
>>> longer. I can add an ASSERT() and a comment explaining it if you like
>>> that better.
>>
>> This would help (hopefully people adding new uses of the function
>> would run into this assertion/comment), but for example the uses
>> in mapcache_current_vcpu() or do_tasklet_work() look to be pretty
>> hard to prove they can't happen for a runnable vCPU.
> 
> Those call sync_local_execstate(), not sync_vcpu_execstate().

Ouch, as said on the other sub-thread - I'm sorry for mixing those up.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.