[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 1/2] xen: fix a (latent) cpupool-related race during domain destroy



On 14/07/16 07:41, Dario Faggioli wrote:
> So, during domain destruction, we do:
>  cpupool_rm_domain()    [ in domain_destroy() ]
>  sched_destroy_domain() [ in complete_domain_destroy() ]
>
> Therefore, there's a window during which, from the
> scheduler's point of view, a domain is still there, but
> without it being part of any cpupool.
>
> In fact, cpupool_rm_domain() does d->cpupool=NULL,
> and we don't allow anything like that to hold, for
> any domain with the only exception of the idle one.
> And if we stuble upon such a domain, there are ASSERT()s
> and BUG_ON()s that do trigger.
>
> This never happens, right now, but only because none
> of the functions containing one of those sanity checks
> are called during the above described window. However,
> Credit2 goes (during load balancing) through the list
> of domains assigned to a certain runqueue, and not only
> the ones that are running or runnable. If one of those
> domains had cpupool_rm_domain() called upon itself
> already, and we call one of those functions which checks
> for d->cpupool!=NULL... Boom!
>
> For example, while doing Credit2 development, calling
> something similar to __vcpu_has_soft_affinity() from
> balance_load(), makes `xl shutdown <domid>' reliably
> crash (this is how I discovered this).
>
> On the other hand, cpupool_rm_domain() "only" does
> cpupool related bookkeeping, and there's no harm
> postponing it a little bit.
>
> Finally, considering that, during domain initialization,
> we do:
>  cpupool_add_domain()
>  sched_init_domain()
>
> It looks like it makes much more sense for the domain
> destroy path to look like the opposite of it, i.e.:
>  sched_destroy_domain()
>  cpupool_rm_domain()
>
> This patch does that, and it's considered worth, as it
> fixes a bug, even if only a latent one.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>

As the cpupool bookkeeping is very closely related to the scheduler
bookkeeping, how about having the sched_*_domain() functions involve the
cpupool_*_domain() functions?

That way it can't get out-of-order like this in the future.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.