[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 1/2] xen: fix a (latent) cpupool-related race during domain destroy
On 14/07/16 07:41, Dario Faggioli wrote: > So, during domain destruction, we do: > cpupool_rm_domain() [ in domain_destroy() ] > sched_destroy_domain() [ in complete_domain_destroy() ] > > Therefore, there's a window during which, from the > scheduler's point of view, a domain is still there, but > without it being part of any cpupool. > > In fact, cpupool_rm_domain() does d->cpupool=NULL, > and we don't allow anything like that to hold, for > any domain with the only exception of the idle one. > And if we stuble upon such a domain, there are ASSERT()s > and BUG_ON()s that do trigger. > > This never happens, right now, but only because none > of the functions containing one of those sanity checks > are called during the above described window. However, > Credit2 goes (during load balancing) through the list > of domains assigned to a certain runqueue, and not only > the ones that are running or runnable. If one of those > domains had cpupool_rm_domain() called upon itself > already, and we call one of those functions which checks > for d->cpupool!=NULL... Boom! > > For example, while doing Credit2 development, calling > something similar to __vcpu_has_soft_affinity() from > balance_load(), makes `xl shutdown <domid>' reliably > crash (this is how I discovered this). > > On the other hand, cpupool_rm_domain() "only" does > cpupool related bookkeeping, and there's no harm > postponing it a little bit. > > Finally, considering that, during domain initialization, > we do: > cpupool_add_domain() > sched_init_domain() > > It looks like it makes much more sense for the domain > destroy path to look like the opposite of it, i.e.: > sched_destroy_domain() > cpupool_rm_domain() > > This patch does that, and it's considered worth, as it > fixes a bug, even if only a latent one. > > Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx> As the cpupool bookkeeping is very closely related to the scheduler bookkeeping, how about having the sched_*_domain() functions involve the cpupool_*_domain() functions? That way it can't get out-of-order like this in the future. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |