[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [Xen-devel] [PATCH v2 1/2] xen: fix a (latent) cpupool-related race during domain destroy
So, during domain destruction, we do: cpupool_rm_domain() [ in domain_destroy() ] sched_destroy_domain() [ in complete_domain_destroy() ] Therefore, there's a window during which, from the scheduler's point of view, a domain stilsts outside of any cpupool. In fact, cpupool_rm_domain() does d->cpupool=NULL, and we don't allow that to hold true, for anything but the idle domain (and there are, in fact, ASSERT()s and BUG_ON()s to that effect). Currently, we never really check d->cpupool during the window, but that does not mean the race is not there. For instance, Credit2 at some point (during load balancing) iterates on the list of domains, and if we add logic that needs checking d->cpupool, and any one of them had cpupool_rm_domain() called on itself already... Boom! (In fact, calling __vcpu_has_soft_affinity() from inside balance_load() makes `xl shutdown <domid>' reliably crash, and this is how I discovered this.) On the other hand, cpupool_rm_domain() "only" does cpupool related bookkeeping, and there's no harm postponing it a little bit. Also, considering that, during domain initialization, we do: cpupool_add_domain() sched_init_domain() It makes sense for the destruction path to look like the opposite of it, i.e.: sched_destroy_domain() cpupool_rm_domain() And hence that's what this patch does. Actually, for better robustness, what we really do is moving both cpupool_add_domain() and cpupool_rm_domain() inside sched_init_domain() and sched_destroy_domain(), respectively (and also add a couple of ASSERT()-s). Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx> --- Cc: Juergen Gross <jgross@xxxxxxxx> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> Cc: George Dunlap <George.Dunlap@xxxxxxxxxxxxx> Cc: Jan Beulich <jbeulich@xxxxxxxx> --- Changes from v1: * followed Andrew's suggestion of moving the cpupool functions inside the sched functions; * reworded the changelog. --- xen/common/domain.c | 7 +------ xen/common/schedule.c | 13 ++++++++++++- xen/include/xen/sched.h | 2 +- 3 files changed, 14 insertions(+), 8 deletions(-) diff --git a/xen/common/domain.c b/xen/common/domain.c index 42c07ee..339ee56 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -379,10 +379,7 @@ struct domain *domain_create(domid_t domid, unsigned int domcr_flags, goto fail; init_status |= INIT_arch; - if ( (err = cpupool_add_domain(d, poolid)) != 0 ) - goto fail; - - if ( (err = sched_init_domain(d)) != 0 ) + if ( (err = sched_init_domain(d, poolid)) != 0 ) goto fail; if ( (err = late_hwdom_init(d)) != 0 ) @@ -868,8 +865,6 @@ void domain_destroy(struct domain *d) TRACE_1D(TRC_DOM0_DOM_REM, d->domain_id); - cpupool_rm_domain(d); - /* Delete from task list and task hashtable. */ spin_lock(&domlist_update_lock); pd = &domain_list; diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 7ac12d3..852f840 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -379,8 +379,15 @@ void sched_destroy_vcpu(struct vcpu *v) SCHED_OP(VCPU2OP(v), free_vdata, v->sched_priv); } -int sched_init_domain(struct domain *d) +int sched_init_domain(struct domain *d, int poolid) { + int ret; + + ASSERT(d->cpupool == NULL); + + if ( (ret = cpupool_add_domain(d, poolid)) ) + return ret; + SCHED_STAT_CRANK(dom_init); TRACE_1D(TRC_SCHED_DOM_ADD, d->domain_id); return SCHED_OP(DOM2OP(d), init_domain, d); @@ -388,9 +395,13 @@ int sched_init_domain(struct domain *d) void sched_destroy_domain(struct domain *d) { + ASSERT(d->cpupool != NULL || is_idle_domain(d)); + SCHED_STAT_CRANK(dom_destroy); TRACE_1D(TRC_SCHED_DOM_REM, d->domain_id); SCHED_OP(DOM2OP(d), destroy_domain, d); + + cpupool_rm_domain(d); } void vcpu_sleep_nosync(struct vcpu *v) diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 46c82e7..888bc19 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -637,7 +637,7 @@ void noreturn asm_domain_crash_synchronous(unsigned long addr); void scheduler_init(void); int sched_init_vcpu(struct vcpu *v, unsigned int processor); void sched_destroy_vcpu(struct vcpu *v); -int sched_init_domain(struct domain *d); +int sched_init_domain(struct domain *d, int poolid); void sched_destroy_domain(struct domain *d); int sched_move_domain(struct domain *d, struct cpupool *c); long sched_adjust(struct domain *, struct xen_domctl_scheduler_op *); _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |