
[Xen-devel] [PATCH 1/2] xen: fix a (latent) cpupool-related race during domain destroy



So, during domain destruction, we do:
 cpupool_rm_domain()    [ in domain_destroy() ]
 sched_destroy_domain() [ in complete_domain_destroy() ]

Therefore, there's a window during which, from the
scheduler's point of view, a domain is still there, but
without it being part of any cpupool.

In fact, cpupool_rm_domain() sets d->cpupool to NULL,
and we don't allow that for any domain, with the only
exception of the idle one. If we stumble upon such a
domain, there are ASSERT()s and BUG_ON()s that do
trigger.

This never happens right now, but only because none
of the functions containing one of those sanity checks
is called during the window described above. However,
Credit2 goes (during load balancing) through the list
of all the domains assigned to a certain runqueue, not
only the ones that are running or runnable. If
cpupool_rm_domain() has already been called on one of
those domains, and we then call one of the functions
that check for d->cpupool!=NULL... Boom!

For example, while doing Credit2 development, calling
something similar to __vcpu_has_soft_affinity() from
balance_load() made `xl shutdown <domid>' crash
reliably (which is how I discovered this).

On the other hand, cpupool_rm_domain() "only" does
cpupool-related bookkeeping, and there's no harm in
postponing it a little bit.

Finally, considering that, during domain initialization,
we do:
 cpupool_add_domain()
 sched_init_domain()

It seems much more natural for the domain destroy path
to be the mirror image of that, i.e.:
 sched_destroy_domain()
 cpupool_rm_domain()

This patch does that, and it is worthwhile because it
fixes a bug, even if only a latent one.

Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
---
Cc: Juergen Gross <jgross@xxxxxxxx>
Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
Cc: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
Cc: Jan Beulich <jbeulich@xxxxxxxx>
---
 xen/common/domain.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 42c07ee..f8096d3 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -811,6 +811,8 @@ static void complete_domain_destroy(struct rcu_head *head)
         destroy_waitqueue_vcpu(v);
     }
 
+    cpupool_rm_domain(d);
+
     grant_table_destroy(d);
 
     arch_domain_destroy(d);
@@ -868,8 +870,6 @@ void domain_destroy(struct domain *d)
 
     TRACE_1D(TRC_DOM0_DOM_REM, d->domain_id);
 
-    cpupool_rm_domain(d);
-
     /* Delete from task list and task hashtable. */
     spin_lock(&domlist_update_lock);
     pd = &domain_list;


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

