[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 1/2] xen: fix a (latent) cpupool-related race during domain destroy



On Fri, 2016-07-15 at 12:36 +0200, Juergen Gross wrote:
> On 15/07/16 12:14, Dario Faggioli wrote:
> > In particular, I'm probably not fully understanding, from that
> > commit
> > changelog, what is the set of operations/command that I should run
> > to
> > check whether or not I reintroduced the issue back.
> You need to create a domain in a cpupool and destroy it again while
> some dom0 process still is holding a reference to it (resulting in a
> zombie domain). Then try to destroy the cpupool.
> 
Ah, I see. I wasn't get the fact that it needed to be a zombie domain
from anywhere.

> > What am I missing?
> The domain being a zombie domain might change the picture. Moving it
> to
> cpupool0 was failing before my patch and it might do so again with
> your
> patch applied.
> 
Mmmm... I don't immediately see the reason why moving a zombie domain
fails either, but I guess I'll have to try.

But then, correct me if I'm wrong, the situation is like this:
 - right now there's a (potential) race between domain's scheduling 
   data destruction and domain removal from a cpupool;
 - with my patch, the race goes away, but we risk not being able to 
   destroy a cpupool with a zombie domain in it.

I don't think we want to be in either of these two situations. :-(

The race is never triggering so far, but I've already patches to
Credit2 (finishing implementing soft affinity for it) that make it a
real thing.

I think I can work around that specific case by doing something like
the below:

diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index bc0e794..d91fd70 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -193,12 +193,9 @@ struct cpupool
 
 static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
 {
-    /*
-     * d->cpupool is NULL only for the idle domain, and no one should
-     * be interested in calling this for the idle domain.
-     */
-    ASSERT(d->cpupool != NULL);
-    return d->cpupool->cpu_valid;
+    /* No one should be interested in calling this for the idle domain! */
+    ASSERT(!is_idle_domain(d));
+    return d->cpupool ? d->cpupool->cpu_valid : cpupool0->cpu_valid;
 }
 
 #endif /* __XEN_SCHED_IF_H__ */

But even if that would be acceptable (what do you think?), I still
don't like to have the race there lurking. :-/

Therefore, I still think this patch is correct, but I'm up for
investigating further and finding a way to solve the "zombie in
cpupool" issue as well.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

Attachment: signature.asc
Description: This is a digitally signed message part

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.