Xen project Mailing List

Re: [Xen-devel] [PATCH] cpupools: retry cpupool-destroy if domain in cpupool is dying

To: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>

From: Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>

Date: Mon, 12 May 2014 13:31:27 +0200

Cc: "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>

Delivery-date: Mon, 12 May 2014 11:31:35 +0000

Domainkey-signature: s=s1536a; d=ts.fujitsu.com; c=nofws; q=dns; h=X-SBRSScore:X-IronPort-AV:Received:X-IronPort-AV: Received:Message-ID:Date:From:Organization:User-Agent: MIME-Version:To:CC:Subject:References:In-Reply-To: Content-Type:Content-Transfer-Encoding; b=o745pQcULrLb3ONcfwjfBtDkjfDDNmjSatyko8Q9gU/vku+A4BGkX07K ChDFmcMfop6CtFtexuh5k8wZGtS5SP5ydCkOQMvSxyS6V3/0qp+bndeP5 Zs4Kbtiy6zs8eF3le/uk/TPN+7GY6Hv1wtVogLup/dJO/cLx1cLoyfL+e III69bMHy5AS/oPQmBHh6E/WGAea61By9+GGQJGV14SAHDWW2j5WHX3YG vA+XQGSWlT9s9+flLLCWdSFJwSZjj;

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 12.05.2014 12:50, George Dunlap wrote:

On Fri, May 9, 2014 at 6:01 AM, Juergen Gross
<juergen.gross@xxxxxxxxxxxxxx> wrote:

On 08.05.2014 17:10, George Dunlap wrote:


On Wed, May 7, 2014 at 2:23 PM, Juergen Gross
<juergen.gross@xxxxxxxxxxxxxx> wrote:


On 07.05.2014 15:10, George Dunlap wrote:



On Wed, May 7, 2014 at 8:52 AM, Juergen Gross
<juergen.gross@xxxxxxxxxxxxxx> wrote:



When a cpupool is destroyed just after the last domain has been stopped
the
domain might already be removed from the cpupool without having
decremented
the domain count of the cpupool. This will result in rejection of the
cpupool-destroy operation.




I'm a bit confused.  What's the sched_move_domain() for, then?  If
we're going to handle "dying domains" by doing a retry, could we just
get rid of it?




The sched_move_domain() is still needed for cases where a domain stays
dying for a longer time, e.g. when a dom0 process is still referencing
some of it's memory pages. This may be a rare situation, but being unable
to use a physical cpu for another cpupool just because of this case is
worse than this little piece of code, IMO.



And I take it there are times when the move fails for whatever reason?



ENOMEM for example.

Could you add a comment explaining this above the for() loop then, for
posterity?



Could you define 'this', please? The reason for the sched_move_domain()
is mentioned in the head comment of the function (zombie domains). The
possibility of a failing sched_move_domain() is obvious by the return
value checking.


Oh, sorry -- I misunderstood the patch.  I thought you were adding
code to handle the case when sched_move_domain() failed -- but
actually, you're handling the case where you go through the
for_each_domain_in_cpupool() loop, successfully call
sched_move_domain() on each such domain (and decrement n_dom each
time), but still somehow at the end have a positive n_dom.

Do I have it right now, or am I still confused?

So there are times when a domain might not come up in the
for_each_domain_in_cpupool() loop, but for some reason still be in he
n_dom reference count?

That doesn't seem like it should be allowed to happen; and at the
moment I'm failing to see how that should happen, unless you have a
race somewhere.

Sorry if I'm just being really dense here, but from what I can tell:
* for_each_domain_in_cpupool() iterates through the domain list,
looking for domains such that d->cpupool == c
* d->cpupool is only modified when the cpupool_lock is held
* Whenever d->cpupool is modified, n_dom for the appropriate cpupools
are also modified
* In this function, you hold cpupool_lock

So how is it that you have a situation where d->cpupool != c, but
c->n_dom is counting that domain?

Sorry, my explanation above seems to be wrong, the patch is correct. I should have written the complete patch when I discovered the problem, not only the source modification (it took some time to verify the solution works). This is the correct problem description: When a domain is destroyed, it is removed from the domain_list first, then it is removed from the cpupool. So for_each_domain_in_cpupool() can miss the domain while n_dom isn't yet decremented. This scenario will happen as long as there are references to the domain. I'll update the patch accordingly. Thanks for trying to understand :-) Juergen -- Juergen Gross Principal Developer Operating Systems PSO PM&D ES&S SWE OS6 Telephone: +49 (0) 89 62060 2932 Fujitsu e-mail: juergen.gross@xxxxxxxxxxxxxx Mies-van-der-Rohe-Str. 8 Internet: ts.fujitsu.com D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.