[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] cpupools: retry cpupool-destroy if domain in cpupool is dying


  • To: Jan Beulich <JBeulich@xxxxxxxx>
  • From: Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>
  • Date: Wed, 14 May 2014 12:35:07 +0200
  • Cc: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>
  • Delivery-date: Wed, 14 May 2014 10:35:21 +0000
  • Domainkey-signature: s=s1536a; d=ts.fujitsu.com; c=nofws; q=dns; h=X-SBRSScore:X-IronPort-AV:Received:X-IronPort-AV: Received:Message-ID:Date:From:Organization:User-Agent: MIME-Version:To:CC:Subject:References:In-Reply-To: Content-Type:Content-Transfer-Encoding; b=rpQ4x1CJezi0JQwNVZRAZsv1MT1bUJUFegJ+dDExUZdJZFYjthNk11Sl 9zbPXaU06cZMdUY/+8GV13Bf5WBrgIGk7ol7z23vy8y060IuEyD1RwKW1 Rzez09IONrrqDe2ilPd6pIoaFUiUQlpmqm0MU6O9NiI3/STCJ8GpUoLVV ogmpgmLZNMRIZaF6J1yVhnQYxcMkvnD8A72pSbfGrIij0UC9Ijy/QFqJ0 09IsmVjN4KWu+9Brf+b3fr6D+NB0y;
  • List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 14.05.2014 12:15, Jan Beulich wrote:
On 14.05.14 at 11:56, <juergen.gross@xxxxxxxxxxxxxx> wrote:
On 14.05.2014 11:16, George Dunlap wrote:
On Mon, May 12, 2014 at 12:49 PM, Juergen Gross
<juergen.gross@xxxxxxxxxxxxxx> wrote:
When a cpupool is destroyed just after the last domain has been stopped the
domain might already be removed from the domain list without being removed
from the cpupool.
It is easy to detect this situation and to return EAGAIN in this case which
is already handled in libxc by doing a retry.

OK, I hate to be picky over two lines, but it still seems to me like
this is papering over issues instead of dealing with them properly.
The real problem here is that "for_each_domain_in_cpupool()" doesn't
actually go over every domain in the cpupool.  Instead of making it so
that it actually does, you're compensating for that fact in an ad-hoc
fashion.

Now as it happens, it looks like all the other current uses of
for_each_domain_in_cpupool() work just fine if there are domains in
the pool it doesn't see, as long as they're about to disappear.  But
we've already seen a bug caused because of a situation where "don't
see domains that are about to disappear" *does* actually cause a
problem; working around it is just setting a trap for future
developers to fall into.  (And who knows, there may already be a bug
we haven't discovered in the other invocations of
for_each_domain_in_cpupool()).

This isn't unique to for_each_domain_in_cpupool(). It is a problem for all
uses of for_each_domain() which are related to resources freed only in
complete_domain_destroy().

Of which there shouldn't be that many, if any at all.

What prevents cpupool_rm_domain() getting moved from
complete_domain_destroy() to domain_destroy(), before the
domain gets taken off the list? I actually assume that there are
more things here that may not really need deferring until the
last possible moment...

sched_destroy_vcpu() and sched_destroy_domain() have to happen before
cpupool_rm_domain(). This could be avoided if the domain would be moved to
cpupool0 in domain_destroy().

Hmm, doesn't sound too bad. This would be just symmetrical to domain
creation. What do you think?


Juergen

--
Juergen Gross                 Principal Developer Operating Systems
PSO PM&D ES&S SWE OS6                  Telephone: +49 (0) 89 62060 2932
Fujitsu                                   e-mail: juergen.gross@xxxxxxxxxxxxxx
Mies-van-der-Rohe-Str. 8                Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.