[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH] cpupools: retry cpupool-destroy if domain in cpupool is dying
On 12.05.2014 12:50, George Dunlap wrote: On Fri, May 9, 2014 at 6:01 AM, Juergen Gross <juergen.gross@xxxxxxxxxxxxxx> wrote:On 08.05.2014 17:10, George Dunlap wrote:On Wed, May 7, 2014 at 2:23 PM, Juergen Gross <juergen.gross@xxxxxxxxxxxxxx> wrote:On 07.05.2014 15:10, George Dunlap wrote:On Wed, May 7, 2014 at 8:52 AM, Juergen Gross <juergen.gross@xxxxxxxxxxxxxx> wrote:When a cpupool is destroyed just after the last domain has been stopped the domain might already be removed from the cpupool without having decremented the domain count of the cpupool. This will result in rejection of the cpupool-destroy operation.I'm a bit confused. What's the sched_move_domain() for, then? If we're going to handle "dying domains" by doing a retry, could we just get rid of it?The sched_move_domain() is still needed for cases where a domain stays dying for a longer time, e.g. when a dom0 process is still referencing some of it's memory pages. This may be a rare situation, but being unable to use a physical cpu for another cpupool just because of this case is worse than this little piece of code, IMO.And I take it there are times when the move fails for whatever reason?ENOMEM for example.Could you add a comment explaining this above the for() loop then, for posterity?Could you define 'this', please? The reason for the sched_move_domain() is mentioned in the head comment of the function (zombie domains). The possibility of a failing sched_move_domain() is obvious by the return value checking.Oh, sorry -- I misunderstood the patch. I thought you were adding code to handle the case when sched_move_domain() failed -- but actually, you're handling the case where you go through the for_each_domain_in_cpupool() loop, successfully call sched_move_domain() on each such domain (and decrement n_dom each time), but still somehow at the end have a positive n_dom. Do I have it right now, or am I still confused? So there are times when a domain might not come up in the for_each_domain_in_cpupool() loop, but for some reason still be in he n_dom reference count? That doesn't seem like it should be allowed to happen; and at the moment I'm failing to see how that should happen, unless you have a race somewhere. Sorry if I'm just being really dense here, but from what I can tell: * for_each_domain_in_cpupool() iterates through the domain list, looking for domains such that d->cpupool == c * d->cpupool is only modified when the cpupool_lock is held * Whenever d->cpupool is modified, n_dom for the appropriate cpupools are also modified * In this function, you hold cpupool_lock So how is it that you have a situation where d->cpupool != c, but c->n_dom is counting that domain? Sorry, my explanation above seems to be wrong, the patch is correct. I should have written the complete patch when I discovered the problem, not only the source modification (it took some time to verify the solution works). This is the correct problem description: When a domain is destroyed, it is removed from the domain_list first, then it is removed from the cpupool. So for_each_domain_in_cpupool() can miss the domain while n_dom isn't yet decremented. This scenario will happen as long as there are references to the domain. I'll update the patch accordingly. Thanks for trying to understand :-) Juergen -- Juergen Gross Principal Developer Operating Systems PSO PM&D ES&S SWE OS6 Telephone: +49 (0) 89 62060 2932 Fujitsu e-mail: juergen.gross@xxxxxxxxxxxxxx Mies-van-der-Rohe-Str. 8 Internet: ts.fujitsu.com D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |