Re: [Xen-devel] [PATCH] cpupools: retry cpupool-destroy if domain in cpupool is dying
On Fri, May 9, 2014 at 6:01 AM, Juergen Gross <juergen.gross@xxxxxxxxxxxxxx> wrote:
> On 08.05.2014 17:10, George Dunlap wrote:
>>
>> On Wed, May 7, 2014 at 2:23 PM, Juergen Gross
>> <juergen.gross@xxxxxxxxxxxxxx> wrote:
>>>
>>> On 07.05.2014 15:10, George Dunlap wrote:
>>>>
>>>> On Wed, May 7, 2014 at 8:52 AM, Juergen Gross
>>>> <juergen.gross@xxxxxxxxxxxxxx> wrote:
>>>>>
>>>>> When a cpupool is destroyed just after the last domain has been stopped,
>>>>> the domain might already be removed from the cpupool without having
>>>>> decremented the domain count of the cpupool. This will result in
>>>>> rejection of the cpupool-destroy operation.
>>>>
>>>> I'm a bit confused.  What's the sched_move_domain() for, then?  If
>>>> we're going to handle "dying domains" by doing a retry, could we just
>>>> get rid of it?
>>>
>>> The sched_move_domain() is still needed for cases where a domain stays
>>> dying for a longer time, e.g. when a dom0 process is still referencing
>>> some of its memory pages. This may be a rare situation, but being unable
>>> to use a physical cpu for another cpupool just because of this case is
>>> worse than this little piece of code, IMO.
>>
>> And I take it there are times when the move fails for whatever reason?
>
> ENOMEM for example.
>
>> Could you add a comment explaining this above the for() loop then, for
>> posterity?
>
> Could you define 'this', please? The reason for the sched_move_domain()
> is mentioned in the head comment of the function (zombie domains). The
> possibility of a failing sched_move_domain() is obvious from the return
> value checking.

Oh, sorry -- I misunderstood the patch.  I thought you were adding code
to handle the case when sched_move_domain() failed -- but actually, you're
handling the case where you go through the for_each_domain_in_cpupool()
loop, successfully call sched_move_domain() on each such domain (and
decrement n_dom each time), but still somehow end up with a positive
n_dom at the end.  Do I have it right now, or am I still confused?

So there are times when a domain might not come up in the
for_each_domain_in_cpupool() loop, but for some reason still be counted
in the n_dom reference count?  That doesn't seem like it should be
allowed to happen; and at the moment I'm failing to see how it could
happen, unless you have a race somewhere.

Sorry if I'm just being really dense here, but from what I can tell:

* for_each_domain_in_cpupool() iterates through the domain list, looking
  for domains such that d->cpupool == c
* d->cpupool is only modified while the cpupool_lock is held
* Whenever d->cpupool is modified, n_dom for the affected cpupools is
  also updated
* In this function, you hold cpupool_lock

So how is it that you end up in a situation where d->cpupool != c, but
c->n_dom is still counting that domain?

 -George
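
(For readers following the accounting argument: below is a minimal,
stand-alone C model of the invariant described above. It is *not* the
actual xen/common/cpupool.c code -- the struct layouts and the
move_domain_locked()/cpupool_destroy_model() helpers are simplified
stand-ins, and only the names cpupool, n_dom, cpupool_lock,
for_each_domain_in_cpupool() and sched_move_domain() come from the
thread. It just illustrates why, if d->cpupool and n_dom are always
changed together under cpupool_lock, the destroy path should always
find n_dom == 0 after moving the remaining domains back to pool 0.)

/*
 * Simplified model of the cpupool accounting invariant under discussion:
 * d->cpupool is only changed while cpupool_lock is held, and n_dom of the
 * affected pools is updated in the same critical section.
 */
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

struct cpupool {
    int id;
    int n_dom;              /* number of domains accounted to this pool */
};

struct domain {
    int domid;
    struct cpupool *cpupool;
    struct domain *next;    /* global domain list */
};

static pthread_mutex_t cpupool_lock = PTHREAD_MUTEX_INITIALIZER;
static struct domain *domain_list;

/* Move a domain between pools; caller must hold cpupool_lock. */
static void move_domain_locked(struct domain *d, struct cpupool *to)
{
    d->cpupool->n_dom--;    /* accounting and pointer change together ...  */
    d->cpupool = to;        /* ... so n_dom can never count a domain whose */
    to->n_dom++;            /* cpupool pointer already points elsewhere.   */
}

/*
 * Destroy semantics from the thread: move any remaining (dying) domains
 * back to pool0, then insist the pool is empty.
 */
static int cpupool_destroy_model(struct cpupool *c, struct cpupool *pool0)
{
    pthread_mutex_lock(&cpupool_lock);
    for (struct domain *d = domain_list; d; d = d->next)
        if (d->cpupool == c)
            move_domain_locked(d, pool0);
    int busy = (c->n_dom != 0);
    pthread_mutex_unlock(&cpupool_lock);
    return busy ? -1 /* EBUSY */ : 0;
}

int main(void)
{
    struct cpupool pool0 = { .id = 0 }, pool1 = { .id = 1 };
    struct domain d1 = { .domid = 1, .cpupool = &pool1 };

    domain_list = &d1;
    pool1.n_dom = 1;

    /* With the invariant above, destroy always finds n_dom == 0 afterwards. */
    assert(cpupool_destroy_model(&pool1, &pool0) == 0);
    printf("pool1.n_dom = %d\n", pool1.n_dom);   /* prints 0 */
    return 0;
}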