Re: [Xen-devel] xen/arm: Domain not fully destroyed when using credit2
>>> On 24.01.17 at 11:50, <julien.grall@xxxxxxx> wrote:
> On 24/01/2017 08:20, Jan Beulich wrote:
>>>>> On 23.01.17 at 20:42, <julien.grall@xxxxxxx> wrote:
>>> Whilst testing other patches today, I have noticed that some part of the
>>> resources allocated to a guest were not released during the destruction.
>>>
>>> The configuration of the test is:
>>>   - ARM platform with 6 cores
>>>   - staging Xen with credit2 enabled by default
>>>   - DOM0 using 2 pinned vCPUs
>>>
>>> The test is creating a guest vCPUs and then destroyed. After the test,
>>> some resources are not released (or could be released a long time
>>> after).
>>>
>>> Looking at the code, domain resources are released in 2 phases:
>>>   - domain_destroy: called when there is no more reference on the domain
>>>     (see put_domain)
>>>   - complete_domain_destroy: called when the RCU is quiescent
>>>
>>> The function domain_destroy will set up the RCU callback
>>> (complete_domain_destroy) by calling call_rcu. call_rcu will add the
>>> callback into the RCU list and then may send an IPI (see
>>> force_quiescent_state) if the threshold is reached. This IPI is here to
>>> make sure all CPUs are quiescent before calling the callbacks (e.g.
>>> complete_domain_destroy). In my case, the threshold has not been reached
>>> and therefore an IPI is not sent.
>>
>> But wait - isn't it the nature of RCU that it may take arbitrary time
>> until the actual call(s) happen(s)?
>
> Today this arbitrary time could be infinite if an idle pCPU does not
> receive an interrupt. So some part of domain resource will never be freed.
>
> If I am power-cycling a domain in a loop, after some time the toolstack
> will fail to allocate memory because of exhausted resources. The previous
> instance of the domain was not yet fully destroyed (e.g.
> complete_domain_destroy was not called).
>
>> If an upper limit is required by
>> a user of RCU, I think it would need to be that entity to arrange
>> for early expiry.
>
> This is happening with all the users and not only a domain. Looking at
> the code, there are already some upper limits:
>   - call_rcu will call force_quiescent_state if the number of elements in
>     the RCU queue is > 10000
>   - the RCU has a grace period (not sure how long), but no timer to
>     ensure the RCU will be called

This remark in parentheses is quite relevant here, I think: There
simply is no upper bound, aiui. This is a conceptual aspect. But
I'm in no way an RCU expert, so I may well be entirely off.

> Reducing the threshold in call_rcu (see qhimark) will not help as you
> may still face memory exhaustion (see above). So I think the only best
> solution is to actually implement properly the grace period.

Well, with the above in mind - what does "properly" mean here?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
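
For readers of the archive, the queueing behaviour discussed in this thread can be pictured with a toy model in C. This is only a sketch under stated assumptions and is not Xen's RCU code: `toy_rcu`, `toy_call_rcu` and `toy_quiescent` are hypothetical names made up for illustration. Only the threshold of 10000 (qhimark) and the idea that exceeding it forces a quiescent state are taken from the messages above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Threshold quoted in the thread (qhimark); everything else is hypothetical. */
#define TOY_QHIMARK 10000

/* Toy per-CPU RCU state: only counts pending callbacks. */
struct toy_rcu {
    size_t qlen;     /* callbacks queued but not yet invoked */
    bool ipi_sent;   /* set once a forced quiescent state was requested */
};

/* Stand-in for call_rcu: queue one callback, and request a forced
 * quiescent state only once the queue length exceeds the threshold. */
static void toy_call_rcu(struct toy_rcu *r)
{
    assert(r != NULL);
    r->qlen++;
    if (r->qlen > TOY_QHIMARK)
        r->ipi_sent = true;
}

/* Stand-in for a CPU passing through a quiescent state: all pending
 * callbacks run. Returns how many callbacks were invoked. */
static size_t toy_quiescent(struct toy_rcu *r)
{
    assert(r != NULL);
    size_t invoked = r->qlen;
    r->qlen = 0;
    r->ipi_sent = false;
    return invoked;
}
```

In this model, a handful of queued callbacks (a few domain destructions, say) never set `ipi_sent`, so they stay pending until something else makes the CPU call `toy_quiescent`. That is the situation described above for an idle pCPU that receives no interrupt.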