Re: [Xen-devel] xen/arm: Domain not fully destroyed when using credit2
On 24/01/17 13:04, Julien Grall wrote:
> Hi Dario,
>
> On 24/01/17 12:53, Dario Faggioli wrote:
>> On Tue, 2017-01-24 at 10:50 +0000, Julien Grall wrote:
>>> On 24/01/2017 08:20, Jan Beulich wrote:
>>>> On 23.01.17 at 20:42, <julien.grall@xxxxxxx> wrote:
>>>>> The function domain_destroy will set up the RCU callback
>>>>> (complete_domain_destroy) by calling call_rcu. call_rcu will add
>>>>> the callback into the RCU list and then may send an IPI (see
>>>>> force_quiescent_state) if the threshold is reached. This IPI is
>>>>> here to make sure all CPUs are quiescent before calling the
>>>>> callbacks (e.g. complete_domain_destroy). In my case, the
>>>>> threshold has not been reached and therefore an IPI is not sent.
>>>>
>>>> But wait - isn't it the nature of RCU that it may take arbitrary
>>>> time until the actual call(s) happen(s)?
>>>
>>> Today this arbitrary time could be infinite if an idle pCPU does
>>> not receive an interrupt. So some of the domain's resources will
>>> never be freed.
>>>
>>> If I am power-cycling a domain in a loop, after some time the
>>> toolstack will fail to allocate memory because of exhausted
>>> resources. The previous instance of the domain was not yet fully
>>> destroyed (e.g. complete_domain_destroy was not called).
>>
>> Do you have a script and/or some more info for letting me try to
>> reproduce it (e.g., you say some of the vCPUs are pinned, which
>> one? etc)?
>
> That was mentioned in my first e-mail :). My configuration is:
>   - ARM platform with 6 cores
>   - staging Xen with credit2 enabled by default
>   - DOM0 using 2 pinned vCPUs

To clarify here, DOM0 has only 2 vCPUs. Both are pinned.

>   - Guest using 2 vCPUs (not pinned)
>
> The script is really simple:
>
> for i in `seq 1 10`; do
>     sudo xl create ~/works/guest/guest.cfg
>     sudo xl destroy guest
> done
>
>> I'm a bit curious about why you're saying this is being exposed by
>> using Credit2.
>
> It is being exposed by Credit2 because, compared to Credit1, there
> is no interrupt traffic made by the scheduler. On ARM with credit2
> the interrupt traffic is reduced to none for an idle pCPU.
>
>> In fact:
>> 1) I've power-cycled quite a few domains in these last months,
>> while under Credit2, and I don't think I have encountered it on
>> x86;
>
> AFAIU, IPI is often the only way to broadcast some instruction on
> x86. So compared to ARM, you likely have higher interrupt traffic.
>
> Also, the problem is not obvious to spot unless you look at the
> free memory (via xl info) before and after. Another solution is
> printing a message in both domain_destroy and
> complete_domain_destroy. You will spot the first message directly.
> The latter may never be printed.
>
>> 2) I see how it may be related to Credit2 being more deterministic
>> and not trying to schedule stuff around pseudo-randomly like
>> Credit1 does... but I'd like to try investigating a bit more.
>
> I am able to reliably reproduce it on a Juno-r2.

Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
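
For anyone trying to confirm the problem, the free-memory check mentioned in the thread (comparing xl info before and after the create/destroy loop) could be scripted roughly as follows. This is only a sketch: the helper names free_mem_from_info and report_leak are made up for illustration, and it assumes that xl info prints a line of the form "free_memory : <number>". The part that talks to Xen is left commented out so the helpers can be sourced and tried on any machine.

```shell
#!/bin/sh

# Hypothetical helper: read `xl info` output on stdin and print the
# value of the free_memory field (assumed layout: "free_memory : N").
free_mem_from_info() {
    awk -F: '$1 ~ /^free_memory[ \t]*$/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Hypothetical helper: print the before/after values and how much free
# memory was lost across the run (positive delta = memory not returned).
report_leak() {
    before=$1
    after=$2
    echo "free_memory before=${before} after=${after} delta=$((before - after))"
}

# Usage on a Xen host, reusing the guest config path and domain name
# from the reproduction script in the thread:
#
# before=$(sudo xl info | free_mem_from_info)
# for i in $(seq 1 10); do
#     sudo xl create ~/works/guest/guest.cfg
#     sudo xl destroy guest
# done
# after=$(sudo xl info | free_mem_from_info)
# report_leak "$before" "$after"
```

A delta that keeps growing with the number of iterations would be consistent with complete_domain_destroy not having run for earlier instances of the guest, though on its own it does not prove that RCU is the cause.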