Re: [Xen-devel] xen/arm: Domain not fully destroyed when using credit2
On Mon, 23 Jan 2017, Julien Grall wrote:
> Hi all,
>
> Before someone digs into the scheduler: I don't think this is an issue in
> credit2, but using it highlights a bug in another component (I think RCU).
>
> Whilst testing other patches today, I noticed that some of the resources
> allocated to a guest were not released during destruction.
>
> The configuration of the test is:
>   - ARM platform with 6 cores
>   - staging Xen with credit2 enabled by default
>   - DOM0 using 2 pinned vCPUs
>
> The test creates a guest and its vCPUs, then destroys it. After the test,
> some resources are not released (or may only be released a long time
> after).
>
> Looking at the code, domain resources are released in 2 phases:
>   - domain_destroy: called when there are no more references on the domain
>     (see put_domain)
>   - complete_domain_destroy: called when the RCU is quiescent
>
> The function domain_destroy sets up the RCU callback
> (complete_domain_destroy) by calling call_rcu. call_rcu adds the callback
> to the RCU list and may then send an IPI (see force_quiescent_state) if
> the threshold is reached. This IPI is there to make sure all CPUs are
> quiescent before the callbacks (e.g. complete_domain_destroy) are called.
> In my case, the threshold was not reached and therefore no IPI was sent.
>
> On ARM, the idle loop runs when the pCPU has no work to do. This loop waits
> to receive an interrupt (see wfi) and checks whether there is work to do
> when the CPU has woken up (i.e. an interrupt was received).
>
> The problem I encountered is that the idle CPU never receives interrupts
> (no timer, no IPI...) and therefore never checks whether the RCU has work
> to do.
>
> From my understanding, this is a bug in how RCU is handled (see the comment
> above rcu_start_batch): it expects each CPU (no broadcast) to check whether
> there is RCU work. But this relies on something else (timer?) to fire an
> interrupt.
>
> Any incoming interrupt will make a pCPU check the RCU state. On ARM, the
> biggest source of IPIs was credit1, or the timer if a guest vCPU was
> scheduled on that pCPU. But it looks like the IPI traffic with credit2 has
> been reduced to none (which is a really good thing :)), and no guest timer
> was scheduled because no vCPU ever ran on this pCPU.
>
> I think the bug has always been there (on both ARM and x86), but was never
> detected because any incoming interrupt makes the pCPU check the RCU state.
>
> However, I am not sure how to resolve this issue. Any thoughts?

Well done for finding the bug! Sending an IPI on call_rcu is easy, but it
would be better not to wake up the sleeping cpus at all. If they are
running the idle_loop, they cannot be holding any rcu references for the
domain which is about to be destroyed, right?
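
To make the two behaviours discussed above concrete, here is a minimal toy
model in C. It is only a sketch and not Xen code: every name in it (struct
model, model_call_rcu, model_check_rcu, model_tick, model_run) is
hypothetical, and the real RCU implementation is considerably more involved.
It assumes 6 CPUs, of which CPUs 0 and 1 are busy and the rest sit in the
idle loop without ever receiving an interrupt. With skip_idle set to false,
every CPU must report a quiescent state, so the queued callback never runs.
With skip_idle set to true, idle CPUs are treated as already quiescent, which
is the idea raised in the reply, and the callback runs once the busy CPUs
have checked in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 6

/* Hypothetical model state; none of these names come from Xen. */
struct model {
    bool idle[NR_CPUS];     /* CPU is asleep in its idle loop (wfi) */
    bool pending[NR_CPUS];  /* CPU still has to report a quiescent state */
    bool callback_queued;   /* a deferred callback is waiting to run */
    int callbacks_run;      /* number of callbacks that have completed */
};

/*
 * Queue a callback and start a grace period.
 * skip_idle == false: every CPU must report (the behaviour in the report).
 * skip_idle == true:  idle CPUs count as already quiescent (assumed fix).
 */
static void model_call_rcu(struct model *m, bool skip_idle)
{
    m->callback_queued = true;
    for (size_t cpu = 0; cpu < NR_CPUS; cpu++)
        m->pending[cpu] = !(skip_idle && m->idle[cpu]);
}

/* A CPU reports its quiescent state; the last one to do so runs the callback. */
static void model_check_rcu(struct model *m, size_t cpu)
{
    assert(cpu < NR_CPUS);
    m->pending[cpu] = false;

    for (size_t i = 0; i < NR_CPUS; i++)
        if (m->pending[i])
            return;             /* grace period not over yet */

    if (m->callback_queued) {
        m->callback_queued = false;
        m->callbacks_run++;
    }
}

/* Busy CPUs check RCU on every tick; idle CPUs only do so if an IRQ wakes them. */
static void model_tick(struct model *m, const bool irq[NR_CPUS])
{
    for (size_t cpu = 0; cpu < NR_CPUS; cpu++)
        if (!m->idle[cpu] || irq[cpu])
            model_check_rcu(m, cpu);
}

/* CPUs 0-1 busy, CPUs 2-5 idle, no interrupts at all; returns callbacks run. */
static int model_run(bool skip_idle, int ticks)
{
    struct model m = { .idle = { false, false, true, true, true, true } };
    const bool no_irq[NR_CPUS] = { false };

    model_call_rcu(&m, skip_idle);
    for (int t = 0; t < ticks; t++)
        model_tick(&m, no_irq);
    return m.callbacks_run;
}
```

Calling model_run(false, 1000) leaves the callback pending however many ticks
elapse, while model_run(true, 1000) completes it on the first tick. The model
deliberately ignores what must happen when an idle CPU wakes up in the middle
of a grace period, which a real fix would have to handle.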