
Re: [Xen-devel] xen/arm: Domain not fully destroyed when using credit2



On Mon, 23 Jan 2017, Julien Grall wrote:
> Hi all,
> 
> Before anyone digs into the scheduler: I don't think this is an issue in
> credit2; rather, using it highlights a bug in another component (I think RCU).
> 
> Whilst testing other patches today, I noticed that some of the
> resources allocated to a guest were not released when it was destroyed.
> 
> The configuration of the test is:
>       - ARM platform with 6 cores
>       - staging Xen with credit2 enabled by default
>       - DOM0 using 2 pinned vCPUs
> 
> The test creates a guest and then destroys it. After the test, some
> resources are not released (or are released only a long time
> afterwards).
> 
> Looking at the code, domain resources are released in 2 phases:
>       - domain_destroy: called when there are no more references to the domain
> (see put_domain)
>       - complete_domain_destroy: called once RCU is quiescent
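To make the two phases concrete, here is a minimal single-threaded C model. The names put_domain, domain_destroy, call_rcu and complete_domain_destroy are borrowed from the mail, but their bodies, the helper rcu_grace_period_elapsed() and the flags in struct domain are hypothetical simplifications, not Xen's code. What it shows: dropping the last reference only queues phase 2, and nothing is released until something declares the grace period over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Recover the enclosing struct from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct rcu_head {
    struct rcu_head *next;
    void (*func)(struct rcu_head *head);
};

/* Hypothetical, heavily reduced domain: flags record which phase ran. */
struct domain {
    int refcnt;
    bool phase1_done;     /* domain_destroy() has run */
    bool freed;           /* complete_domain_destroy() has run */
    struct rcu_head rcu;
};

static struct rcu_head *pending;   /* callbacks waiting for a grace period */

/* Model of call_rcu: just queue the callback, do not run it yet. */
static void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *))
{
    head->func = func;
    head->next = pending;
    pending = head;
}

/* Phase 2: only safe once every CPU has been quiescent. */
static void complete_domain_destroy(struct rcu_head *head)
{
    struct domain *d = container_of(head, struct domain, rcu);
    d->freed = true;   /* the real code releases the remaining resources */
}

/* Phase 1: runs when the last reference is dropped. */
static void domain_destroy(struct domain *d)
{
    d->phase1_done = true;
    call_rcu(&d->rcu, complete_domain_destroy);
}

static void put_domain(struct domain *d)
{
    if (--d->refcnt == 0)
        domain_destroy(d);
}

/* Hypothetical: invoked once a grace period has elapsed. */
static void rcu_grace_period_elapsed(void)
{
    while (pending) {
        struct rcu_head *head = pending;
        pending = head->next;
        head->func(head);
    }
}
```

If rcu_grace_period_elapsed() is never reached, the domain stays half-destroyed indefinitely, which matches the symptom described here.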
> 
> The function domain_destroy sets up the RCU callback
> (complete_domain_destroy) by calling call_rcu. call_rcu adds the callback
> to the RCU list and may then send an IPI (see force_quiescent_state) if
> the threshold is reached. This IPI is there to make sure all CPUs are
> quiescent before the callbacks (e.g. complete_domain_destroy) are called.
> In my case, the threshold was not reached and therefore no IPI was sent.
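The threshold behaviour can be modelled in a few lines of C. HYPOTHETICAL_QHIMARK, call_rcu_model() and the counters are placeholders, not Xen's real threshold or functions; the sketch only illustrates why destroying a single domain (one queued callback) never triggers the IPI.

```c
#include <assert.h>

#define HYPOTHETICAL_QHIMARK 4   /* placeholder, not Xen's real value */

static int qlen;        /* callbacks queued on this CPU */
static int ipis_sent;   /* times the other CPUs were kicked */

/* Stand-in for force_quiescent_state(): IPI every other CPU. */
static void force_quiescent_state_model(void)
{
    ipis_sent++;
}

/* Queue a callback; only force progress once the queue is "long". */
static void call_rcu_model(void)
{
    if (++qlen > HYPOTHETICAL_QHIMARK)
        force_quiescent_state_model();
}
```

Below the threshold nobody is woken, so progress depends entirely on the other CPUs noticing the pending batch on their own.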
> 
> On ARM, the idle loop runs when the pCPU has no work to do. This loop waits
> for an interrupt (see wfi) and checks whether there is some work to do once
> the CPU has woken up (i.e. an interrupt was received).
> 
> The problem I encountered is that the idle CPU never receives interrupts (no
> timer, no IPI...) and therefore never checks whether RCU has some work to
> do.
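A toy model of the idle loop described above. wait_for_interrupt() stands in for wfi but, unlike real hardware, does not sleep: it just reports whether an interrupt was pending. struct cpu and its fields are hypothetical. However many times the loop spins, a CPU that receives no interrupt never looks at its RCU state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state for the model. */
struct cpu {
    bool irq_pending;        /* set by a timer or an IPI */
    bool rcu_work_pending;   /* a grace period needs this CPU's report */
    int rcu_checks;          /* times this CPU looked at RCU state */
};

/* Stand-in for wfi: "wakes" only if an interrupt is pending. */
static bool wait_for_interrupt(struct cpu *c)
{
    bool woke = c->irq_pending;
    c->irq_pending = false;
    return woke;
}

/* One iteration of the idle loop. */
static void idle_loop_iteration(struct cpu *c)
{
    if (!wait_for_interrupt(c))
        return;                 /* still asleep: nothing is checked */
    c->rcu_checks++;            /* woke up: look for RCU work */
    c->rcu_work_pending = false;
}
```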
> 
> As I understand it, this is a bug in how RCU is handled (see the comment above
> rcu_start_batch): it expects each CPU (no broadcast) to check whether there is
> RCU work, but this relies on something else (a timer?) firing an interrupt.
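The dependency on every CPU can be sketched with a bitmask holding one bit per CPU that still has to report a quiescent state, loosely in the spirit of the cpumask that rcu_start_batch sets up. The function names are hypothetical, and the choice of pCPUs 0 and 1 as the busy dom0 CPUs is an assumption; the 6-CPU count follows the test setup. With the other four pCPUs asleep, the mask never clears.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 6

/* Bit n set: CPU n has not yet reported a quiescent state. */
static uint32_t cpus_to_report;

static void start_grace_period(void)
{
    cpus_to_report = (1u << NR_CPUS) - 1;   /* wait for every CPU */
}

/* Called by a CPU only when it runs and checks RCU state,
 * e.g. after being woken by an interrupt. */
static void cpu_quiescent(int cpu)
{
    cpus_to_report &= ~(1u << cpu);
}

static bool grace_period_done(void)
{
    return cpus_to_report == 0;
}
```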
> 
> Any incoming interrupt will make a pCPU check the RCU state. On ARM, the
> biggest sources of such interrupts were credit1 (IPIs) and the timer when a
> guest vCPU was scheduled on that pCPU. But it looks like the IPI traffic with
> credit2 was reduced to none (which is a really good thing :)), and no guest
> timer was scheduled because no vCPU ever ran on this pCPU.
> 
> I think the bug has always been here (on both ARM and x86), but was never
> detected because any incoming interrupt makes the pCPU check the RCU state.
> 
> However, I am not sure how to resolve this issue. Any thoughts?

Well done for finding the bug!

Sending an IPI on call_rcu is easy, but it would be better not to wake
up the sleeping CPUs at all. If they are running the idle_loop, they
cannot be holding any RCU references to the domain which is about to be
destroyed, right?
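One way to act on this, sketched under the assumption that the premise holds (a CPU in the idle loop holds no RCU references): track which CPUs are idle and leave them out of the set a new grace period waits for. cpu_enter_idle() and cpu_exit_idle() are hypothetical hooks around the wfi; entering idle also counts as a quiescent state, so a CPU that goes idle mid-grace-period does not block it either. This is a single-threaded model and ignores the memory barriers a real implementation would need to keep the idle mask and the grace-period mask consistent across CPUs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS  6
#define ALL_CPUS ((1u << NR_CPUS) - 1)

static uint32_t idle_cpus;        /* CPUs sleeping in the idle loop */
static uint32_t cpus_to_report;   /* CPUs the current grace period waits for */

/* Idle CPUs hold no RCU references, so they are not waited for. */
static void start_grace_period(void)
{
    cpus_to_report = ALL_CPUS & ~idle_cpus;
}

static void cpu_quiescent(int cpu)
{
    cpus_to_report &= ~(1u << cpu);
}

/* Hypothetical hook just before wfi: going idle is a quiescent state. */
static void cpu_enter_idle(int cpu)
{
    idle_cpus |= 1u << cpu;
    cpu_quiescent(cpu);
}

/* Hypothetical hook just after waking from wfi. */
static void cpu_exit_idle(int cpu)
{
    idle_cpus &= ~(1u << cpu);
}

static bool grace_period_done(void)
{
    return cpus_to_report == 0;
}
```

The design choice is the one suggested above: instead of waking sleepers with an IPI, the grace period simply stops depending on them.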

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

