Re: [Xen-devel] null scheduler bug
Hi Dario,

On 09/25/2018 10:02 AM, Dario Faggioli wrote:
> On Mon, 2018-09-24 at 22:46 +0100, Julien Grall wrote:
>> On 09/21/2018 05:20 PM, Dario Faggioli wrote:
>>> What I'm after, is how long, after domain_destroy(),
>>> complete_domain_destroy() is called, and whether/how it relates to
>>> the grace period idle timer we've added in the RCU code.
>>
>> NULL scheduler and vwfi=native will inevitably introduce a latency
>> when destroying a domain. vwfi=native means the guest will not trap
>> when it has nothing to do and switch to the idle vCPU. So, in such a
>> configuration, it is extremely unlikely to execute the idle_loop or
>> even enter the hypervisor unless there is an interrupt on that pCPU.
>
> Ah! I'm not familiar with vwfi=native --and in fact I was completely
> ignoring it-- but this analysis makes sense to me.
>
>> Per my understanding of call_rcu, the calls will be queued until RCU
>> reaches a threshold. We don't have many places where call_rcu is
>> called, so reaching the threshold may just never happen. But nothing
>> will tell that vCPU to go in Xen and say "I am done with RCU". Did I
>> miss anything?
>
> Yeah, and in fact we added the timer _but_, in this case, it does not
> look like the timer is firing. It looks much more like "some random
> interrupt happens", as you're suggesting. OTOH, in the case where
> there are no printk()s, it might be that the timer does fire, but the
> vcpu has not gone through Xen, so the grace period is, as far as we
> know, not expired yet (which is also in accordance with Julien's
> analysis, as far as I understood it).

The timer is only activated when sched_tick_suspend() is called. With vwfi=native, you will never reach idle_loop() and therefore never set up a timer.

Milan confirmed that the guest can be destroyed once vwfi=native is removed, which confirms my thinking. Trapping WFI ends up switching to the idle vCPU, which triggers the grace period.

I am not entirely sure you will be able to reproduce it on x86, but I don't think it is specific to Xen on Arm.
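To make the failure mode above concrete, here is a minimal toy model in C. It is not Xen code: every name in it (toy_cpu, toy_call_rcu(), toy_enter_idle(), and so on) is hypothetical, and the real RCU implementation is considerably more involved. The only assumption it encodes is the one stated above: the grace-period timer is armed solely from the idle path, so a pCPU whose guest never traps on WFI never arms it and the queued callback never runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, NOT Xen code: every name below is hypothetical. */

#define TOY_MAX_PENDING 8

typedef void (*toy_cb_t)(void *arg);

struct toy_cpu {
    toy_cb_t cb[TOY_MAX_PENDING];  /* callbacks queued by toy_call_rcu() */
    void *arg[TOY_MAX_PENDING];
    size_t pending;
    bool timer_armed;              /* grace-period timer, armed only on idle */
};

/* Queue a callback; it must not run until the CPU reports quiescence. */
static void toy_call_rcu(struct toy_cpu *c, toy_cb_t cb, void *arg)
{
    assert(c->pending < TOY_MAX_PENDING);
    c->cb[c->pending] = cb;
    c->arg[c->pending] = arg;
    c->pending++;
}

/* The CPU passed through the hypervisor: run everything that was queued. */
static void toy_quiesce(struct toy_cpu *c)
{
    for (size_t i = 0; i < c->pending; i++)
        c->cb[i](c->arg[i]);
    c->pending = 0;
    c->timer_armed = false;
}

/* Idle path: the only place where the timer gets armed in this model. */
static void toy_enter_idle(struct toy_cpu *c)
{
    if (c->pending > 0)
        c->timer_armed = true;
}

/* Timer tick: only has an effect if the idle path armed the timer. */
static void toy_timer_tick(struct toy_cpu *c)
{
    if (c->timer_armed)
        toy_quiesce(c);
}

/* Callback standing in for complete_domain_destroy(). */
static void toy_mark_destroyed(void *arg)
{
    *(bool *)arg = true;
}

/* Returns true if the domain is destroyed after `ticks` timer ticks. */
static bool toy_destroy_domain(bool wfi_traps, int ticks)
{
    struct toy_cpu cpu = {0};
    bool destroyed = false;

    toy_call_rcu(&cpu, toy_mark_destroyed, &destroyed);
    if (wfi_traps)
        toy_enter_idle(&cpu);      /* trapped WFI switches to the idle vCPU */
    for (int i = 0; i < ticks; i++)
        toy_timer_tick(&cpu);
    return destroyed;
}
```

In this model, toy_destroy_domain(true, 1) completes the destruction on the first tick, whereas toy_destroy_domain(false, n) never does, no matter how large n is, because nothing ever arms the timer.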
When I looked at the code, I did not see any grace period in any context other than idle_loop(). Rather than adding another grace period, I would just force quiescence for every call_rcu(). This should not have a big performance impact, as we don't use call_rcu() much, and it would allow domains to be fully destroyed in a timely manner.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
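The idea of forcing quiescence on every call_rcu() can be sketched with a small toy model in C. This is not Xen code and not a proposed patch: all names (fq_rcu, fq_call_rcu(), fq_cpu_enters_hypervisor(), ...) are hypothetical, and "forcing" is modelled, as an assumption, as kicking every CPU into the hypervisor right away (for instance with an inter-processor interrupt). The message above does not say how the forcing would actually be implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, NOT Xen code: every name below is hypothetical. */

#define FQ_NR_CPUS 4

typedef void (*fq_cb_t)(void *arg);

struct fq_rcu {
    fq_cb_t cb;                    /* single pending callback, for brevity */
    void *arg;
    bool quiesced[FQ_NR_CPUS];     /* has CPU i passed a quiescent state? */
};

/* Run the callback once every CPU has reported a quiescent state. */
static void fq_try_complete(struct fq_rcu *r)
{
    if (r->cb == NULL)
        return;
    for (size_t i = 0; i < FQ_NR_CPUS; i++)
        if (!r->quiesced[i])
            return;
    r->cb(r->arg);
    r->cb = NULL;
}

/* CPU `cpu` entered the hypervisor (interrupt, trap, idle loop, ...). */
static void fq_cpu_enters_hypervisor(struct fq_rcu *r, size_t cpu)
{
    assert(cpu < FQ_NR_CPUS);
    r->quiesced[cpu] = true;
    fq_try_complete(r);
}

/*
 * Queue a callback and start a new grace period. With `force` set, every
 * CPU is kicked right away (assumption: modelled as an immediate entry
 * into the hypervisor).
 */
static void fq_call_rcu(struct fq_rcu *r, fq_cb_t cb, void *arg, bool force)
{
    assert(r->cb == NULL);
    r->cb = cb;
    r->arg = arg;
    for (size_t i = 0; i < FQ_NR_CPUS; i++)
        r->quiesced[i] = false;
    if (force)
        for (size_t i = 0; i < FQ_NR_CPUS; i++)
            fq_cpu_enters_hypervisor(r, i);
}

/* Callback standing in for complete_domain_destroy(). */
static void fq_mark_destroyed(void *arg)
{
    *(bool *)arg = true;
}

/*
 * All CPUs but the last enter the hypervisor on their own; the last one
 * runs a guest with vwfi=native and never does. Returns true if the
 * domain got destroyed.
 */
static bool fq_destroy_domain(bool force)
{
    struct fq_rcu rcu = {0};
    bool destroyed = false;

    fq_call_rcu(&rcu, fq_mark_destroyed, &destroyed, force);
    for (size_t cpu = 0; cpu + 1 < FQ_NR_CPUS; cpu++)
        fq_cpu_enters_hypervisor(&rcu, cpu);
    return destroyed;
}
```

Without forcing, the one CPU that never enters the hypervisor holds the grace period open indefinitely. With forcing, the callback runs as soon as it is queued, at the cost of disturbing every CPU once per call_rcu(), which is tolerable only because call_rcu() is rarely used.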