
Re: [Xen-devel] null scheduler bug

Hi Dario,

On 09/25/2018 10:02 AM, Dario Faggioli wrote:
On Mon, 2018-09-24 at 22:46 +0100, Julien Grall wrote:
On 09/21/2018 05:20 PM, Dario Faggioli wrote:

What I'm after, is how long, after domain_destroy(),
complete_domain_destroy() is called, and whether/how it relates to the
grace period idle timer we've added in the RCU code.

NULL scheduler and vwfi=native will inevitably introduce latency when
destroying a domain. vwfi=native means the guest will not trap when it
has nothing to do, so Xen will not switch to the idle vCPU. So, in such a
configuration, it is extremely unlikely to execute the idle_loop or
even enter the hypervisor unless there is an interrupt on that

Ah! I'm not familiar with vwfi=native --and in fact I was completely
ignoring it-- but this analysis makes sense to me.

Per my understanding of call_rcu, the calls will be queued until the
reached a threshold. We don't have many places where call_rcu is
so reaching the threshold may just never happen. But nothing will
that vCPU to go in Xen and say "I am done with RCU". Did I miss

Yeah, and in fact we added the timer _but_, in this case, it does not
look like the timer is firing. It looks much more like "some random
interrupt happens", as you're suggesting. OTOH, in the case where there
are no printk()s, it might be that the timer does fire, but the vcpu
has not gone through Xen, so the grace period is, as far as we know,
not expired yet (which is also in accordance with Julien's analysis, as
far as I understood it).

The timer is only activated when sched_tick_suspend() is called. With vwfi=native, you will never reach idle_loop() and therefore never set up the timer.

Milan confirmed that the guest can be destroyed with vwfi=native removed, which confirms my thinking. Trapping WFI ends up switching to the idle vCPU, which triggers the grace period.

I am not entirely sure you will be able to reproduce it on x86, but I don't think it is specific to Xen on Arm.

When I looked at the code, I did not see any grace period being triggered in a context other than idle_loop(). Rather than adding another grace period, I would just force quiescence for every call_rcu.

This should not have a big performance impact, as we don't use call_rcu much, and it would allow a domain to be fully destroyed in a timely manner.
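
To illustrate the idea, here is a toy model in C. It is not Xen code: every name in it (toy_rcu, toy_call_rcu, toy_quiescent, toy_call_rcu_forced) is made up for this example, it only tracks a single callback, and how a real implementation would poke the other CPUs is left open. It only shows why a callback stays queued while one CPU never passes through the hypervisor, and how forcing quiescence at call time avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4

/* Toy model only: none of these names exist in Xen. */
struct toy_rcu {
    unsigned int pending_mask;  /* CPUs that still owe a quiescent state */
    void (*callback)(void *);   /* a single queued callback, for simplicity */
    void *arg;
};

/* Queue a callback; it may only run once every CPU has been quiescent. */
static void toy_call_rcu(struct toy_rcu *rcu, void (*fn)(void *), void *arg)
{
    rcu->callback = fn;
    rcu->arg = arg;
    rcu->pending_mask = (1u << NR_CPUS) - 1;
}

/*
 * Report that a CPU passed through the hypervisor (idle loop, trap, ...).
 * The callback runs when the last outstanding CPU reports in.
 */
static void toy_quiescent(struct toy_rcu *rcu, unsigned int cpu)
{
    rcu->pending_mask &= ~(1u << cpu);

    if ( rcu->pending_mask == 0 && rcu->callback != NULL )
    {
        void (*fn)(void *) = rcu->callback;

        rcu->callback = NULL;
        fn(rcu->arg);
    }
}

/*
 * Assumed variant: instead of waiting for each CPU to show up on its own,
 * make every CPU report a quiescent state as part of queueing the callback.
 * The loop stands in for whatever mechanism would kick the CPUs for real.
 */
static void toy_call_rcu_forced(struct toy_rcu *rcu, void (*fn)(void *),
                                void *arg)
{
    toy_call_rcu(rcu, fn, arg);

    for ( unsigned int cpu = 0; cpu < NR_CPUS; cpu++ )
        toy_quiescent(rcu, cpu);
}

/* Example callback: stands in for the final teardown of a domain. */
static void mark_destroyed(void *arg)
{
    *(bool *)arg = true;
}
```

With toy_call_rcu(), if CPUs 0-2 report in but CPU 3 sits in the guest forever (the vwfi=native case), mark_destroyed() is never called. With toy_call_rcu_forced(), it is called before the function returns.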


Julien Grall
