Xen project Mailing List

[Xen-devel] Status of our RCU subsystem

Hey,

So, because of this issue/thread, I started having a look at the RCU implementation in our tree:

[Xen-devel] xen/arm: Domain not fully destroyed when using credit2
https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg02454.html
https://lists.xenproject.org/archives/html/xen-devel/2017-02/msg00101.html

As far as I could reconstruct, our RCU code comes from Linux >= 2.6.16 (it's the first release that contains 21a1ea9e "[PATCH] rcu batch tuning", which is something we have in xen/common/rcupdate.c). The code does not change much until Linux started to care about preemptible RCU.

Until the 2.6.21 release, Linux was a 'ticking kernel' (at least on x86, more about this later):

https://en.wikipedia.org/wiki/Tickless_kernel

This means there's a timer (the tick) always running and periodically interrupting whatever is happening on a CPU, even if that CPU is idle (and hence, at least on modern hardware, in a deep sleep state, etc).

Among other things, the tick calls update_process_times (http://elixir.free-electrons.com/linux/v2.6.20/source/kernel/timer.c#L1103), which does:

    if (rcu_pending(cpu))
        rcu_check_callbacks(cpu, user_tick);

which makes RCU happy. When picking up that code, we put the above early in __do_softirq(), and it's still there.

Starting from Linux 2.6.21, idle CPUs were no longer interrupted by a tick (if the kernel is configured for that, as is usual in Linux). That's historically been referred to as 'tickless'. And, as said, that's the case for x86: s390, in fact, has been tickless in Linux since 2.6.6.

Now, as a matter of fact, we (Xen) are tickless. Well, strictly speaking, it depends on the scheduler, and even on the architecture. :-(

So, Credit1 has been tickless since 964fae8ac "cpuidle: suspend/resume scheduler tick timer during cpu idle state entry/exit", which defined and put down calls to sched_tick_suspend/resume(). For x86. ARM never calls sched_tick_suspend().
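
To make the dependency on the tick concrete, here is a deliberately tiny model. It is a sketch, not the actual Xen or Linux code, and every name in it (struct rcu_model, rcu_model_tick(), and so on) is hypothetical. A grace period completes only once every CPU has reported a quiescent state from its tick (or, in Xen, from __do_softirq()); the model simply treats each tick as a quiescent state, whereas real code is more careful about what the tick interrupted. A CPU that receives no tick never reports, so the grace period never ends.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/*
 * Hypothetical, heavily simplified model of tick-driven grace-period
 * detection; it is NOT Xen's or Linux's rcupdate.c.
 */
struct rcu_model {
    unsigned long cur;          /* number of the current grace period    */
    unsigned long completed;    /* number of the last completed one      */
    bool pending[NR_CPUS];      /* CPUs that still owe a quiescent state */
};

/* Begin a new grace period: every CPU must now pass a quiescent state. */
static void rcu_model_start_batch(struct rcu_model *rcp)
{
    rcp->cur++;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        rcp->pending[cpu] = true;
}

/*
 * Stand-in for the rcu_pending()/rcu_check_callbacks() pair run from the
 * tick: in this model, every tick counts as a quiescent state for the CPU.
 */
static void rcu_model_tick(struct rcu_model *rcp, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    if (rcp->completed == rcp->cur || !rcp->pending[cpu])
        return;                     /* nothing owed by this CPU */
    rcp->pending[cpu] = false;
    for (int i = 0; i < NR_CPUS; i++)
        if (rcp->pending[i])
            return;                 /* some CPU has not reported yet */
    rcp->completed = rcp->cur;      /* all CPUs reported: grace period over */
}

/* Deliver `rounds` ticks to every CPU that is not in tickless idle. */
static bool rcu_model_run(struct rcu_model *rcp, const bool idle[NR_CPUS],
                          int rounds)
{
    for (int r = 0; r < rounds; r++)
        for (int cpu = 0; cpu < NR_CPUS; cpu++)
            if (!idle[cpu])
                rcu_model_tick(rcp, cpu);
    return rcp->completed == rcp->cur;
}
```

With all four CPUs ticking, a single round of ticks ends the grace period. Mark any one CPU as idle (no tick) and rcu_model_run() keeps returning false however many rounds it is given, which is, in a nutshell, the problem tickless CPUs pose to this kind of RCU.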

Credit2 (and also RTDS, I'd say) have been tickless since their introduction, and on all architectures, because they don't have a tick at all. So, to summarize:

    Scheduler       x86                    ARM
    Credit1         tickless since 2009    ticking
    Credit2, RTDS   tickless               tickless

Our RCU implementation, as it is, is not meant to work on tickless systems. So, in theory, it should *only* work with Credit1 on ARM, at least until someone adds calls to sched_tick_suspend() in the proper places there (which they should). At that point, it won't work _anywhere_! :-O

As a matter of fact (Andrew suggested time sync rendezvouses being involved), it also _happens_ to work in tickless mode on x86, while it does not work in tickless mode on ARM (this is what the original issue is all about!).

*However*, our RCU implementation _can_ be made to work on tickless systems, at least in theory. In fact, it did work for Linux on s390. And, as a further proof of that, when tickless mode was introduced for everyone in Linux, not many changes were involved. E.g., have a look at how nohz_cpu_mask is used:

 * before "widespread" tickless:
   http://elixir.free-electrons.com/linux/v2.6.20/ident/nohz_cpu_mask
 * after "widespread" tickless:
   http://elixir.free-electrons.com/linux/v2.6.21/ident/nohz_cpu_mask

So, my plan would be to give it a try at introducing something like nohz_cpu_mask.

In the original thread, we identified as a possible solution introducing something like rcu_idle_enter/exit(), as one can find in modern Linux RCU code. I had a look at that, and I'm no longer sure that's the best thing to do. The current Linux RCU implementation is *hugely* different from ours (due to the even more aggressive tick suppression efforts they've been up to, tree-based RCU, preemptible RCU, etc), so it's hard to make sense of that code and map it all back to ours. The end result *might* be that we end up with functions called rcu_idle_enter/exit() that actually implement the proper handling of nohz_cpu_mask, but I'm not sure about this yet.
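
A minimal sketch of the nohz_cpu_mask idea, under the same caveats: a hypothetical single-threaded model, not a patch, with made-up names (struct nohz_model, model_idle_enter(), and so on). As far as I can tell from the 2.6.20/2.6.21 sources linked above, Linux starts a grace period by waiting only for CPUs that are in cpu_online_map and not in nohz_cpu_mask, and the nohz code checks whether RCU still needs a CPU (rcu_needs_cpu()) before stopping its tick. The model mirrors those two rules with plain bitmasks, assuming all CPUs are online.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS   4
#define ALL_CPUS  ((1UL << NR_CPUS) - 1)    /* assumption: all CPUs online */

/* Hypothetical model; bit N of each mask stands for CPU N. */
struct nohz_model {
    unsigned long cur, completed;   /* grace-period counters                 */
    unsigned long cpumask;          /* CPUs that still owe a quiescent state */
    unsigned long nohz_mask;        /* CPUs idling with their tick stopped   */
};

static bool model_rcu_pending(const struct nohz_model *rcp, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return (rcp->cpumask >> cpu) & 1UL;
}

static void model_cpu_quiet(struct nohz_model *rcp, int cpu)
{
    rcp->cpumask &= ~(1UL << cpu);
    if (rcp->cpumask == 0)
        rcp->completed = rcp->cur;  /* last CPU reported */
}

/* Start a grace period, not waiting for tickless-idle CPUs. */
static void model_start_batch(struct nohz_model *rcp)
{
    rcp->cur++;
    rcp->cpumask = ALL_CPUS & ~rcp->nohz_mask;
    if (rcp->cpumask == 0)
        rcp->completed = rcp->cur;
}

/* Tick/softirq path: report a quiescent state if one is owed. */
static void model_tick(struct nohz_model *rcp, int cpu)
{
    if (model_rcu_pending(rcp, cpu))
        model_cpu_quiet(rcp, cpu);
}

/* Idle entry: stop the tick only if RCU does not need this CPU. */
static bool model_idle_enter(struct nohz_model *rcp, int cpu)
{
    if (model_rcu_pending(rcp, cpu))
        return false;               /* keep ticking until it reports */
    rcp->nohz_mask |= 1UL << cpu;
    return true;
}

/* Idle exit: the CPU ticks again and is waited for from the next batch. */
static void model_idle_exit(struct nohz_model *rcp, int cpu)
{
    rcp->nohz_mask &= ~(1UL << cpu);
}
```

Being single-threaded, the model ignores the race between a CPU updating the idle mask and another CPU starting a grace period, as well as interrupts that wake a tickless CPU; real code would have to handle both with proper locking and barriers.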

There is also another thing that is present in all the Linux versions of that code, but not in ours, i.e., a field of the per-CPU RCU data structure (struct rcu_data), called passed_quiesc. As of now, it looks like we may actually need that too, but again, I'm not sure yet (and I haven't found anything about why we don't have it).

Finally, there may still be the need to take care of the case when an interrupt arrives at an idle CPU, which then goes idle again. In modern Linux, that's done by rcu_irq_enter/exit(); in the old variant we took, it's done in a way (AFAIUI, at least) that I'm not sure how to adapt for Xen.

Anyway, since it's been an entertaining piece of archaeological work, and since it's a bit trickier than I expected, I thought I'd share these findings. I'll start trying to put the code together, but if anyone has any idea, feedback, concern, advice, whatever... they're more than welcome. :-)

Thanks and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
