
[Xen-devel] Status of our RCU subsystem



Hey,

So, because of this issue/thread, I started having a look at the RCU
implementation in our tree:
 [Xen-devel] xen/arm: Domain not fully destroyed when using credit2
 https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg02454.html
 https://lists.xenproject.org/archives/html/xen-devel/2017-02/msg00101.html

As far as I could reconstruct, our RCU code comes from Linux >= 2.6.16
(it's the first release that contains 21a1ea9e "[PATCH] rcu batch
tuning", which is something we have in xen/common/rcupdate.c).

The Linux code did not change much until Linux started to care about
preemptible RCU.

Until the 2.6.21 release, Linux was a 'ticking kernel' (at least on x86,
more about this later):
 https://en.wikipedia.org/wiki/Tickless_kernel

This means there's a timer (the tick) always running and always
interrupting whatever is happening on a CPU, even if that CPU is
idle (and hence, at least on modern hardware, in deep sleep states,
etc). Among other things, the tick calls update_process_times()
(http://elixir.free-electrons.com/linux/v2.6.20/source/kernel/timer.c#L1103)
which does:

    if (rcu_pending(cpu))
        rcu_check_callbacks(cpu, user_tick);

which makes RCU happy.
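
To make the dependency on the tick concrete, here is a toy model of it.
This is only a sketch, not the code from xen/common/rcupdate.c or from
Linux: NR_CPUS, struct toy_rcu_ctrlblk and all the toy_* functions are
made up for illustration, and the grace period logic is reduced to a
bitmask of CPUs that still have to report a quiescent state.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4 /* hypothetical, for the model only */

/* Toy stand-in for the RCU control block. */
struct toy_rcu_ctrlblk {
    unsigned int cpumask;    /* CPUs still owing a quiescent state */
    unsigned long cur;       /* current grace period number */
    unsigned long completed; /* last completed grace period number */
};

/* Start a new grace period: wait for every online CPU. */
static void toy_start_batch(struct toy_rcu_ctrlblk *rcp, unsigned int online)
{
    rcp->cur++;
    rcp->cpumask = online;
}

/* Does this CPU still have something to report to RCU? */
static bool toy_rcu_pending(const struct toy_rcu_ctrlblk *rcp,
                            unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    return (rcp->cpumask & (1u << cpu)) != 0;
}

/* Report a quiescent state; the last CPU to do so ends the grace period. */
static void toy_rcu_check_callbacks(struct toy_rcu_ctrlblk *rcp,
                                    unsigned int cpu)
{
    rcp->cpumask &= ~(1u << cpu);
    if (rcp->cpumask == 0)
        rcp->completed = rcp->cur;
}

/* What the periodic tick does, on each CPU, as far as RCU is concerned. */
static void toy_tick(struct toy_rcu_ctrlblk *rcp, unsigned int cpu)
{
    if (toy_rcu_pending(rcp, cpu))
        toy_rcu_check_callbacks(rcp, cpu);
}
```

In this model, if toy_tick() runs on CPUs 0, 1 and 2 but never on CPU 3
(say, because it is idle and nothing interrupts it), completed never
catches up with cur. That is the kind of stall a design relying on the
tick is exposed to once the tick goes away.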

When picking up that code, we put the above early in __do_softirq(),
and it's still there.

Starting from Linux 2.6.21, idle CPUs were no longer interrupted by a
tick (if the kernel is configured for that, as is usual in Linux). That's
historically been referred to as 'tickless'.

As said, this timeline applies to x86; s390, in fact, has been tickless
in Linux since 2.6.6.

Now, as a matter of fact, we (Xen) are tickless. Well, strictly
speaking it depends on the scheduler, and even on architecture. :-(

So, Credit1 has been tickless since 964fae8ac "cpuidle: suspend/resume
scheduler tick timer during cpu idle state entry/exit", which defined
and put down calls to sched_tick_suspend/resume().
That is only true on x86, though: ARM never calls sched_tick_suspend().
Credit2 (and also RTDS, I'd say) have been tickless since their
introduction, and on all arches, because they don't have a tick.

So, to summarize:

                 x86                     ARM
 - Credit1      tickless since 2009     ticking
 - Credit2,     tickless                tickless
   RTDS

Our RCU implementation, as it is, is not meant to work on tickless
systems. So, in theory, it should *only* work with Credit1 on ARM, and
only until the ARM code adds calls to sched_tick_suspend() in the
proper places (which it should). At that point, it won't work
_anywhere_! :-O
As a matter of fact, it also _happens_ to work in tickless mode on x86
(Andrew suggested the time sync rendezvous may be involved), while it
does not work in tickless mode on ARM (this is what the original issue
is all about!).

*However* our RCU implementation _can_ be made to work in tickless
systems, at least in theory. In fact, it did work for Linux on s390.
And, as a further proof of that, when tickless mode was introduced for
everyone in Linux, there were not too many changes involved. E.g., have
a look at how nohz_cpu_mask is used:

 * before "widespread" tickless:
   http://elixir.free-electrons.com/linux/v2.6.20/ident/nohz_cpu_mask
 * after "widespread" tickless:
   http://elixir.free-electrons.com/linux/v2.6.21/ident/nohz_cpu_mask

So, my plan would be to try introducing something like
nohz_cpu_mask.
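
Roughly, the idea would be something like the sketch below. It is a
toy model, in the spirit of how Linux 2.6.21 leaves the CPUs in
nohz_cpu_mask out of the set it waits for when starting a batch, and
not actual Xen or Linux code: NR_CPUS, struct toy_nohz_rcu and all the
toy_* names are hypothetical.

```c
#include <assert.h>

#define NR_CPUS 4 /* hypothetical, for the model only */

/* All CPUs online in this model. */
static unsigned int toy_online_mask = (1u << NR_CPUS) - 1;
/* CPUs that are idle with their tick stopped. */
static unsigned int toy_nohz_cpu_mask;

struct toy_nohz_rcu {
    unsigned int cpumask;    /* CPUs still owing a quiescent state */
    unsigned long cur;       /* current grace period number */
    unsigned long completed; /* last completed grace period number */
};

/* Called when a CPU is about to go idle without a tick. */
static void toy_idle_enter(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    toy_nohz_cpu_mask |= 1u << cpu;
}

/* Called when that CPU leaves idle. */
static void toy_idle_exit(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    toy_nohz_cpu_mask &= ~(1u << cpu);
}

/* Start a grace period, waiting only for CPUs that are not tickless idle. */
static void toy_nohz_start_batch(struct toy_nohz_rcu *rcp)
{
    rcp->cur++;
    rcp->cpumask = toy_online_mask & ~toy_nohz_cpu_mask;
    if (rcp->cpumask == 0)
        rcp->completed = rcp->cur; /* nobody to wait for */
}

/* A non-idle CPU reports a quiescent state. */
static void toy_nohz_report_qs(struct toy_nohz_rcu *rcp, unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    rcp->cpumask &= ~(1u << cpu);
    if (rcp->cpumask == 0)
        rcp->completed = rcp->cur;
}
```

With CPU 3 marked idle before the batch starts, the grace period ends
as soon as CPUs 0, 1 and 2 have reported. The sketch deliberately
ignores a CPU that goes idle *after* the batch has started, and an idle
CPU being briefly woken by an interrupt; those are the tricky parts.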

In the original thread, we identified as a possible solution
introducing something like rcu_idle_enter/exit(), as one can find in
modern Linux RCU code. I had a look at that, and I'm no longer sure
that's the best thing to do. The current Linux RCU implementation is
*hugely* different from ours (due to even more aggressive tick
suppression efforts they've been up to, tree based RCU, preemptible
RCU, etc), so it's hard to make sense of that code and map it all back
to ours.

The end result *might* be that we end up with things that are called
rcu_idle_enter/exit(), but that actually implement the proper handling
of nohz_cpu_mask. I'm not sure about this yet, though.

There is also another thing that is present in all Linux versions of
that code, but not in ours, i.e., a field of the RCU control
structure, called passed_quiesc. As of now, it looks like we might
actually need that too, but again, not sure yet (and I haven't found
anything about why we don't have it).

Finally, there may still be the need to take care of the case when an
interrupt arrives and interrupts an idle CPU, which then goes idle
again. In modern Linux, that's done by rcu_irq_enter/exit(). In the old
variant we took, it's done in a way (AFAIUI, at least) that I'm not
sure how to adapt for Xen.

Anyway, since it's been an entertaining piece of archaeological work,
and since it's a bit trickier than I expected it to be, I thought I'd
share these findings.

I'll start trying to put the code together, but if anyone has any
idea, feedback, concern, advice, whatever... they're more than welcome.
:-)

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

