[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 3/5] xen: RCU/x86/ARM: discount CPUs that were idle when grace period started.



On Wed, 2017-08-09 at 12:38 +0100, Tim Deegan wrote:
> Hi,
> 
> At 10:01 +0200 on 27 Jul (1501149684), Dario Faggioli wrote:
> > In Xen, that is impossible, and that's particularly problematic
> > when system is idle (or lightly loaded) systems, as CPUs that
> > are idle may never have the chance to tell RCU about their
> > quiescence, and grace periods could extend indefinitely!
> 
> [...]
> > The first step for fixing this situation is for RCU to record,
> > at the beginning of a grace period, which CPUs are already idle.
> > In fact, being idle, they can't be in the middle of any read-side
> > critical section, and we don't have to wait for them to declare
> > a grace period finished.
> 
> AIUI this patch fixes a bug where:
>  - a CPU is idle/asleep;
>  - it is added to the cpumask of a new RCU grace period; and
>  - because the CPU is asleep, the grace period never ends. 
> Have I understood?
> 
> I think we might be left with a race condition:
>  - CPU A is about to idle.  It runs the RCU softirq and
>    clears itself from the current grace period.
>  - CPU B ends the grace period and starts a new one.
>  - CPU A calls rcu_idle_enter() and sleeps.
>  - The new grace period never ends.
> 
So, I could not make this happen artificially, but I put together a
possible sequence of events that depicts this situation. I'm putting it
here, but also pasting it at https://pastebin.com/PHu6Guq0 (in case the
email body is mangled.

Lines starting with a '|' are the output of tracing, triggered by the
execution of the code/functions reported in the other lines:

        CPU0                            CPU1                            CPU2
        .                               .                               .
        .                               call_rcu(...)                   .
        .                               |rcu_call fn=xxx, rcp_cur=-300, 
rdp_quiescbatch=-300
        .                               .                               .
        .                               |do_softirq                     .
        .                               |rcu_pending? yes (ret=2): no pending 
entries but new entries
        .                               |raise_softirq nr 3             .
        .                               |softirq_handler nr 3           .
        .                               rcu_process_callbacks();        .
        .                               |rcu_process_callbacks, 
rcp_completed=-300, rdp_batch=0, rdp_curlist: null, rdp_nxtlist: yes, 
rdp_donelist: null
        .                                 rdp->batch = rcp->cur + 1;    .
        .                                 smp_rmb;                      .
        .                               |rcu_next_batch, rcp_cur=-300, 
rdp_batch=-299, rcp_next_pending? no
        .                                 if (!rcp->next_pending) {     .
        .                                  rcp->next_pending = 1;       .
        .                                  rcu_start_batch();           .
        .                                    if (rcp->next_pending && 
rcp->completed == rcp->cur) {
        .                                     rcp->next_pending = 0;    .
        .                                     smp_wmb;                  .
        .                                     rcp->cur++;               .
        .                                     smp_mb;                   .
        .                                     cpumask_andnot(&rcp->cpumask, 
&cpu_online_map, &rcp->idle_cpumask);
        .                               |rcu_start_batch, rcp_cur=-299, 
rcp_cpumask=0x00000006 //Active CPUs: 1,2
        .                                 }                             .
        .                                 rcu_check_quiescent_state();  .
        .                               |rcu_check_quiesc_state, rcp_cur=-299, 
rdp_quiescbatch=-300, rdp_qs_pending: no
        .                                   if (rdp->quiescbatch != rcp->cur) {
        .                                    rdp->qs_pending = 1;       .
        .                                    rdp->quiescbatch = rcp->cur;
        .                               |rcu_grace_start, rcp_cur=-299  .
        .                                   }                           .
        call_rcu(...)                   .                               .
        |rcu_call fn=xxx, rcp_cur=-299, rdp_quiescbatch=-300            .
        .                               .                               .
        |do_softirq                     .                               .
        |rcu_pending? yes (ret=2): no pending entries but new entries   .
        |raise_softirq nr 3             .                               .
        |softirq_handler nr 3           .                               .
        rcu_process_callbacks();        .                               .
        |rcu_process_callbacks, rcp_completed=-300, rdp_batch=0, rdp_curlist: 
null, rdp_nxtlist: yes, rdp_donelist: null
          rdp->batch = rcp->cur + 1;    .                               .
          smp_rmb;                      .                               .
        |rcu_next_batch, rcp_cur=-299, rdp_batch=-298, rcp_next_pending? no
          if (!rcp->next_pending) {     .                               .
           rcp->next_pending = 1;       .                               .
           rcu_start_batch();           .                               .
             if (rcp->next_pending && rcp->completed == rcp->cur) //NO!!!
          }                             .                               .
          rcu_check_quiescent_state();  .                               .
        |rcu_check_quiesc_state, rcp_cur=-299, rdp_quiescbatch=-300, 
rdp_qs_pending: no
          if (rdp->quiescbatch != rcp->cur) {                           .
           rdp->qs_pending = 1;         .                               .
           rdp->quiescbatch = rcp->cur; .                               .
        |rcu_grace_start, rcp_cur=-299  .                               .
          }                             .                               .
        .                               .                               .
        .                               .                               
|vcpu_block dXvY
        .                               .                               
|raise_softirq nr 1
        .                               .                               
|rcu_pending? yes (ret=4): waiting for CPU to quiesce (rdp_qs_pending=0)
        .                               .                               
|raise_softirq nr 3
        .                               .                               
|softirq_handler nr 1
        .                               .                               
|//scheduler stuff
        .                               .                               
|runstate_change dXvY running->blocked
        .                               .                               
|runstate_change d32767v2 runnable->running
        .                               .                               
idle_loop()
        .                               .                                 
pm_idle() //e.g., mwait_idle()
        .                               .                                   
cpufreq_dbs_timer_suspend();
        .                               .                               
|timer_stop t=0xffff830319acebc0, fn=0xffff82d080252da9
        .                               .                               
|timer_rm_entry t=0xffff830319acebc0, cpu=2, status=0x3
        .                               .                                   
sched_tick_suspend();
        .                               .                                     
rcu_idle_timer_start()
        .                               .                                   
process_pending_softirqs();
        .                               .                               
|do_softirq, pending = 0x00000008
        .                               .                               
|rcu_pending? yes (ret=4): waiting for CPU to quiesce (rdp_qs_pending=0)
        .                               .                               
|raise_softirq nr 3
        .                               .                               
|softirq_handler nr 3
        .                               .                                     
rcu_process_callbacks();
        .                               .                               
|rcu_process_callbacks, rcp_completed=-300, rdp_batch=0, rdp_curlist: null, 
rdp_nxtlist: null, rdp_donelist: null
        .                               .                                       
rcu_check_quiescent_state();
        .                               .                               
|rcu_check_quiesc_state, rcp_cur=-299, rdp_quiescbatch=-300, rdp_qs_pending: no
        .                               .                                       
  if (rdp->quiescbatch != rcp->cur) {
        .                               .                                       
   rdp->qs_pending = 1;
        .                               .                                       
   rdp->quiescbatch = rcp->cur;
        .                               .                               
|rcu_grace_start, rcp_cur=-299
        .                               .                                       
  }
        .                               .                               
|do_softirq, pending = 0x00000000
        .                               .                               
|rcu_pending? yes (ret=5): waiting for CPU to quiesce (rdp_qs_pending=1)
        .                               .                               
|raise_softirq nr 3
        .                               .                               
|softirq_handler nr 3
        .                               .                                     
rcu_process_callbacks();
        .                               .                               
|rcu_process_callbacks, rcp_completed=-300, rdp_batch=0, rdp_curlist: null, 
rdp_nxtlist: null, rdp_donelist: null
        .                               .                                       
rcu_check_quiescent_state();
        .                               .                               
|rcu_check_quiesc_state, rcp_cur=-299, rdp_quiescbatch=-299, rdp_qs_pending: yes
        .                               .                                       
  if (rdp->quiescbatch != rcp->cur) // NO!!!
        .                               .                                       
  if (!rdp->qs_pending) // NO!!!
        .                               .                                       
  rdp->qs_pending = 0;
        .                               .                               
|rcu_grace_done, rcp_cur=-299, rcp_completed=-300, rdp_quiescbatch=-299
        .                               .                                       
  if (likely(rdp->quiescbatch == rcp->cur)) {
        .                               .                                       
   cpu_quiet();
        .                               .                                       
     cpumask_clear_cpu(cpu, &rcp->cpumask);
        .                               .                               
|rcu_cpu_quiet, rcp_cur=-299, rcp_cpumask=0x00000002 // Active CPU: 1
        .                               .                                       
     if (cpumask_empty(&rcp->cpumask)) // NO!!!
        .                               |do_softirq, pending = 0x00000010
        .                               |rcu_pending? yes (ret=5): waiting for 
CPU to quiesce (rdp_qs_pending=1)
        .                               |raise_softirq nr 3             .
        .                               |softirq_handler nr 3           .
        .                               rcu_process_callbacks();        .
        .                               |rcu_process_callbacks, 
rcp_completed=-300, rdp_batch=-299, rdp_curlist: yes, rdp_nxtlist: null, 
rdp_donelist: null
        .                                 rcu_check_quiescent_state();  .
        .                               |rcu_check_quiesc_state, rcp_cur=-299, 
rdp_quiescbatch=-299, rdp_qs_pending: yes
        .                                   if (rdp->quiescbatch != rcp->cur) 
// NO!!!
        .                                   if (!rdp->qs_pending) // NO!!!
        .                               |rcu_grace_done, rcp_cur=-299, 
rcp_completed=-300, rdp_quiescbatch=-299
        .                                 if (likely(rdp->quiescbatch == 
rcp->cur)) {
        .                                  cpu_quiet(rdp->cpu, rcp);    .
        .                                    cpumask_clear_cpu(cpu, 
&rcp->cpumask); 
        .                               |rcu_cpu_quiet, rcp_cur=-299, 
rcp_cpumask=0x00000000 //No more active CPU
        .                                    if (cpumask_empty(&rcp->cpumask)) {
        .                                     rcp->completed = rcp->cur;.
        .                                     rcu_start_batch();        .
        .                                       if (rcp->next_pending && 
rcp->completed == rcp->cur) {
        .                                        rcp->next_pending = 0; .
        .                                        smp_wmb;               .
        .                               .                               
|do_softirq, pending = 0x00000000
        .                               .                               
|rcu_pending? no
        .                               .                                   
local_irq_disable();
        .                               .                                   if 
(!cpu_is_haltable(cpu)) //NO!!!
        .                               .                               
|rcu_pending? no
        .                               .                               
|pm_idle_start c2, pm_tick=3720475837, exp=1708us, pred=1024us
        .                               .                                   if 
(cpu_is_haltable(cpu)) {  // <--- rcu_pending: quiescbatch=-299, rcp->cur=-299
[1]     .                               .                               
|rcu_pending? no
[2]     .                                        rcp->cur++;            .
        .                                        smp_mb;                .
        .                                        cpumask_andnot(&rcp->cpumask, 
&cpu_online_map, &rcp->idle_cpumask);
        .                               |rcu_start_batch, rcp_cur=-298, 
rcp_cpumask=0x00000007
        .                                       }
        .                                    }
[3]     .                               .                                    
rcu_idle_enter(cpu);
        .                               .                                    
mwait_idle_with_hints();
        .                               .                               .

So, this is basically what's described above, with my CPU1 being Tim's
CPU B (the one that ends the grace period and starts a new one), and my
CPU2 being Tim's CPU A (the one that goes idle). There's the need for
CPU0 issuing a call_rcu() and queueing a new batch (or CPU1 won't start
any new one).

Basically, for the race to occur (in [3]), it is necessary that [2]
(CPU1 doing rcp->cur++) happens after [1] (last call to rcu_pending()
of CPU2, before clearing the mask and going idle). In fact, it that is
not the case, rcu_pending() in CPU2 will say 'no', because of this
check:

 if (rdp->quiescbatch != rcp->cur || rdp->qs_pending)

which would prevent the CPU from going idle (and will, at least, lead
to it going through check_quiescent_state() twice, clearing itself from
the new grace period too).

Therefore, it looks to me that the race can be avoided by making sure
that there is at least one check of rcu_pending(), after a CPU has
called rcu_enter_idle(). This basically means moving rcu_idle_enter()
up.

I was actually thinking to move it inside the tick-stopping logic too.
This way, either, when CPU1 samples the cpumask for the new grace
period:
1) CPU2 will be in idle_cpumask already, and it actually goes idle;
2) CPU2 will be in idle_cpumask, but then it does not go idle
(cpu_is_haltable() returning false) for various reasons. This may look
unideal, but if it is here, trying to go idle, it is not holding
references to read-side critical sections, or at least we can consider
it to be quiescing, so it's ok to ignore it for the grace period;
3) CPU2 is not in idle_cpumask, and so it will not be in the sampled
mask, but the first check of rcu_pending() will lead acknowledge
quiescence, and calling of cpu_quiet();

Which looks fine to me. Or am I still missing something?

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

Attachment: signature.asc
Description: This is a digitally signed message part

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.