
Re: [Xen-devel] [PATCH v4 2/6] xen/rcu: don't use stop_machine_run() for rcu_barrier()



On 10.03.20 17:37, Jan Beulich wrote:
> On 10.03.2020 17:34, Jürgen Groß wrote:
>> On 10.03.20 17:29, Jan Beulich wrote:
>>> On 10.03.2020 08:28, Juergen Gross wrote:
>>>> +void rcu_barrier(void)
>>>>    {
>>>> -    atomic_t cpu_count = ATOMIC_INIT(0);
>>>> -    return stop_machine_run(rcu_barrier_action, &cpu_count, NR_CPUS);
>>>> +    unsigned int n_cpus;
>>>> +
>>>> +    while ( !get_cpu_maps() )
>>>> +    {
>>>> +        process_pending_softirqs();
>>>> +        if ( !atomic_read(&cpu_count) )
>>>> +            return;
>>>> +
>>>> +        cpu_relax();
>>>> +    }
>>>> +
>>>> +    n_cpus = num_online_cpus();
>>>> +
>>>> +    if ( atomic_cmpxchg(&cpu_count, 0, n_cpus) == 0 )
>>>> +    {
>>>> +        atomic_add(n_cpus, &done_count);
>>>> +        cpumask_raise_softirq(&cpu_online_map, RCU_SOFTIRQ);
>>>> +    }
>>>> +
>>>> +    while ( atomic_read(&done_count) )

>>> Don't you leave a window for races here, in that done_count
>>> gets set to non-zero only after setting cpu_count? A CPU
>>> losing the cmpxchg attempt above may observe done_count
>>> still being zero, and hence exit without waiting for the
>>> count to actually _drop_ to zero.

>> This can only be a CPU that hasn't yet joined the barrier handling,
>> so it will do that later.

> I'm afraid I don't understand - if two CPUs independently call
> rcu_barrier(), neither should fall through here without waiting
> at all, I would think?
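
The window Jan describes can be sketched standalone (not Xen code; C11
atomics stand in for Xen's atomic_t, and the numbered comments mark the
interleaving of the two CPUs: A wins the cmpxchg, B loses it and reads
done_count before A has added to it):

#include <stdatomic.h>
#include <stdio.h>

#define N_CPUS 2

static atomic_int cpu_count;   /* claimed by the barrier initiator      */
static atomic_int done_count;  /* what the callers wait to drain to zero */

static void rcu_barrier_sketch(void)
{
    int expected = 0;

    /* (1) CPU A wins the cmpxchg and makes cpu_count non-zero ... */
    if ( atomic_compare_exchange_strong(&cpu_count, &expected, N_CPUS) )
    {
        /*
         * (3) ... but only here does done_count become non-zero.  A CPU B
         * that lost the cmpxchg and reached the wait loop between (1) and
         * (3) saw done_count == 0 and returned without waiting.
         */
        atomic_fetch_add(&done_count, N_CPUS);
        /* real code: cpumask_raise_softirq(&cpu_online_map, RCU_SOFTIRQ) */
    }

    /*
     * (2) CPU B's wait loop: with done_count still zero it falls straight
     * through, i.e. its rcu_barrier() returns before the grace period ends.
     */
    while ( atomic_load(&done_count) )
        atomic_fetch_sub(&done_count, 1);  /* stands in for the per-CPU acks */
}

int main(void)
{
    rcu_barrier_sketch();
    printf("done_count now %d (single-threaded run; the race needs two CPUs\n"
           "interleaving as in comments (1)-(3))\n",
           atomic_load(&done_count));
    return 0;
}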

Oh, good catch!

I have thought more about this problem, and I think using counters alone
for the rendezvous accounting is rather risky. I'll try using a cpumask
instead.
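
The cpumask idea can be sketched as follows (purely an illustration, not
the actual follow-up patch; barrier_start/barrier_ack/barrier_wait and
pending_cpus are made-up names, and an atomic unsigned long models the
cpumask_t): the initiator publishes the complete set of CPUs it waits for
in a single atomic operation, so a caller losing the race to become
initiator sees either no barrier pending or the fully initialised set,
never a half-set-up state like the one between cpu_count and done_count.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define N_CPUS 4                      /* model: CPUs are bits 0..3 */

static atomic_ulong pending_cpus;     /* stands in for a cpumask_t */

/* Initiator side: claim the barrier by publishing the full CPU set. */
static bool barrier_start(void)
{
    unsigned long expected = 0;
    unsigned long all = (1UL << N_CPUS) - 1;

    /* One atomic publish - no window between "claimed" and "waited for". */
    return atomic_compare_exchange_strong(&pending_cpus, &expected, all);
}

/* Per-CPU side (would run from the RCU softirq): tick this CPU off. */
static void barrier_ack(unsigned int cpu)
{
    atomic_fetch_and(&pending_cpus, ~(1UL << cpu));
}

/* Initiator and non-initiators alike wait for the mask to drain. */
static void barrier_wait(void)
{
    while ( atomic_load(&pending_cpus) )
        ;   /* real code: process_pending_softirqs() / cpu_relax() */
}

int main(void)
{
    if ( barrier_start() )
        printf("initiator published mask %#lx\n",
               (unsigned long)atomic_load(&pending_cpus));

    for ( unsigned int cpu = 0; cpu < N_CPUS; cpu++ )
        barrier_ack(cpu);             /* models the softirq handlers */

    barrier_wait();
    puts("all CPUs accounted for");
    return 0;
}

A caller finding the barrier already claimed would skip barrier_start()
and go straight to barrier_wait(), which is the behaviour the wait loop
in the patch was aiming for.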


Juergen
