[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 44/47] mm: shrinker: make global slab shrink lockless



On Wed, Jul 26, 2023 at 05:14:09PM +0800, Qi Zheng wrote:
> On 2023/7/26 16:08, Dave Chinner wrote:
> > On Mon, Jul 24, 2023 at 05:43:51PM +0800, Qi Zheng wrote:
> > > @@ -122,6 +126,13 @@ void shrinker_free_non_registered(struct shrinker 
> > > *shrinker);
> > >   void shrinker_register(struct shrinker *shrinker);
> > >   void shrinker_unregister(struct shrinker *shrinker);
> > > +static inline bool shrinker_try_get(struct shrinker *shrinker)
> > > +{
> > > + return READ_ONCE(shrinker->registered) &&
> > > +        refcount_inc_not_zero(&shrinker->refcount);
> > > +}
> > 
> > Why do we care about shrinker->registered here? If we don't set
> > the refcount to 1 until we have fully initialised everything, then
> > the shrinker code can key entirely off the reference count and
> > none of the lookup code needs to care about whether the shrinker is
> > registered or not.
> 
> The purpose of checking shrinker->registered here is to stop running
> shrinker after calling shrinker_free(), which can prevent the following
> situations from happening:
> 
> CPU 0                 CPU 1
> 
> shrinker_try_get()
> 
>                        shrinker_try_get()
> 
> shrinker_put()
> shrinker_try_get()
>                        shrinker_put()

I don't see any race here? What is wrong with having multiple active
users at once?

> > 
> > This should use a completion, then it is always safe under
> > rcu_read_lock().  This also gets rid of the shrinker_lock spin lock,
> > which only exists because we can't take a blocking lock under
> > rcu_read_lock(). i.e:
> > 
> > 
> > void shrinker_put(struct shrinker *shrinker)
> > {
> >     if (refcount_dec_and_test(&shrinker->refcount))
> >             complete(&shrinker->done);
> > }
> > 
> > void shrinker_free()
> > {
> >     .....
> >     refcount_dec(&shrinker->refcount);
> 
> I guess what you mean is shrinker_put(), because here may be the last
> refcount.

Yes, I did.

> >     wait_for_completion(&shrinker->done);
> >     /*
> >      * lookups on the shrinker will now all fail as refcount has
> >      * fallen to zero. We can now remove it from the lists and
> >      * free it.
> >      */
> >     down_write(shrinker_rwsem);
> >     list_del_rcu(&shrinker->list);
> >     up_write(&shrinker_rwsem);
> >     call_rcu(shrinker->rcu_head, shrinker_free_rcu_cb);
> > }
> > 
> > ....
> > 
> > > @@ -686,11 +711,14 @@ EXPORT_SYMBOL(shrinker_free_non_registered);
> > >   void shrinker_register(struct shrinker *shrinker)
> > >   {
> > > - down_write(&shrinker_rwsem);
> > > - list_add_tail(&shrinker->list, &shrinker_list);
> > > - shrinker->flags |= SHRINKER_REGISTERED;
> > > + refcount_set(&shrinker->refcount, 1);
> > > +
> > > + spin_lock(&shrinker_lock);
> > > + list_add_tail_rcu(&shrinker->list, &shrinker_list);
> > > + spin_unlock(&shrinker_lock);
> > > +
> > >           shrinker_debugfs_add(shrinker);
> > > - up_write(&shrinker_rwsem);
> > > + WRITE_ONCE(shrinker->registered, true);
> > >   }
> > >   EXPORT_SYMBOL(shrinker_register);
> > 
> > This just looks wrong - you are trying to use WRITE_ONCE() as a
> > release barrier to indicate that the shrinker is now set up fully.
> > That's not necessary - the refcount is an atomic and along with the
> > rcu locks they should provides all the barriers we need. i.e.
> 
> The reason I used WRITE_ONCE() here is because the shrinker->registered
> will be read and written concurrently (read in shrinker_try_get() and
> written in shrinker_free()), which is why I added shrinker::registered
> field instead of using SHRINKER_REGISTERED flag (this can reduce the
> addition of WRITE_ONCE()/READ_ONCE()).

Using WRITE_ONCE/READ_ONCE doesn't provide memory barriers needed to
use the field like this. You need release/acquire memory ordering
here. i.e. smp_store_release()/smp_load_acquire().

As it is, the refcount_inc_not_zero() provides a control dependency,
as documented in include/linux/refcount.h, refcount_dec_and_test()
provides release memory ordering. The only thing I think we may need
is a write barrier before refcount_set(), such that if
refcount_inc_not_zero() sees a non-zero value, it is guaranteed to
see an initialised structure...

i.e. refcounts provide all the existence and initialisation
guarantees. Hence I don't see the need to use shrinker->registered
like this and it can remain a bit flag protected by the
shrinker_rwsem().


> > void shrinker_register(struct shrinker *shrinker)
> > {
> >     down_write(&shrinker_rwsem);
> >     list_add_tail_rcu(&shrinker->list, &shrinker_list);
> >     shrinker->flags |= SHRINKER_REGISTERED;
> >     shrinker_debugfs_add(shrinker);
> >     up_write(&shrinker_rwsem);
> > 
> >     /*
> >      * now the shrinker is fully set up, take the first
> >      * reference to it to indicate that lookup operations are
> >      * now allowed to use it via shrinker_try_get().
> >      */
> >     refcount_set(&shrinker->refcount, 1);
> > }
> > 
> > > diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c
> > > index f1becfd45853..c5573066adbf 100644
> > > --- a/mm/shrinker_debug.c
> > > +++ b/mm/shrinker_debug.c
> > > @@ -5,6 +5,7 @@
> > >   #include <linux/seq_file.h>
> > >   #include <linux/shrinker.h>
> > >   #include <linux/memcontrol.h>
> > > +#include <linux/rculist.h>
> > >   /* defined in vmscan.c */
> > >   extern struct rw_semaphore shrinker_rwsem;
> > > @@ -161,17 +162,21 @@ int shrinker_debugfs_add(struct shrinker *shrinker)
> > >   {
> > >           struct dentry *entry;
> > >           char buf[128];
> > > - int id;
> > > -
> > > - lockdep_assert_held(&shrinker_rwsem);
> > > + int id, ret = 0;
> > >           /* debugfs isn't initialized yet, add debugfs entries later. */
> > >           if (!shrinker_debugfs_root)
> > >                   return 0;
> > > + down_write(&shrinker_rwsem);
> > > + if (shrinker->debugfs_entry)
> > > +         goto fail;
> > > +
> > >           id = ida_alloc(&shrinker_debugfs_ida, GFP_KERNEL);
> > > - if (id < 0)
> > > -         return id;
> > > + if (id < 0) {
> > > +         ret = id;
> > > +         goto fail;
> > > + }
> > >           shrinker->debugfs_id = id;
> > >           snprintf(buf, sizeof(buf), "%s-%d", shrinker->name, id);
> > > @@ -180,7 +185,8 @@ int shrinker_debugfs_add(struct shrinker *shrinker)
> > >           entry = debugfs_create_dir(buf, shrinker_debugfs_root);
> > >           if (IS_ERR(entry)) {
> > >                   ida_free(&shrinker_debugfs_ida, id);
> > > -         return PTR_ERR(entry);
> > > +         ret = PTR_ERR(entry);
> > > +         goto fail;
> > >           }
> > >           shrinker->debugfs_entry = entry;
> > > @@ -188,7 +194,10 @@ int shrinker_debugfs_add(struct shrinker *shrinker)
> > >                               &shrinker_debugfs_count_fops);
> > >           debugfs_create_file("scan", 0220, entry, shrinker,
> > >                               &shrinker_debugfs_scan_fops);
> > > - return 0;
> > > +
> > > +fail:
> > > + up_write(&shrinker_rwsem);
> > > + return ret;
> > >   }
> > >   int shrinker_debugfs_rename(struct shrinker *shrinker, const char *fmt, 
> > > ...)
> > > @@ -243,6 +252,11 @@ struct dentry *shrinker_debugfs_detach(struct 
> > > shrinker *shrinker,
> > >           shrinker->name = NULL;
> > >           *debugfs_id = entry ? shrinker->debugfs_id : -1;
> > > + /*
> > > +  * Ensure that shrinker->registered has been set to false before
> > > +  * shrinker->debugfs_entry is set to NULL.
> > > +  */
> > > + smp_wmb();
> > >           shrinker->debugfs_entry = NULL;
> > >           return entry;
> > > @@ -266,14 +280,26 @@ static int __init shrinker_debugfs_init(void)
> > >           shrinker_debugfs_root = dentry;
> > >           /* Create debugfs entries for shrinkers registered at boot */
> > > - down_write(&shrinker_rwsem);
> > > - list_for_each_entry(shrinker, &shrinker_list, list)
> > > + rcu_read_lock();
> > > + list_for_each_entry_rcu(shrinker, &shrinker_list, list) {
> > > +         if (!shrinker_try_get(shrinker))
> > > +                 continue;
> > > +         rcu_read_unlock();
> > > +
> > >                   if (!shrinker->debugfs_entry) {
> > > -                 ret = shrinker_debugfs_add(shrinker);
> > > -                 if (ret)
> > > -                         break;
> > > +                 /* Paired with smp_wmb() in shrinker_debugfs_detach() */
> > > +                 smp_rmb();
> > > +                 if (READ_ONCE(shrinker->registered))
> > > +                         ret = shrinker_debugfs_add(shrinker);
> > >                   }
> > > - up_write(&shrinker_rwsem);
> > > +
> > > +         rcu_read_lock();
> > > +         shrinker_put(shrinker);
> > > +
> > > +         if (ret)
> > > +                 break;
> > > + }
> > > + rcu_read_unlock();
> > >           return ret;
> > >   }
> > 
> > And all this churn and complexity can go away because the
> > shrinker_rwsem is still used to protect shrinker_register()
> > entirely....
> 
> My consideration is that during this process, there may be a
> driver probe failure and then shrinker_free() is called (the
> shrinker_debugfs_init() is called in late_initcall stage). In
> this case, we need to use RCU+refcount to ensure that the shrinker
> is not freed.

Yeah, you're trying to work around the lack of a
wait_for_completion() call in shrinker_free().

With that, this doesn't need RCU at all, and the iteration can be
done fully under the shrinker_rwsem() safely and so none of this
code needs to change.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.