
Re: [Xen-devel] [PATCHv2 1/3] rwlock: Add per-cpu reader-writer locks



On 20/11/15 16:03, Malcolm Crossley wrote:
> Per-cpu read-write locks allow the fast-path read case to have low
> overhead, because taking the read lock only sets/clears a per-cpu
> variable. The per-cpu read fast path also avoids locked
> compare-and-swap operations, which can be particularly slow on
> coherent multi-socket systems, especially when the read lock itself
> is heavily used.
> 
> The per-cpu reader-writer lock uses a per-lock flag to control the
> read-lock fast path. This allows a writer to disable the fast path
> and ensures that readers switch to using the underlying read-write
> lock implementation instead of the per-cpu variable.
> 
> Once the writer has taken the write lock and disabled the fast path,
> it must poll the per-cpu variable of every CPU which has entered the
> critical section for the specific read-write lock the writer is
> attempting to take. This design allows a single per-cpu variable to
> be used for read/write locks belonging to separate data structures.
> If two or more different per-cpu read locks are taken simultaneously
> then the per-cpu data structure is not used and the implementation
> takes the read lock of the underlying read-write lock; this behaviour
> is equivalent to the slow path in terms of performance. The per-cpu
> rwlock is not recursion safe for taking the per-cpu read lock because
> there is no recursion count variable; this is functionally equivalent
> to standard spin locks.
> 
> Slow-path readers, once unblocked, set the per-cpu variable and drop
> the read lock. This simplifies the implementation and allows the
> fairness of the underlying read-write lock to be taken advantage of.
> 
> There is more overhead on the per-cpu write-lock path due to checking
> each CPU's fast-path per-cpu variable, but this overhead is likely to
> be hidden by the required delay of waiting for readers to exit the
> critical section. The loop is optimised to only iterate over the
> per-cpu data of active readers of the rwlock. The cpumask_t for
> tracking the active readers is stored in a single per-cpu data
> location, and thus the write lock is not pre-emption safe. Therefore
> the per-cpu write lock can only be used with interrupts disabled.
> 
> Signed-off-by: Malcolm Crossley <malcolm.crossley@xxxxxxxxxx>

Code looks good to me.  Two comments.

First, it was difficult to review this because there wasn't a clear
description of the algorithm, nor even a comment describing how one
would actually use it.  It wasn't until I looked at the subsequent
patches that I got the basic idea.

A brief comment in the header file describing how one might actually go
about using this functionality would be useful.
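
Something along the lines of the sketch below is what I eventually
pieced together from the later patches.  The grant-table structure and
function names here are made up purely for illustration, but the macros
are the ones introduced by this patch:

    /* One per-cpu "owner" pointer, shared by every lock guarding this
     * class of data (declared once, e.g. in a header). */
    DEFINE_PERCPU_RWLOCK_GLOBAL(grant_rwlock);

    struct grant_table {
        percpu_rwlock_t lock;
        /* ... */
    };

    static void grant_table_init(struct grant_table *gt)
    {
        percpu_rwlock_resource_init(&gt->lock);
    }

    static void reader(struct grant_table *gt)
    {
        percpu_read_lock(grant_rwlock, &gt->lock);
        /* Read-side critical section. */
        percpu_read_unlock(grant_rwlock, &gt->lock);
    }

    static void writer(struct grant_table *gt)
    {
        /* The write path is not preemption safe, so interrupts must
         * already be disabled here. */
        percpu_write_lock(grant_rwlock, &gt->lock);
        /* Write-side critical section. */
        percpu_write_unlock(grant_rwlock, &gt->lock);
    }

Having something like that, plus a sentence or two on the fast/slow
path hand-off, in the header would have saved me a fair amount of time.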

Secondly, the way this is designed opens up the possibility of a new
kind of bug.  In order to work correctly, all *callers* have to match
up the lock they're trying to grab with the correct per-cpu lock.  You
could imagine some point in the future, for example, breaking the
grant lock into two types, at which point there would be a risk that
one of the readers would accidentally use the wrong per-cpu lock,
allowing a writer to grab the lock while a reader is still in the
critical section.

Would it make sense to have the per-cpu rwlock pointers be an array, and
then store the index of the array in the lock itself (set on
initialization)?  That would make the calls cleaner (only one argument
instead of two), and eliminate the class of error described above.
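
Roughly what I have in mind is something like the sketch below;
completely untested, and the array size and field names are only
illustrative:

    /* A small, fixed set of per-cpu "owner" slots, one per lock class. */
    #define PERCPU_RWLOCK_CLASSES 4    /* arbitrary, for illustration */
    DECLARE_PER_CPU(percpu_rwlock_t *, percpu_rwlock_owner[PERCPU_RWLOCK_CLASSES]);

    typedef struct percpu_rwlock {
        rwlock_t     rwlock;
        bool_t       writer_activating;
        unsigned int percpu_idx;       /* assigned once at initialisation */
    } percpu_rwlock_t;

    static inline void percpu_read_lock(percpu_rwlock_t *percpu_rwlock)
    {
        /* The lock itself tells us which per-cpu slot to use, so a
         * caller can no longer pass a mismatched pair. */
        percpu_rwlock_t **owner =
            &this_cpu(percpu_rwlock_owner)[percpu_rwlock->percpu_idx];
        /* ... then the same fast-path / slow-path logic as in this
         * patch, operating on *owner ... */
    }

The initialisation function (or macro) would then take the class index
as its extra argument, rather than every read/write call site.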

 -George

> --
> Changes since v1:
> - Removed restriction on taking two or more different percpu rw locks
>   simultaneously
> - Moved fast-path/slow-path barrier to be per lock instead of global
> - Created separate percpu_rwlock_t type and added macros to initialise
>   the new type
> - Added helper macros for using the percpu rwlock itself
> - Moved active_readers cpumask off the stack and into a percpu
>   variable
> ---
>  xen/common/spinlock.c        | 38 +++++++++++++++++++
>  xen/include/asm-arm/percpu.h |  5 +++
>  xen/include/asm-x86/percpu.h |  6 +++
>  xen/include/xen/percpu.h     |  4 ++
>  xen/include/xen/spinlock.h   | 87 ++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 140 insertions(+)
> 
> diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c
> index 7f89694..a0f9df5 100644
> --- a/xen/common/spinlock.c
> +++ b/xen/common/spinlock.c
> @@ -10,6 +10,8 @@
>  #include <asm/processor.h>
>  #include <asm/atomic.h>
>  
> +DEFINE_PER_CPU(cpumask_t, percpu_rwlock_readers);
> +
>  #ifndef NDEBUG
>  
>  static atomic_t spin_debug __read_mostly = ATOMIC_INIT(0);
> @@ -492,6 +494,42 @@ int _rw_is_write_locked(rwlock_t *lock)
>      return (lock->lock == RW_WRITE_FLAG); /* writer in critical section? */
>  }
>  
> +void _percpu_write_lock(percpu_rwlock_t **per_cpudata,
> +             percpu_rwlock_t *percpu_rwlock)
> +{
> +    unsigned int cpu;
> +
> +    /* First take the write lock to protect against other writers. */
> +    write_lock(&percpu_rwlock->rwlock);
> +
> +    /* Now set the global variable so that readers start using read_lock. */
> +    percpu_rwlock->writer_activating = 1;
> +    smp_mb();
> +
> +    /* Using a per cpu cpumask is only safe if there is no nesting. */
> +    ASSERT(!in_irq());
> +    this_cpu(percpu_rwlock_readers) = cpu_online_map;
> +
> +    /* Check if there are any percpu readers in progress on this rwlock. */
> +    for ( ; ; )
> +    {
> +        for_each_cpu(cpu, &this_cpu(percpu_rwlock_readers))
> +        {
> +            /*
> +             * Remove any percpu readers not contending on this rwlock
> +             * from our check mask.
> +             */
> +            if ( per_cpu_ptr(per_cpudata, cpu) != percpu_rwlock )
> +                cpumask_clear_cpu(cpu, &this_cpu(percpu_rwlock_readers));
> +        }
> +        /* Check if we've cleared all percpu readers from check mask. */
> +        if ( cpumask_empty(&this_cpu(percpu_rwlock_readers)) )
> +            break;
> +        /* Give the coherency fabric a break. */
> +        cpu_relax();
> +    }
> +}
> +
>  #ifdef LOCK_PROFILE
>  
>  struct lock_profile_anc {
> diff --git a/xen/include/asm-arm/percpu.h b/xen/include/asm-arm/percpu.h
> index 71e7649..c308a56 100644
> --- a/xen/include/asm-arm/percpu.h
> +++ b/xen/include/asm-arm/percpu.h
> @@ -27,6 +27,11 @@ void percpu_init_areas(void);
>  #define __get_cpu_var(var) \
>      (*RELOC_HIDE(&per_cpu__##var, READ_SYSREG(TPIDR_EL2)))
>  
> +#define per_cpu_ptr(var, cpu)  \
> +    (*RELOC_HIDE(&var, __per_cpu_offset[cpu]))
> +#define __get_cpu_ptr(var) \
> +    (*RELOC_HIDE(&var, READ_SYSREG(TPIDR_EL2)))
> +
>  #define DECLARE_PER_CPU(type, name) extern __typeof__(type) per_cpu__##name
>  
>  DECLARE_PER_CPU(unsigned int, cpu_id);
> diff --git a/xen/include/asm-x86/percpu.h b/xen/include/asm-x86/percpu.h
> index 604ff0d..51562b9 100644
> --- a/xen/include/asm-x86/percpu.h
> +++ b/xen/include/asm-x86/percpu.h
> @@ -20,4 +20,10 @@ void percpu_init_areas(void);
>  
>  #define DECLARE_PER_CPU(type, name) extern __typeof__(type) per_cpu__##name
>  
> +#define __get_cpu_ptr(var) \
> +    (*RELOC_HIDE(var, get_cpu_info()->per_cpu_offset))
> +
> +#define per_cpu_ptr(var, cpu)  \
> +    (*RELOC_HIDE(var, __per_cpu_offset[cpu]))
> +
>  #endif /* __X86_PERCPU_H__ */
> diff --git a/xen/include/xen/percpu.h b/xen/include/xen/percpu.h
> index abe0b11..c896863 100644
> --- a/xen/include/xen/percpu.h
> +++ b/xen/include/xen/percpu.h
> @@ -16,6 +16,10 @@
>  /* Preferred on Xen. Also see arch-defined per_cpu(). */
>  #define this_cpu(var)    __get_cpu_var(var)
>  
> +#define this_cpu_ptr(ptr)    __get_cpu_ptr(ptr)
> +
> +#define get_per_cpu_var(var)  (per_cpu__##var)
> +
>  /* Linux compatibility. */
>  #define get_cpu_var(var) this_cpu(var)
>  #define put_cpu_var(var)
> diff --git a/xen/include/xen/spinlock.h b/xen/include/xen/spinlock.h
> index fb0438e..a5a25b6 100644
> --- a/xen/include/xen/spinlock.h
> +++ b/xen/include/xen/spinlock.h
> @@ -3,6 +3,7 @@
>  
>  #include <asm/system.h>
>  #include <asm/spinlock.h>
> +#include <asm/types.h>
>  
>  #ifndef NDEBUG
>  struct lock_debug {
> @@ -261,4 +262,90 @@ int _rw_is_write_locked(rwlock_t *lock);
>  #define rw_is_locked(l)               _rw_is_locked(l)
>  #define rw_is_write_locked(l)         _rw_is_write_locked(l)
>  
> +struct percpu_rwlock {
> +     rwlock_t rwlock;
> +     bool_t   writer_activating;
> +};
> +
> +typedef struct percpu_rwlock percpu_rwlock_t;
> +
> +#define PERCPU_RW_LOCK_UNLOCKED { RW_LOCK_UNLOCKED, 0 }
> +#define DEFINE_PERCPU_RWLOCK_RESOURCE(l) percpu_rwlock_t l = PERCPU_RW_LOCK_UNLOCKED
> +#define percpu_rwlock_resource_init(l) (*(l) = (percpu_rwlock_t)PERCPU_RW_LOCK_UNLOCKED)
> +
> +static inline void _percpu_read_lock(percpu_rwlock_t **per_cpudata,
> +                                         percpu_rwlock_t *percpu_rwlock)
> +{
> +    /* We cannot support recursion on the same lock. */
> +    ASSERT(this_cpu_ptr(per_cpudata) != percpu_rwlock);
> +    /*
> +     * Detect using a second percpu_rwlock_t simultaneously and fall back
> +     * to the standard read_lock.
> +     */
> +    if ( unlikely(this_cpu_ptr(per_cpudata) != NULL ) )
> +    {
> +        read_lock(&percpu_rwlock->rwlock);
> +        return;
> +    }
> +
> +    /* Indicate this cpu is reading. */
> +    this_cpu_ptr(per_cpudata) = percpu_rwlock;
> +    smp_mb();
> +    /* Check if a writer is waiting. */
> +    if ( unlikely(percpu_rwlock->writer_activating) )
> +    {
> +        /* Let the waiting writer know we aren't holding the lock. */
> +        this_cpu_ptr(per_cpudata) = NULL;
> +        /* Wait using the read lock to keep the lock fair. */
> +        read_lock(&percpu_rwlock->rwlock);
> +        /* Set the per CPU data again and continue. */
> +        this_cpu_ptr(per_cpudata) = percpu_rwlock;
> +        /* Drop the read lock because we don't need it anymore. */
> +        read_unlock(&percpu_rwlock->rwlock);
> +    }
> +}
> +
> +static inline void _percpu_read_unlock(percpu_rwlock_t **per_cpudata,
> +                percpu_rwlock_t *percpu_rwlock)
> +{
> +    ASSERT(this_cpu_ptr(per_cpudata) != NULL);
> +    /*
> +     * Detect using a second percpu_rwlock_t simultaneously and fall back
> +     * to the standard read_unlock.
> +     */
> +    if ( unlikely(this_cpu_ptr(per_cpudata) != percpu_rwlock ) )
> +    {
> +        read_unlock(&percpu_rwlock->rwlock);
> +        return;
> +    }
> +    this_cpu_ptr(per_cpudata) = NULL;
> +    smp_wmb();
> +}
> +
> +/* Don't inline percpu write lock as it's a complex function. */
> +void _percpu_write_lock(percpu_rwlock_t **per_cpudata,
> +             percpu_rwlock_t *percpu_rwlock);
> +
> +static inline void _percpu_write_unlock(percpu_rwlock_t *percpu_rwlock)
> +{
> +    ASSERT(percpu_rwlock->writer_activating);
> +    percpu_rwlock->writer_activating = 0;
> +    write_unlock(&percpu_rwlock->rwlock);
> +}
> +
> +#define percpu_rw_is_write_locked(l)         _rw_is_write_locked(&((l)->rwlock))
> +
> +#define percpu_read_lock(percpu, lock) ( _percpu_read_lock( \
> +                                        &get_per_cpu_var(percpu), lock ) )
> +#define percpu_read_unlock(percpu, lock) ( _percpu_read_unlock( \
> +                                        &get_per_cpu_var(percpu), lock ) )
> +#define percpu_write_lock(percpu, lock) ( _percpu_write_lock( \
> +                                        &get_per_cpu_var(percpu), lock ) )
> +#define percpu_write_unlock(percpu, lock) ( _percpu_write_unlock( lock ) )
> +
> +#define DEFINE_PERCPU_RWLOCK_GLOBAL(name) DEFINE_PER_CPU(percpu_rwlock_t *, \
> +                                                         name)
> +#define DECLARE_PERCPU_RWLOCK_GLOBAL(name) DECLARE_PER_CPU(percpu_rwlock_t *, \
> +                                                         name)
> +
>  #endif /* __SPINLOCK_H__ */
> 

