> Per-cpu read-write locks give the read fast path low overhead: taking
> or releasing the read lock only sets or clears a per-cpu variable.
> The per-cpu read fast path also avoids locked compare-and-swap
> operations, which can be particularly slow on coherent multi-socket
> systems, especially when the read lock itself is heavily used.
> The per-cpu reader-writer lock uses a local variable to control
> the read lock fast path. This allows a writer to disable the fast
> path and ensures the readers switch to using the underlying
> read-write lock implementation instead of the per-cpu variable.
> Once the writer has taken the write lock and disabled the fast path,
> it must poll the per-cpu variable of every CPU that has entered
> the critical section for the specific read-write lock the writer is
> attempting to take. This design allows for a single per-cpu variable
> to be used for read/write locks belonging to separate data structures.
> If two or more different per-cpu read locks are taken
> simultaneously then the per-cpu data structure is not used and the
> implementation takes the read lock of the underlying read-write lock.
> This behaviour is equivalent to the slow path in terms of performance.
> The per-cpu rwlock is not recursion safe for taking the per-cpu
> read lock because there is no recursion count variable; this is
> functionally equivalent to standard spin locks.
> Slow path readers, once unblocked, set the per-cpu variable and
> drop the read lock. This simplifies the implementation and allows
> the fairness of the underlying read-write lock to be taken
> advantage of.
> There is more overhead on the per-cpu write lock path due to checking
> each CPU's fast path per-cpu variable, but this overhead is likely to be
> hidden by the required delay of waiting for readers to exit the
> critical section. The loop is optimised to only iterate over
> the per-cpu data of active readers of the rwlock. The cpumask_t for
> tracking the active readers is stored in a single per-cpu data
> location and thus the write lock is not pre-emption safe. Therefore
> the per-cpu write lock can only be used with interrupts disabled.
