Xen project Mailing List

Re: [Xen-devel] [PATCH 1/2] rwlock: add per-cpu reader-writer locks

>>> On 18.11.15 at 14:49, <malcolm.crossley@xxxxxxxxxx> wrote: > On 17/11/15 17:00, Jan Beulich wrote: >>>>> On 03.11.15 at 18:58, <malcolm.crossley@xxxxxxxxxx> wrote: >>> Per-cpu read-write locks allow for the fast path read case to have low >>> overhead >>> by only setting/clearing a per-cpu variable for using the read lock. >>> The per-cpu read fast path also avoids locked compare swap operations which >>> can >>> be particularly slow on coherent multi-socket systems, particularly if >>> there is >>> heavy usage of the read lock itself. >>> >>> The per-cpu reader-writer lock uses a global variable to control the read >>> lock >>> fast path. This allows a writer to disable the fast path and ensure the >>> readers >>> use the underlying read-write lock implementation. >>> >>> Once the writer has taken the write lock and disabled the fast path, it must >>> poll the per-cpu variable for all CPU's which have entered the critical >>> section >>> for the specific read-write lock the writer is attempting to take. This >>> design >>> allows for a single per-cpu variable to be used for read/write locks >>> belonging >>> to seperate data structures as long as multiple per-cpu read locks are not >>> simultaneously held by one particular cpu. This also means per-cpu >>> reader-writer locks are not recursion safe. >> >> This sounds like pretty severe a restriction, and something to easily >> get wrong once more than a handful of users got introduced. > > I can add an ASSERT to detect the recursion for the debug hypervisor but you > are right > that it is a restriction. I believe that the tickets locks are not recursion > safe > either and rely on additional infrastructure to prevent the recursion > occurring on > the lock itself. No ordinary spin lock can be acquired recursively. The restriction here is more severe: No two spin locks protecting the same kind of resource can't be used recursively. >>> + cpumask_t active_readers; >> >> Did you consider ways to avoid such a large variable to be put on >> the stack, and none turned out suitable? > > I wasn't aware the cpumask_t variable at 32 bytes was considered problematic > for the stack. I can't think of any way around the issue without causing > further restrictions (a percpu area will prevent taking two different percpu > rwlocks at the same time). 32 bytes it is only for 256 CPU builds. What about a 4095 CPU config? Wouldn't it be possible (either excluding use of such locks from interrupt context, or by disabling interrupts for the duration of the mask's use) to have a per-CPU mask? >> I admit I can't see alternatives, but the price >> - as said above - seems quite high. Or maybe I'm misunderstanding >> the "as long as multiple per-cpu read locks are not simultaneously >> held by one particular cpu", as having made it here I can't right >> away see where that restriction comes from? > > I've tried to detail the overhead earlier in my reply. The price is > mainly paid when writers are using an uncontended lock. The grant > table write lock is almost never taken in the typical use case and > so for particular algorithms/code flows the price may be worth paying. With "price" I was really referring to the usage restriction, not any performance effect. >>> +static inline void percpu_read_unlock(rwlock_t **per_cpudata) >>> +{ >>> + this_cpu_ptr(per_cpudata) = NULL; >>> + smp_wmb(); >>> +} >>> + >>> +/* Don't inline percpu write lock as it's a complex function */ >>> +void percpu_write_lock(rwlock_t **per_cpudata, bool_t *writer_activating, >>> + rwlock_t *rwlock); >>> + >>> +static inline void percpu_write_unlock(bool_t *writer_activating, rwlock_t >>> *rwlock) >>> +{ >>> + *writer_activating = false; >>> + write_unlock(rwlock); >>> +} >> >> Considering that the arguments one needs to pass to any of the >> functions here taking multiple arguments need to be in close sync, >> I'm missing abstracting macros here that need to be passed just >> a single argument (plus a declaration and a definition one). > > I agree the per_cpudata and writer_activating arguments are both linked > "global" state and could be abstracted with macro's. The rwlock_t is a > per resource state and would need to be passed as a seperate argument. > > > So I have some high level questions: > > With the current restrictions on the percpu rwlocks (no recursion and no > accessing two percpu rwlocks resources simultaneously on the same PCPU, > Do you think it is worth creating a generic implementation? Yes, namely with Andrew already hinting at the p2m lock also wanting to be converted to this model. > Or should I just add a grant table specific implementation because the grant > table code conforms to the above restrictions? > > What requirements would there be for a generic percpu rwlock implementation > to be accepted? At least the multiple resource lock issue should be avoided, and I think I've pointed out before how I think this can be achieved without any fundamental changes. > Is infrastructure allowing percpu areas to be used in data structures > required? At some point it seems this would be desirable, but I think I've explained a way to avoid this getting overly complex for now. > Does the "no recursion" restriction need to be removed? Preferably yes, but I think this one can be easier lived with. > Does the "no accessing two percpu rwlock resources simultaneously on the > same PCPU" > restriction need to be removed? See above - yes. Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.