
Re: [Xen-devel] [PATCH v2] arm: reduce power use by contended spin locks with WFE/SEV



On Tue, 15 Sep 2015, Ian Campbell wrote:
> On Mon, 2015-08-03 at 12:29 +0100, Ian Campbell wrote:
> > From: David Vrabel <david.vrabel@xxxxxxxxxx>
> > 
> > Instead of cpu_relax() while spinning and observing the ticket head,
> > introduce arch_lock_relax() which executes a WFE instruction.  After
> > the ticket head is changed call arch_lock_signal() to execute an SEV
> > instruction (with the required DSB first) to wake any spinners.
> > 
> > This should improve power consumption when locks are contended and
> > CPUs would otherwise spin.
> > 
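(For readers following along: a minimal standalone sketch of the
resulting pattern. Types are simplified, lock profiling is omitted,
and GCC builtins stand in for Xen's arch_fetch_and_add()/add_sized();
this is illustrative only, not the actual Xen code.)

    #include <stdint.h>

    typedef struct {
        volatile uint16_t head;   /* ticket currently being served */
        volatile uint16_t tail;   /* next ticket to be handed out */
    } sketch_lock_t;

    static inline void sketch_lock(sketch_lock_t *l)
    {
        /* Take a ticket. */
        uint16_t ticket = __atomic_fetch_add(&l->tail, 1, __ATOMIC_RELAXED);

        /* arch_lock_relax(): sleep in WFE until an event is signalled.
         * SEV sets the event register on every CPU, so a signal arriving
         * between the head check and the WFE is not lost. */
        while ( l->head != ticket )
            asm volatile ( "wfe" ::: "memory" );

        /* arch_lock_acquire_barrier() */
        __atomic_thread_fence(__ATOMIC_ACQUIRE);
    }

    static inline void sketch_unlock(sketch_lock_t *l)
    {
        /* arch_lock_release_barrier() */
        __atomic_thread_fence(__ATOMIC_RELEASE);

        l->head++;                                  /* add_sized() */

        /* arch_lock_signal(): make the head update visible before
         * waking the spinners, then send the event. */
        asm volatile ( "dsb ishst" ::: "memory" );
        asm volatile ( "sev" );
    }

(WFE may also return spuriously, e.g. on interrupts; that is harmless
because the head is re-checked on every iteration.)
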
> > For consistency also move arch_lock_(acquire|release)_barrier to
> > asm/spinlock.h.
> > 
> > Booted the result on arm32 (Midway) and arm64 (Mustang). Build-tested
> > only on amd64.
> > 
> > Signed-off-by: David Vrabel <david.vrabel@xxxxxxxxxx>
> > [ijc: add barrier, rename as arch_lock_*, move arch_lock_*_barrier, test]
> > Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
> > Cc: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
> > Cc: Jan Beulich <jbeulich@xxxxxxxx>
> > Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> 
> Andy has Reviewed-by but this change still lacks an Ack from the ARM side.
> Stefano, was your previous "The code should work though." intended as an
> Ack?

Yes

> > ---
> > v2 (ijc):
> >   Add dsb(ishst) to spin_relax.
> >   s/spin_(relax|signal)/arch_lock_\1/
> >   Move arch_lock_(acquire|release)_barrier to asm/spinlock.h
> >       (dropped Andy's Reviewed-by due to this change)
> > 
> > In principle the SEV could be made unnecessary on arm64, but this
> > requires a new hook before the wait loop as well as changing
> > observe_head and _spin_unlock to use the Acquire/Release instructions
> > instead of the non-atomic loads and stores used today, which is a lot
> > more refactoring of the generic code than I think we can be bothered
> > with at this stage.
> > 
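(A rough sketch of what that could look like, reusing the sketch_lock_t
from the sketch above; hypothetical arm64-only code, not part of this
patch. The load-acquire-exclusive arms the exclusive monitor, and the
unlocker's store to head clears it, which itself generates the wake-up
event for WFE, so no explicit SEV would be required:)

    /* Hypothetical arm64-only wait loop; sketch only. */
    static inline void sketch_wait_for_ticket(sketch_lock_t *l,
                                              uint16_t ticket)
    {
        uint16_t head;

        /* Load-acquire-exclusive halfword: observes head and arms the
         * exclusive monitor as a side effect. */
        asm volatile ( "ldaxrh %w0, %1"
                       : "=r" (head) : "Q" (l->head) : "memory" );
        while ( head != ticket )
        {
            asm volatile ( "wfe" ::: "memory" );
            asm volatile ( "ldaxrh %w0, %1"
                           : "=r" (head) : "Q" (l->head) : "memory" );
        }
    }

    /* ...with the unlock side using a store-release instead of the
     * plain store, e.g.:
     * asm volatile ( "stlrh %w1, %0"
     *                : "=Q" (l->head) : "r" (head + 1) : "memory" );
     */
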
> > 4.6: I'm in two minds about this; the lack of WFE in the ticket
> > spinlocks is not a regression (the old locks lacked it as well,
> > oops!). On the other hand, spinning like this isn't good. I think
> > overall I'm inclined to say this should wait for 4.7 but be a
> > candidate for backport to 4.6.1.
> > ---
> >  xen/common/spinlock.c          |  5 +++--
> >  xen/include/asm-arm/spinlock.h |  9 ++++++++-
> >  xen/include/asm-arm/system.h   |  3 ---
> >  xen/include/asm-x86/spinlock.h | 14 ++++++++++++++
> >  xen/include/asm-x86/system.h   | 11 -----------
> >  5 files changed, 25 insertions(+), 17 deletions(-)
> > 
> > diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c
> > index 29149d1..7f89694 100644
> > --- a/xen/common/spinlock.c
> > +++ b/xen/common/spinlock.c
> > @@ -141,7 +141,7 @@ void _spin_lock(spinlock_t *lock)
> >      while ( tickets.tail != observe_head(&lock->tickets) )
> >      {
> >          LOCK_PROFILE_BLOCK;
> > -        cpu_relax();
> > +        arch_lock_relax();
> >      }
> >      LOCK_PROFILE_GOT;
> >      preempt_disable();
> > @@ -170,6 +170,7 @@ void _spin_unlock(spinlock_t *lock)
> >      preempt_enable();
> >      LOCK_PROFILE_REL;
> >      add_sized(&lock->tickets.head, 1);
> > +    arch_lock_signal();
> >  }
> >  
> >  void _spin_unlock_irq(spinlock_t *lock)
> > @@ -228,7 +229,7 @@ void _spin_barrier(spinlock_t *lock)
> >      if ( sample.head != sample.tail )
> >      {
> >          while ( observe_head(&lock->tickets) == sample.head )
> > -            cpu_relax();
> > +            arch_lock_relax();
> >  #ifdef LOCK_PROFILE
> >          if ( lock->profile )
> >          {
> > diff --git a/xen/include/asm-arm/spinlock.h b/xen/include/asm-arm/spinlock.h
> > index 81955d1..8cdf9e1 100644
> > --- a/xen/include/asm-arm/spinlock.h
> > +++ b/xen/include/asm-arm/spinlock.h
> > @@ -1,6 +1,13 @@
> >  #ifndef __ASM_SPINLOCK_H
> >  #define __ASM_SPINLOCK_H
> >  
> > -/* Nothing ARM specific. */
> > +#define arch_lock_acquire_barrier() smp_mb()
> > +#define arch_lock_release_barrier() smp_mb()
> > +
> > +#define arch_lock_relax() wfe()
> > +#define arch_lock_signal() do { \
> > +    dsb(ishst);                 \
> > +    sev();                      \
> > +} while(0)
> >  
> >  #endif /* __ASM_SPINLOCK_H */
> > diff --git a/xen/include/asm-arm/system.h b/xen/include/asm-arm/system.h
> > index f0e222f..2eb96e8 100644
> > --- a/xen/include/asm-arm/system.h
> > +++ b/xen/include/asm-arm/system.h
> > @@ -53,9 +53,6 @@
> >  
> >  #define arch_fetch_and_add(x, v) __sync_fetch_and_add(x, v)
> >  
> > -#define arch_lock_acquire_barrier() smp_mb()
> > -#define arch_lock_release_barrier() smp_mb()
> > -
> >  extern struct vcpu *__context_switch(struct vcpu *prev, struct vcpu *next);
> >  
> >  #endif
> > diff --git a/xen/include/asm-x86/spinlock.h b/xen/include/asm-x86/spinlock.h
> > index 7d69e75..70a85af 100644
> > --- a/xen/include/asm-x86/spinlock.h
> > +++ b/xen/include/asm-x86/spinlock.h
> > @@ -4,4 +4,18 @@
> >  #define _raw_read_unlock(l) \
> >      asm volatile ( "lock; dec%z0 %0" : "+m" ((l)->lock) :: "memory" )
> >  
> > +/*
> > + * On x86 the only reordering is of reads with older writes.  In the
> > + * lock case, the read in observe_head() can only be reordered with
> > + * writes that precede it, and moving a write _into_ a locked section
> > + * is OK.  In the release case, the write in add_sized() can only be
> > + * reordered with reads that follow it, and hoisting a read _into_ a
> > + * locked region is OK.
> > + */
> > +#define arch_lock_acquire_barrier() barrier()
> > +#define arch_lock_release_barrier() barrier()
> > +
> > +#define arch_lock_relax() cpu_relax()
> > +#define arch_lock_signal()
> > +
> >  #endif /* __ASM_SPINLOCK_H */
> > diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h
> > index 25a6a2a..9fb70f5 100644
> > --- a/xen/include/asm-x86/system.h
> > +++ b/xen/include/asm-x86/system.h
> > @@ -185,17 +185,6 @@ static always_inline unsigned long __xadd(
> >  #define set_mb(var, value) do { xchg(&var, value); } while (0)
> >  #define set_wmb(var, value) do { var = value; wmb(); } while (0)
> >  
> > -/*
> > - * On x86 the only reordering is of reads with older writes.  In the
> > - * lock case, the read in observe_head() can only be reordered with
> > - * writes that precede it, and moving a write _into_ a locked section
> > - * is OK.  In the release case, the write in add_sized() can only be
> > - * reordered with reads that follow it, and hoisting a read _into_ a
> > - * locked region is OK.
> > - */
> > -#define arch_lock_acquire_barrier() barrier()
> > -#define arch_lock_release_barrier() barrier()
> > -
> >  #define local_irq_disable()     asm volatile ( "cli" : : : "memory" )
> >  #define local_irq_enable()      asm volatile ( "sti" : : : "memory" )
> >  
> 
