
Re: [Xen-devel] [PATCH v2] arm: reduce power use by contended spin locks with WFE/SEV



On Tue, 15 Sep 2015, Ian Campbell wrote:
> On Mon, 2015-08-03 at 12:29 +0100, Ian Campbell wrote:
> > From: David Vrabel <david.vrabel@xxxxxxxxxx>
> > 
> > Instead of cpu_relax() while spinning and observing the ticket head,
> > introduce arch_lock_relax() which executes a WFE instruction.  After
> > the ticket head is changed call arch_lock_signal() to execute an SEV
> > instruction (with the required DSB first) to wake any spinners.
> > 
> > This should improve power consumption when locks are contended and
> > CPUs would otherwise spin.
> > 
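(For readers following along: a minimal standalone sketch of the
resulting pattern. Types are simplified, lock profiling is omitted,
and GCC builtins stand in for Xen's arch_fetch_and_add()/add_sized();
this is illustrative only, not the actual Xen code.)

    #include <stdint.h>

    typedef struct {
        volatile uint16_t head;   /* ticket currently being served */
        volatile uint16_t tail;   /* next ticket to be handed out */
    } sketch_lock_t;

    static inline void sketch_lock(sketch_lock_t *l)
    {
        /* Take a ticket. */
        uint16_t ticket = __atomic_fetch_add(&l->tail, 1, __ATOMIC_RELAXED);

        /* arch_lock_relax(): sleep in WFE until an event is signalled.
         * SEV sets the event register on every CPU, so a signal arriving
         * between the head check and the WFE is not lost. */
        while ( l->head != ticket )
            asm volatile ( "wfe" ::: "memory" );

        /* arch_lock_acquire_barrier() */
        __atomic_thread_fence(__ATOMIC_ACQUIRE);
    }

    static inline void sketch_unlock(sketch_lock_t *l)
    {
        /* arch_lock_release_barrier() */
        __atomic_thread_fence(__ATOMIC_RELEASE);

        l->head++;                                  /* add_sized() */

        /* arch_lock_signal(): make the head update visible before
         * waking the spinners, then send the event. */
        asm volatile ( "dsb ishst" ::: "memory" );
        asm volatile ( "sev" );
    }

(WFE may also return spuriously, e.g. on interrupts; that is harmless
because the head is re-checked on every iteration.)
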
> > For consistency also move arch_lock_(acquire|release)_barrier to
> > asm/spinlock.h.
> > 
> > Booted the result on arm32 (Midway) and arm64 (Mustang). Build-tested
> > only on amd64.
> > 
> > Signed-off-by: David Vrabel <david.vrabel@xxxxxxxxxx>
> > [ijc: add barrier, rename as arch_lock_*, move arch_lock_*_barrier, test]
> > Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
> > Cc: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
> > Cc: Jan Beulich <jbeulich@xxxxxxxx>
> > Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> 
> Andy has Reviewed-by but this change still lacks an Ack from the ARM side.
> Stefano, was your previous "The code should work though." intended as an
> Ack?

Yes

> > ---
> > v2 (ijc):
> >   Add dsb(ishst) to spin_relax.
> >   s/spin_(relax|signal)/arch_lock_\1/
> >   Move arch_lock_(acquire|release)_barrier to asm/spinlock.h
> >       (dropped Andy's Reviewed-by due to this change)
> > 
> > In principle the SEV could be made unnecessary on arm64, but this
> > requires a new hook before the wait loop as well as changing
> > observe_head and _spin_unlock to use the Acquire/Release instructions
> > instead of the non-atomic loads and stores used today, which is a lot
> > more refactoring of the generic code than I think we can be bothered
> > with at this stage.
> > 
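(A rough sketch of what that could look like, reusing the sketch_lock_t
from the sketch above; hypothetical arm64-only code, not part of this
patch. The load-acquire-exclusive arms the exclusive monitor, and the
unlocker's store to head clears it, which itself generates the wake-up
event for WFE, so no explicit SEV would be required:)

    /* Hypothetical arm64-only wait loop; sketch only. */
    static inline void sketch_wait_for_ticket(sketch_lock_t *l,
                                              uint16_t ticket)
    {
        uint16_t head;

        /* Load-acquire-exclusive halfword: observes head and arms the
         * exclusive monitor as a side effect. */
        asm volatile ( "ldaxrh %w0, %1"
                       : "=r" (head) : "Q" (l->head) : "memory" );
        while ( head != ticket )
        {
            asm volatile ( "wfe" ::: "memory" );
            asm volatile ( "ldaxrh %w0, %1"
                           : "=r" (head) : "Q" (l->head) : "memory" );
        }
    }

    /* ...with the unlock side using a store-release instead of the
     * plain store, e.g.:
     * asm volatile ( "stlrh %w1, %0"
     *                : "=Q" (l->head) : "r" (head + 1) : "memory" );
     */
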
> > 4.6: I'm in two minds about this; the lack of WFE in the ticket
> > spinlocks is not a regression (the old locks lacked it as well,
> > oops!). On the other hand, spinning like this isn't good. I think
> > overall I'm inclined to say this should wait for 4.7 but be a
> > candidate for backport to 4.6.1.
> > ---
> >  xen/common/spinlock.c          |  5 +++--
> >  xen/include/asm-arm/spinlock.h |  9 ++++++++-
> >  xen/include/asm-arm/system.h   |  3 ---
> >  xen/include/asm-x86/spinlock.h | 14 ++++++++++++++
> >  xen/include/asm-x86/system.h   | 11 -----------
> >  5 files changed, 25 insertions(+), 17 deletions(-)
> > 
> > diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c
> > index 29149d1..7f89694 100644
> > --- a/xen/common/spinlock.c
> > +++ b/xen/common/spinlock.c
> > @@ -141,7 +141,7 @@ void _spin_lock(spinlock_t *lock)
> >      while ( tickets.tail != observe_head(&lock->tickets) )
> >      {
> >          LOCK_PROFILE_BLOCK;
> > -        cpu_relax();
> > +        arch_lock_relax();
> >      }
> >      LOCK_PROFILE_GOT;
> >      preempt_disable();
> > @@ -170,6 +170,7 @@ void _spin_unlock(spinlock_t *lock)
> >      preempt_enable();
> >      LOCK_PROFILE_REL;
> >      add_sized(&lock->tickets.head, 1);
> > +    arch_lock_signal();
> >  }
> >  
> >  void _spin_unlock_irq(spinlock_t *lock)
> > @@ -228,7 +229,7 @@ void _spin_barrier(spinlock_t *lock)
> >      if ( sample.head != sample.tail )
> >      {
> >          while ( observe_head(&lock->tickets) == sample.head )
> > -            cpu_relax();
> > +            arch_lock_relax();
> >  #ifdef LOCK_PROFILE
> >          if ( lock->profile )
> >          {
> > diff --git a/xen/include/asm-arm/spinlock.h b/xen/include/asm-arm/spinlock.h
> > index 81955d1..8cdf9e1 100644
> > --- a/xen/include/asm-arm/spinlock.h
> > +++ b/xen/include/asm-arm/spinlock.h
> > @@ -1,6 +1,13 @@
> >  #ifndef __ASM_SPINLOCK_H
> >  #define __ASM_SPINLOCK_H
> >  
> > -/* Nothing ARM specific. */
> > +#define arch_lock_acquire_barrier() smp_mb()
> > +#define arch_lock_release_barrier() smp_mb()
> > +
> > +#define arch_lock_relax() wfe()
> > +#define arch_lock_signal() do { \
> > +    dsb(ishst);                 \
> > +    sev();                      \
> > +} while(0)
> >  
> >  #endif /* __ASM_SPINLOCK_H */
> > diff --git a/xen/include/asm-arm/system.h b/xen/include/asm-arm/system.h
> > index f0e222f..2eb96e8 100644
> > --- a/xen/include/asm-arm/system.h
> > +++ b/xen/include/asm-arm/system.h
> > @@ -53,9 +53,6 @@
> >  
> >  #define arch_fetch_and_add(x, v) __sync_fetch_and_add(x, v)
> >  
> > -#define arch_lock_acquire_barrier() smp_mb()
> > -#define arch_lock_release_barrier() smp_mb()
> > -
> >  extern struct vcpu *__context_switch(struct vcpu *prev, struct vcpu *next);
> >  
> >  #endif
> > diff --git a/xen/include/asm-x86/spinlock.h b/xen/include/asm-x86/spinlock.h
> > index 7d69e75..70a85af 100644
> > --- a/xen/include/asm-x86/spinlock.h
> > +++ b/xen/include/asm-x86/spinlock.h
> > @@ -4,4 +4,18 @@
> >  #define _raw_read_unlock(l) \
> >      asm volatile ( "lock; dec%z0 %0" : "+m" ((l)->lock) :: "memory" )
> >  
> > +/*
> > + * On x86 the only reordering is of reads with older writes.  In the
> > + * lock case, the read in observe_head() can only be reordered with
> > + * writes that precede it, and moving a write _into_ a locked section
> > + * is OK.  In the release case, the write in add_sized() can only be
> > + * reordered with reads that follow it, and hoisting a read _into_ a
> > + * locked region is OK.
> > + */
> > +#define arch_lock_acquire_barrier() barrier()
> > +#define arch_lock_release_barrier() barrier()
> > +
> > +#define arch_lock_relax() cpu_relax()
> > +#define arch_lock_signal()
> > +
> >  #endif /* __ASM_SPINLOCK_H */
> > diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h
> > index 25a6a2a..9fb70f5 100644
> > --- a/xen/include/asm-x86/system.h
> > +++ b/xen/include/asm-x86/system.h
> > @@ -185,17 +185,6 @@ static always_inline unsigned long __xadd(
> >  #define set_mb(var, value) do { xchg(&var, value); } while (0)
> >  #define set_wmb(var, value) do { var = value; wmb(); } while (0)
> >  
> > -/*
> > - * On x86 the only reordering is of reads with older writes.  In the
> > - * lock case, the read in observe_head() can only be reordered with
> > - * writes that precede it, and moving a write _into_ a locked section
> > - * is OK.  In the release case, the write in add_sized() can only be
> > - * reordered with reads that follow it, and hoisting a read _into_ a
> > - * locked region is OK.
> > - */
> > -#define arch_lock_acquire_barrier() barrier()
> > -#define arch_lock_release_barrier() barrier()
> > -
> >  #define local_irq_disable()     asm volatile ( "cli" : : : "memory" )
> >  #define local_irq_enable()      asm volatile ( "sti" : : : "memory" )
> >  
> 
