[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [Xen-devel] [PATCH v2] arm: reduce power use by contented spin locks with WFE/SEV
From: David Vrabel <david.vrabel@xxxxxxxxxx> Instead of cpu_relax() while spinning and observing the ticket head, introduce arch_lock_relax() which executes a WFE instruction. After the ticket head is changed call arch_lock_signal() to execute an SEV instruction (with the required DSB first) to wake any spinners. This should improve power consumption when locks are contented and spinning. For consistency also move arch_lock_(acquire|release)_barrier to asm/spinlock.h. Booted the result on arm32 (Midway) and arm64 (Mustang). Build test only on amd64. Signed-off-by: David Vrabel <david.vrabel@xxxxxxxxxx> [ijc: add barrier, rename as arch_lock_*, move arch_lock_*_barrier, test] Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx> Cc: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx> Cc: Jan Beulich <jbeulich@xxxxxxxx> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> --- v2 (ijc): Add dsb(ishst) to spin_relax. s/spin_(relax|signal)/arch_lock_\1/ Move arch_lock_(acquire|release)_barrier to asm/spinlock.h (dropped Andy's Reviewed-by due to this change) In principal the SEV could be made unnecessary on arm64, but this requires a new hook before the wait loop as well as changing observe_head and _spin_unlock to use the Acquire/Release instructions instead of the non-atomic loads and stores used today, which is a lot more refactoring of the generic code than I think we can be bothered with at this stage. 4.6: I'm in two minds about this, the lack of WFE in the ticket spinlocks is not a regression (the old locks lacked them as well, oops!). On the otherhand spinning like this isn't good. I think overall I'm inclined to say this should wait for 4.7 but be a candidate for backport to 4.6.1. --- xen/common/spinlock.c | 5 +++-- xen/include/asm-arm/spinlock.h | 9 ++++++++- xen/include/asm-arm/system.h | 3 --- xen/include/asm-x86/spinlock.h | 14 ++++++++++++++ xen/include/asm-x86/system.h | 11 ----------- 5 files changed, 25 insertions(+), 17 deletions(-) diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c index 29149d1..7f89694 100644 --- a/xen/common/spinlock.c +++ b/xen/common/spinlock.c @@ -141,7 +141,7 @@ void _spin_lock(spinlock_t *lock) while ( tickets.tail != observe_head(&lock->tickets) ) { LOCK_PROFILE_BLOCK; - cpu_relax(); + arch_lock_relax(); } LOCK_PROFILE_GOT; preempt_disable(); @@ -170,6 +170,7 @@ void _spin_unlock(spinlock_t *lock) preempt_enable(); LOCK_PROFILE_REL; add_sized(&lock->tickets.head, 1); + arch_lock_signal(); } void _spin_unlock_irq(spinlock_t *lock) @@ -228,7 +229,7 @@ void _spin_barrier(spinlock_t *lock) if ( sample.head != sample.tail ) { while ( observe_head(&lock->tickets) == sample.head ) - cpu_relax(); + arch_lock_relax(); #ifdef LOCK_PROFILE if ( lock->profile ) { diff --git a/xen/include/asm-arm/spinlock.h b/xen/include/asm-arm/spinlock.h index 81955d1..8cdf9e1 100644 --- a/xen/include/asm-arm/spinlock.h +++ b/xen/include/asm-arm/spinlock.h @@ -1,6 +1,13 @@ #ifndef __ASM_SPINLOCK_H #define __ASM_SPINLOCK_H -/* Nothing ARM specific. */ +#define arch_lock_acquire_barrier() smp_mb() +#define arch_lock_release_barrier() smp_mb() + +#define arch_lock_relax() wfe() +#define arch_lock_signal() do { \ + dsb(ishst); \ + sev(); \ +} while(0) #endif /* __ASM_SPINLOCK_H */ diff --git a/xen/include/asm-arm/system.h b/xen/include/asm-arm/system.h index f0e222f..2eb96e8 100644 --- a/xen/include/asm-arm/system.h +++ b/xen/include/asm-arm/system.h @@ -53,9 +53,6 @@ #define arch_fetch_and_add(x, v) __sync_fetch_and_add(x, v) -#define arch_lock_acquire_barrier() smp_mb() -#define arch_lock_release_barrier() smp_mb() - extern struct vcpu *__context_switch(struct vcpu *prev, struct vcpu *next); #endif diff --git a/xen/include/asm-x86/spinlock.h b/xen/include/asm-x86/spinlock.h index 7d69e75..70a85af 100644 --- a/xen/include/asm-x86/spinlock.h +++ b/xen/include/asm-x86/spinlock.h @@ -4,4 +4,18 @@ #define _raw_read_unlock(l) \ asm volatile ( "lock; dec%z0 %0" : "+m" ((l)->lock) :: "memory" ) +/* + * On x86 the only reordering is of reads with older writes. In the + * lock case, the read in observe_head() can only be reordered with + * writes that precede it, and moving a write _into_ a locked section + * is OK. In the release case, the write in add_sized() can only be + * reordered with reads that follow it, and hoisting a read _into_ a + * locked region is OK. + */ +#define arch_lock_acquire_barrier() barrier() +#define arch_lock_release_barrier() barrier() + +#define arch_lock_relax() cpu_relax() +#define arch_lock_signal() + #endif /* __ASM_SPINLOCK_H */ diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h index 25a6a2a..9fb70f5 100644 --- a/xen/include/asm-x86/system.h +++ b/xen/include/asm-x86/system.h @@ -185,17 +185,6 @@ static always_inline unsigned long __xadd( #define set_mb(var, value) do { xchg(&var, value); } while (0) #define set_wmb(var, value) do { var = value; wmb(); } while (0) -/* - * On x86 the only reordering is of reads with older writes. In the - * lock case, the read in observe_head() can only be reordered with - * writes that precede it, and moving a write _into_ a locked section - * is OK. In the release case, the write in add_sized() can only be - * reordered with reads that follow it, and hoisting a read _into_ a - * locked region is OK. - */ -#define arch_lock_acquire_barrier() barrier() -#define arch_lock_release_barrier() barrier() - #define local_irq_disable() asm volatile ( "cli" : : : "memory" ) #define local_irq_enable() asm volatile ( "sti" : : : "memory" ) -- 2.1.4 _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |