[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[PATCH] x86: Use LOCK ADD instead of MFENCE for smp_mb()


  • To: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Mon, 21 Sep 2020 14:04:23 +0100
  • Authentication-results: esa3.hc3370-68.iphmx.com; dkim=none (message not signed) header.i=none
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxx>
  • Delivery-date: Mon, 21 Sep 2020 13:04:58 +0000
  • Ironport-sdr: OCnyWsl6IA+ZBlRDJDGwkJYYVR7K0r9uZjeSbxLVuvpj6sMuJB5CZsSVGV+rdo6aH8aBrBwEy8 twtkLrFhjVEhtVBTh82Es4DRTTCzcAf6s5a2zCUL5dXsAXNBihtsV+1aL6be7b+DhoPleSVykY Y1KxwzuNAVM3usbdtYrjdpRZf54mT8EDrTMZjgHNcf7YQQB0gusQDK3B5uMVVDPj8fSowY6hBJ vPjQ/pkpfUKR3i9ZUXdENTC2lrXZCRqgXGFmuqzWAetHgoEoZ2SHaRtcTHJLzGkM6CgyCg8V0G jLs=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

MFENCE is overly heavyweight for SMP semantics on WB memory, because it also
orders weaker cached writes, and flushes the WC buffers.

This technique was used as an optimisation in Java[1], and later adopted by
Linux[2] where it was measured to have a 60% performance improvement in VirtIO
benchmarks.

The stack is used because it is hot in the L1 cache, and a -4 offset is used
to avoid creating a false data dependency on live data.  (For 64bit userspace,
the offset needs to be under the red zone to avoid false dependences).

Fix up the 32 bit definitions in HVMLoader and libxc to avoid a false data
dependency.

[1] https://shipilev.net/blog/2014/on-the-fence-with-dependencies/
[2] https://git.kernel.org/torvalds/c/450cbdd0125cfa5d7bbf9e2a6b6961cc48d29730

Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
---
CC: Jan Beulich <JBeulich@xxxxxxxx>
CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
CC: Wei Liu <wl@xxxxxxx>
CC: Ian Jackson <Ian.Jackson@xxxxxxxxxx>
---
 tools/firmware/hvmloader/util.h   | 2 +-
 tools/libs/ctrl/include/xenctrl.h | 4 ++--
 xen/include/asm-x86/system.h      | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index 31889de634..4f0baade0e 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -133,7 +133,7 @@ static inline void cpu_relax(void)
 #define barrier() asm volatile ( "" : : : "memory" )
 #define rmb()     barrier()
 #define wmb()     barrier()
-#define mb()      asm volatile ( "lock; addl $0,0(%%esp)" : : : "memory" )
+#define mb()      asm volatile ( "lock addl $0, -4(%%esp)" ::: "memory" )
 
 /*
  * Divide a 64-bit dividend by a 32-bit divisor.
diff --git a/tools/libs/ctrl/include/xenctrl.h 
b/tools/libs/ctrl/include/xenctrl.h
index 73e9535fc8..1d9f514302 100644
--- a/tools/libs/ctrl/include/xenctrl.h
+++ b/tools/libs/ctrl/include/xenctrl.h
@@ -68,11 +68,11 @@
 #define xen_barrier() asm volatile ( "" : : : "memory")
 
 #if defined(__i386__)
-#define xen_mb()  asm volatile ( "lock; addl $0,0(%%esp)" : : : "memory" )
+#define xen_mb()  asm volatile ( "lock addl $0, -4(%%esp)" ::: "memory" )
 #define xen_rmb() xen_barrier()
 #define xen_wmb() xen_barrier()
 #elif defined(__x86_64__)
-#define xen_mb()  asm volatile ( "mfence" : : : "memory")
+#define xen_mb()  asm volatile ( "lock addl $0, -128(%%rsp)" ::: "memory" )
 #define xen_rmb() xen_barrier()
 #define xen_wmb() xen_barrier()
 #elif defined(__arm__)
diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h
index 7e5891f3df..6474dd1243 100644
--- a/xen/include/asm-x86/system.h
+++ b/xen/include/asm-x86/system.h
@@ -226,7 +226,7 @@ static always_inline unsigned long __xadd(
  *
  * Refer to the vendor system programming manuals for further details.
  */
-#define smp_mb()        mb()
+#define smp_mb()        asm volatile ( "lock addl $0, -4(%%rsp)" ::: "memory" )
 #define smp_rmb()       barrier()
 #define smp_wmb()       barrier()
 
-- 
2.11.0




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.