[Xen-devel] [PATCH] x86/svm: Reduce vmentry latency

Writing to the stack pointer in the middle of a run of pop operations is
specifically recommended against by the optimisation guide, and is a technique
used by Speculative Load Hardening to combat SpectreRSB.

In practice, it causes all further stack-relative accesses to block until the
write to the stack pointer retires, so the stack engine can get back in sync.

Pop into any dead register to discard %rax's value without clobbering the
stack engine.  The result is smaller compiled code which runs faster.

Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
CC: Jan Beulich <JBeulich@xxxxxxxx>
CC: Wei Liu <wl@xxxxxxx>
CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>

In a small test where I wired ICEBP to tightly re-enter the guest, this dropped
the guest's perceived time for ICEBP (as close to one vmexit and entry as I
could realistically manage) by 20 ticks.  Sadly, that also seems to be the
granularity of measurement.  The modal measurement (accounting for 80% of
samples) was 1200 ticks, and reduced to 1180 with just this change in place.
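
For anyone wanting to poke at the two sequences outside of Xen, below is a
minimal user-space sketch of the old and new ways of discarding a stack slot.
It is not Xen code: the function names discard_with_add() and
discard_with_pop() are made up for illustration, and it assumes GCC or Clang
on x86-64 with the SysV ABI (hence stepping over the red zone first).  It only
shows that the two forms are functionally equivalent, not how they perform.

```c
#include <stdint.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))

/* Old form: skip the unwanted slot with an explicit write to %rsp. */
uint64_t discard_with_add(uint64_t keep, uint64_t skip)
{
    uint64_t out;

    __asm__ volatile (
        "lea  -128(%%rsp), %%rsp\n\t"  /* step over the SysV red zone */
        "push %[keep]\n\t"
        "push %[skip]\n\t"
        "add  $8, %%rsp\n\t"           /* discard the top slot */
        "pop  %[out]\n\t"              /* next pop yields 'keep' */
        "lea  128(%%rsp), %%rsp\n\t"   /* restore the original %rsp */
        : [out] "=r" (out)
        : [keep] "r" (keep), [skip] "r" (skip)
        : "cc", "memory");

    return out;
}

/* New form: skip the unwanted slot by popping it into a dead register. */
uint64_t discard_with_pop(uint64_t keep, uint64_t skip)
{
    uint64_t out;

    __asm__ volatile (
        "lea  -128(%%rsp), %%rsp\n\t"  /* step over the SysV red zone */
        "push %[keep]\n\t"
        "push %[skip]\n\t"
        "pop  %%rcx\n\t"               /* discard into %rcx, treated as dead */
        "pop  %[out]\n\t"              /* next pop yields 'keep' */
        "lea  128(%%rsp), %%rsp\n\t"   /* restore the original %rsp */
        : [out] "=r" (out)
        : [keep] "r" (keep), [skip] "r" (skip)
        : "rcx", "memory");

    return out;
}

#else

/* Fallback for other targets: no stack manipulation, just the same result. */
uint64_t discard_with_add(uint64_t keep, uint64_t skip) { (void)skip; return keep; }
uint64_t discard_with_pop(uint64_t keep, uint64_t skip) { (void)skip; return keep; }

#endif
```

Both helpers hand back 'keep' and drop 'skip'; the only difference is whether
the discard goes through an explicit %rsp write or an ordinary pop.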
---
 xen/arch/x86/hvm/svm/entry.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/svm/entry.S b/xen/arch/x86/hvm/svm/entry.S
index e954d8e021..1d2df08e89 100644
--- a/xen/arch/x86/hvm/svm/entry.S
+++ b/xen/arch/x86/hvm/svm/entry.S
@@ -76,7 +76,7 @@ __UNLIKELY_END(nsvm_hap)
         pop  %r10
         pop  %r9
         pop  %r8
-        add  $8,%rsp /* Skip %rax: restored by VMRUN. */
+        pop  %rcx /* Skip %rax: restored by VMRUN. */
         pop  %rcx
         pop  %rdx
         pop  %rsi

