
Re: [Xen-devel] [PATCH v13 09/11] pvqspinlock, x86: Add para-virtualization support

To: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
From: Waiman Long <waiman.long@xxxxxx>
Date: Mon, 03 Nov 2014 16:17:39 -0500
Cc: linux-arch@xxxxxxxxxxxxxxx, Raghavendra K T <raghavendra.kt@xxxxxxxxxxxxxxxxxx>, Oleg Nesterov <oleg@xxxxxxxxxx>, kvm@xxxxxxxxxxxxxxx, Scott J Norton <scott.norton@xxxxxx>, x86@xxxxxxxxxx, Paolo Bonzini <paolo.bonzini@xxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx, virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx, Ingo Molnar <mingo@xxxxxxxxxx>, David Vrabel <david.vrabel@xxxxxxxxxx>, "H. Peter Anvin" <hpa@xxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Thomas Gleixner <tglx@xxxxxxxxxxxxx>, "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx>, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, Douglas Hatch <doug.hatch@xxxxxx>
Delivery-date: Mon, 03 Nov 2014 21:18:21 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 11/03/2014 05:35 AM, Peter Zijlstra wrote:

On Wed, Oct 29, 2014 at 04:19:09PM -0400, Waiman Long wrote:

  arch/x86/include/asm/pvqspinlock.h    |  411 +++++++++++++++++++++++++++++++++

I do wonder why all this needs to live in x86..

I haven't looked into the para-virtualization code in other architectures to see if my PV code is equally applicable there. That is why I put it under the x86 directory. If other architectures decide to use qspinlock with paravirtualization, we may need to pull out some common code, if any, back to kernel/locking.


+#ifdef CONFIG_QUEUE_SPINLOCK
+
+static __always_inline void pv_kick_cpu(int cpu)
+{
+       PVOP_VCALLEE1(pv_lock_ops.kick_cpu, cpu);
+}
+
+static __always_inline void pv_lockwait(u8 *lockbyte)
+{
+       PVOP_VCALLEE1(pv_lock_ops.lockwait, lockbyte);
+}
+
+static __always_inline void pv_lockstat(enum pv_lock_stats type)
+{
+       PVOP_VCALLEE1(pv_lock_ops.lockstat, type);
+}

Why are any of these PV ops? they're only called from other pv_*()
functions. What's the point of pv ops you only call from pv code?

It is the same reason that you made them PV ops in your patch. Even when PV is on, the code won't need to call any of the PV ops most of the time. So it is just a matter of optimizing the most common case at the expense of performance in the rare case that the CPU needs to be halted and woken up, which will be bad performance-wise anyway. However, if you think they should be regular function pointers, I am fine with that too.

+/*
+ *     Queue Spinlock Para-Virtualization (PV) Support
+ *
+ * The PV support code for queue spinlock is roughly the same as that
+ * of the ticket spinlock.

Relative comments are bad, esp. since we'll make the ticket code go away
if this works, at which point this is a reference into a black hole.

Thanks for the suggestion. I will remove that when I revise the patch.

                             Each CPU waiting for the lock will spin until it
+ * reaches a threshold. When that happens, it will put itself to a halt state
+ * so that the hypervisor can reuse the CPU cycles in some other guests as
+ * well as returning other hold-up CPUs faster.
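The spin-then-halt behaviour described in the comment above can be illustrated with a small user-space sketch. This is not the patch's code: SPIN_THRESHOLD, fake_halt() and spin_then_halt_lock() are hypothetical names, the threshold value is arbitrary, and the hypervisor halt is replaced by a stub that simply releases the lock so the example terminates.

```c
#include <stdatomic.h>

#define SPIN_THRESHOLD 1024	/* hypothetical value, not taken from the patch */

static int halt_calls;		/* counts how often the waiter gave up spinning */

/*
 * Hypothetical stand-in for the hypervisor halt call. A real guest would
 * block here until another CPU kicks it; this stub only records the call
 * and frees the lock byte so that the retry below succeeds.
 */
static void fake_halt(atomic_uchar *lockbyte)
{
	halt_calls++;
	atomic_store_explicit(lockbyte, 0, memory_order_release);
}

/*
 * Try to take the lock byte (0 -> 1). Spin for at most SPIN_THRESHOLD
 * failed attempts, then halt, then start spinning again.
 * Returns the total number of failed attempts.
 */
static unsigned long spin_then_halt_lock(atomic_uchar *lockbyte)
{
	unsigned long spins = 0;

	for (;;) {
		for (int i = 0; i < SPIN_THRESHOLD; i++) {
			unsigned char expected = 0;

			if (atomic_compare_exchange_strong_explicit(
					lockbyte, &expected, 1,
					memory_order_acquire,
					memory_order_relaxed))
				return spins;
			spins++;
		}
		/* Threshold reached: give the CPU back instead of spinning on. */
		fake_halt(lockbyte);
	}
}
```

On an uncontended lock the function returns immediately without ever reaching the halt path, which is the common case the design optimizes for.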

+/**
+ * queue_spin_lock - acquire a queue spinlock
+ * @lock: Pointer to queue spinlock structure
+ *
+ * N.B. INLINE_SPIN_LOCK should not be enabled when PARAVIRT_SPINLOCK is on.

One should write a compile time fail for that, not a comment.


Will do that.
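A minimal sketch of what such a compile-time failure could look like in plain C11 follows. It assumes the Kconfig symbols are visible as CONFIG_PARAVIRT_SPINLOCKS and CONFIG_INLINE_SPIN_LOCK; the PV_SPINLOCKS_ON and INLINE_SPIN_LOCK_ON macros and the lock_config_is_valid() helper are hypothetical and exist only to make the example self-contained. The kernel itself might prefer #error or BUILD_BUG_ON() for this.

```c
/* Map the (assumed) Kconfig symbols to 0/1 so they can be combined. */
#ifdef CONFIG_PARAVIRT_SPINLOCKS
#define PV_SPINLOCKS_ON 1
#else
#define PV_SPINLOCKS_ON 0
#endif

#ifdef CONFIG_INLINE_SPIN_LOCK
#define INLINE_SPIN_LOCK_ON 1
#else
#define INLINE_SPIN_LOCK_ON 0
#endif

/* The build stops here if both options are enabled at the same time. */
_Static_assert(!(PV_SPINLOCKS_ON && INLINE_SPIN_LOCK_ON),
	       "INLINE_SPIN_LOCK must not be enabled with PARAVIRT_SPINLOCKS");

/* Hypothetical helper: the same condition, evaluated as an expression. */
static int lock_config_is_valid(void)
{
	return !(PV_SPINLOCKS_ON && INLINE_SPIN_LOCK_ON);
}
```

Compiling this with both -DCONFIG_PARAVIRT_SPINLOCKS and -DCONFIG_INLINE_SPIN_LOCK fails the build, so the constraint no longer depends on anyone reading a comment.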

+ */
+static __always_inline void queue_spin_lock(struct qspinlock *lock)
+{
+       u32 val;
+
+       val = atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL);
+       if (likely(val == 0))
+               return;
+       if (static_key_false(&paravirt_spinlocks_enabled))
+               pv_queue_spin_lock_slowpath(lock, val);
+       else
+               queue_spin_lock_slowpath(lock, val);
+}

No, this is just vile.. _that_ is what we have PV ops for. And at that
point its the same function it was before the PV stuff, so that whole
inline thing is then gone.

I did that because in all the architectures except s390, the lock functions are not inlined. They live in the _raw_spin_lock* functions defined in kernel/locking/spinlock.c. The unlock functions, on the other hand, are all inlined except when PV spinlock is enabled. So adding a check for the jump label won't change any of the status quo.

+extern void queue_spin_unlock_slowpath(struct qspinlock *lock);
+
  /**
   * queue_spin_unlock - release a queue spinlock
   * @lock : Pointer to queue spinlock structure
   *
   * An effective smp_store_release() on the least-significant byte.
+ *
+ * Inlining of the unlock function is disabled when CONFIG_PARAVIRT_SPINLOCKS
+ * is defined. So _raw_spin_unlock() will be the only call site that will
+ * have to be patched.

again if you hard rely on the not inlining make a build fail not a
comment.


Will do that.

   */
  static inline void queue_spin_unlock(struct qspinlock *lock)
  {
        barrier();
+       if (!static_key_false(&paravirt_spinlocks_enabled)) {
+               native_spin_unlock(lock);
+               return;
+       }

+       /*
+        * Need to atomically clear the lock byte to avoid racing with
+        * queue head waiter trying to set _QLOCK_LOCKED_SLOWPATH.
+        */
+       if (unlikely(cmpxchg((u8 *)lock, _Q_LOCKED_VAL, 0) != _Q_LOCKED_VAL))
+               queue_spin_unlock_slowpath(lock);
+}

Idem, that static key stuff is wrong, use PV ops to switch between
unlock paths.

As said in my previous emails, the PV ops call site patching code doesn't work well with locking. First of all, the native_patch() function was called even in a KVM PV guest. Some unlock calls happened before the paravirt_spinlocks_enabled jump label was set up. It occurs to me that call site patching is done the first time the call site is used. At least for those early unlock calls, there is no way to figure out if it should use the native fast path or the PV slow path. The only possible workaround that I can think of is to use a variable (if available) that signals the end of the bootup init phase; we can then defer the call site patching until the init phase has passed.

This is a rather complicated solution which may not be worth the slight benefit of a faster unlock call in the native case.
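For reference, the PV unlock path under discussion clears the lock byte with a compare-and-exchange and enters the slow path only when the byte no longer holds the plain locked value. A user-space sketch with C11 atomics is shown below. LOCKED_VAL mirrors _Q_LOCKED_VAL; the value used for LOCKED_SLOWPATH is a made-up encoding for illustration, and unlock_slowpath() is a stub that does not kick any CPU.

```c
#include <stdatomic.h>

#define LOCKED_VAL	1	/* mirrors _Q_LOCKED_VAL */
#define LOCKED_SLOWPATH	3	/* hypothetical encoding, for illustration only */

static int slowpath_calls;	/* counts slow path entries */

/*
 * Stub slow path. The real one would also wake up the halted queue head;
 * here it only clears the lock byte.
 */
static void unlock_slowpath(atomic_uchar *lockbyte)
{
	slowpath_calls++;
	atomic_store_explicit(lockbyte, 0, memory_order_release);
}

/*
 * Release the lock. The compare-and-exchange succeeds only if the byte is
 * still LOCKED_VAL, so a waiter that changed it to LOCKED_SLOWPATH in the
 * meantime is never missed.
 */
static void pv_style_unlock(atomic_uchar *lockbyte)
{
	unsigned char expected = LOCKED_VAL;

	if (!atomic_compare_exchange_strong_explicit(lockbyte, &expected, 0,
						     memory_order_release,
						     memory_order_relaxed))
		unlock_slowpath(lockbyte);
}
```

The native path can use a plain store instead of the atomic exchange, which is the "slight benefit" being weighed against the complexity of deferred call site patching.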

@@ -354,7 +394,7 @@ queue:
         * if there was a previous node; link it and wait until reaching the
         * head of the waitqueue.
         */
-       if (old & _Q_TAIL_MASK) {
+       if (!pv_link_and_wait_node(old, node) && (old & _Q_TAIL_MASK)) {
                prev = decode_tail(old);
                ACCESS_ONCE(prev->next) = node;
@@ -369,9 +409,11 @@ queue:
         *
         * *,x,y ->  *,0,0
         */
-       while ((val = smp_load_acquire(&lock->val.counter)) &
-                       _Q_LOCKED_PENDING_MASK)
+       val = pv_wait_head(lock, node);
+       while (val & _Q_LOCKED_PENDING_MASK) {
                cpu_relax();
+               val = smp_load_acquire(&lock->val.counter);
+       }

        /*
         * claim the lock:

Please make the pv_*() calls return void and reduce to NOPs. This keeps
the logic invariant of the pv stuff.

In my patch, the two pv_*() calls above serve as replacements for the waiting code. Making them return void and reduce to NOPs will cause what Konrad described: doing the same operation twice, which is not ideal from a performance point of view for the PV version. Would putting the pre-PV code in a comment help to clarify what the code should be before the PV stuff?


-Longman

