[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 1/2] sched/core: For a new metric, add vcpu->nonaffine_time


  • To: Bernhard Kaindl <bernhard.kaindl@xxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • From: Jürgen Groß <jgross@xxxxxxxx>
  • Date: Mon, 21 Jul 2025 13:22:02 +0200
  • Autocrypt: addr=jgross@xxxxxxxx; keydata= xsBNBFOMcBYBCACgGjqjoGvbEouQZw/ToiBg9W98AlM2QHV+iNHsEs7kxWhKMjrioyspZKOB ycWxw3ie3j9uvg9EOB3aN4xiTv4qbnGiTr3oJhkB1gsb6ToJQZ8uxGq2kaV2KL9650I1SJve dYm8Of8Zd621lSmoKOwlNClALZNew72NjJLEzTalU1OdT7/i1TXkH09XSSI8mEQ/ouNcMvIJ NwQpd369y9bfIhWUiVXEK7MlRgUG6MvIj6Y3Am/BBLUVbDa4+gmzDC9ezlZkTZG2t14zWPvx XP3FAp2pkW0xqG7/377qptDmrk42GlSKN4z76ELnLxussxc7I2hx18NUcbP8+uty4bMxABEB AAHNH0p1ZXJnZW4gR3Jvc3MgPGpncm9zc0BzdXNlLmNvbT7CwHkEEwECACMFAlOMcK8CGwMH CwkIBwMCAQYVCAIJCgsEFgIDAQIeAQIXgAAKCRCw3p3WKL8TL8eZB/9G0juS/kDY9LhEXseh mE9U+iA1VsLhgDqVbsOtZ/S14LRFHczNd/Lqkn7souCSoyWsBs3/wO+OjPvxf7m+Ef+sMtr0 G5lCWEWa9wa0IXx5HRPW/ScL+e4AVUbL7rurYMfwCzco+7TfjhMEOkC+va5gzi1KrErgNRHH kg3PhlnRY0Udyqx++UYkAsN4TQuEhNN32MvN0Np3WlBJOgKcuXpIElmMM5f1BBzJSKBkW0Jc Wy3h2Wy912vHKpPV/Xv7ZwVJ27v7KcuZcErtptDevAljxJtE7aJG6WiBzm+v9EswyWxwMCIO RoVBYuiocc51872tRGywc03xaQydB+9R7BHPzsBNBFOMcBYBCADLMfoA44MwGOB9YT1V4KCy vAfd7E0BTfaAurbG+Olacciz3yd09QOmejFZC6AnoykydyvTFLAWYcSCdISMr88COmmCbJzn sHAogjexXiif6ANUUlHpjxlHCCcELmZUzomNDnEOTxZFeWMTFF9Rf2k2F0Tl4E5kmsNGgtSa aMO0rNZoOEiD/7UfPP3dfh8JCQ1VtUUsQtT1sxos8Eb/HmriJhnaTZ7Hp3jtgTVkV0ybpgFg w6WMaRkrBh17mV0z2ajjmabB7SJxcouSkR0hcpNl4oM74d2/VqoW4BxxxOD1FcNCObCELfIS auZx+XT6s+CE7Qi/c44ibBMR7hyjdzWbABEBAAHCwF8EGAECAAkFAlOMcBYCGwwACgkQsN6d 1ii/Ey9D+Af/WFr3q+bg/8v5tCknCtn92d5lyYTBNt7xgWzDZX8G6/pngzKyWfedArllp0Pn fgIXtMNV+3t8Li1Tg843EXkP7+2+CQ98MB8XvvPLYAfW8nNDV85TyVgWlldNcgdv7nn1Sq8g HwB2BHdIAkYce3hEoDQXt/mKlgEGsLpzJcnLKimtPXQQy9TxUaLBe9PInPd+Ohix0XOlY+Uk QFEx50Ki3rSDl2Zt2tnkNYKUCvTJq7jvOlaPd6d/W0tZqpyy7KVay+K4aMobDsodB3dvEAs6 ScCnh03dDAFgIq5nsB11j3KPKdVoPlfucX2c7kGNH+LUMbzqV6beIENfNexkOfxHfw==
  • Cc: Dario Faggioli <dfaggioli@xxxxxxxx>, George Dunlap <gwd@xxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Julien Grall <julien@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>
  • Delivery-date: Mon, 21 Jul 2025 11:22:18 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 21.07.25 11:49, Bernhard Kaindl wrote:
To monitor the effectiveness of vCPU soft-affinity on NUMA hosts,
we'd like to create a vCPU metric that accumulates the amount of
vCPU time running outside of the soft affinity mask of the sched-unit:

- Add a new time counter, nonaffine_time to struct vcpu.

- Accumulate the nonaffine_time on vcpu_runstate_change():
   Account the time spent in the RUNSTATE_running state outside
   of unit->cpu_soft_affinity: It is always initialized and defaults
   to cpumask_all (bits for all NR_CPUS set), so we only accumulate
   nonaffine time when the vCPU runs on an unset CPU (non-affine).

In the next patch, this field can be used to retrieve the accumulated
nonaffine running time e.g. using vcpu_runstate_get().

Please avoid phrases like "in the next patch" in commit messages.
There is no guarantee a series will be committed in one go.

I'd just drop this last sentence.


Signed-off-by: Bernhard Kaindl <bernhard.kaindl@xxxxxxxxx>
---
  xen/common/sched/core.c | 20 ++++++++++++++++++++
  xen/include/xen/sched.h | 11 +++++++++++
  2 files changed, 31 insertions(+)

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 13fdf57e57..489255b9c6 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -260,6 +260,23 @@ static inline void vcpu_urgent_count_update(struct vcpu *v)
      }
  }
+/*
+ * For accounting non-affine running time of a vCPU, return true if
+ * the vCPU is running in RUNSTATE_running state while not on a CPU
+ * in unit->cpu_soft_affinity.

"the vCPU is running in RUNSTATE_running state" is a weird statement.
When running it will always be in the RUNSTATE_running state. I'd write
"the vCPU is in RUNSTATE_running state".

+ */
+static inline bool nonaffine(const struct vcpu *v,
+                             const struct sched_unit *unit)
+{
+    /*
+     * unit->cpu_soft_affinity is always initialized and defaults to
+     * cpumask_all (bits for all NR_CPUS set), so we only accumulate
+     * nonaffine time when the vCPU runs on an unset CPU (non-affine).
+     */
+    return v->runstate.state == RUNSTATE_running &&
+           !cpumask_test_cpu(v->processor, unit->cpu_soft_affinity);
+}
+
  static inline void vcpu_runstate_change(
      struct vcpu *v, int new_state, s_time_t new_entry_time)
  {
@@ -285,6 +302,9 @@ static inline void vcpu_runstate_change(
      {
          v->runstate.time[v->runstate.state] += delta;
          v->runstate.state_entry_time = new_entry_time;
+
+        if ( nonaffine(v, unit) ) /* When running nonaffine, add delta */
+            v->nonaffine_time += delta;
      }

Is this really correct? Imagine a vcpu running for very long time on
a physical cpu without losing it (RUNSTATE_running for minutes, hours
or even days). Now someone is changing the soft-affinity of the vcpu
and as a result it will be moved to another physical cpu. You will add
all the long time the vcpu was running to v->nonaffine_time in spite of
the affinity change having happened only nanoseconds before.

      v->runstate.state = new_state;
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index fe53d4fab7..aba60afd4f 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -198,7 +198,18 @@ struct vcpu
struct sched_unit *sched_unit; + /*
+     * The struct vcpu_runstate_info contains the vCPU time spent
+     * in each runstate and the entry time of the current runstate.
+     *
+     * Note: This field is used for the guest runstate shared memory area.
+     * Therefore, it is part of the frozen guest API and cannot be changed.
+     */

s/frozen/public/

In the end I'm not really sure this comment is adding much value.
But maybe I'm biased as I've worked with this code a lot.

      struct vcpu_runstate_info runstate;
+
+    /* vCPU time running outside the scheduling unit's soft_affinity mask */
+    uint64_t         nonaffine_time;
+
  #ifndef CONFIG_COMPAT
  # define runstate_guest(v) ((v)->runstate_guest)
      XEN_GUEST_HANDLE(vcpu_runstate_info_t) runstate_guest; /* guest address */


Juergen

Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature.asc
Description: OpenPGP digital signature


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.