
Re: [Xen-devel] [PATCH v3 28/47] xen/sched: add code to sync scheduling of all vcpus of a sched unit



On 20.09.19 18:08, Jan Beulich wrote:
On 14.09.2019 10:52, Juergen Gross wrote:
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -55,6 +55,9 @@ boolean_param("sched_smt_power_savings", sched_smt_power_savings);
  int sched_ratelimit_us = SCHED_DEFAULT_RATELIMIT_US;
  integer_param("sched_ratelimit_us", sched_ratelimit_us);
+/* Number of vcpus per struct sched_unit. */
+static unsigned int __read_mostly sched_granularity = 1;

Didn't you indicate earlier that this would be a per-pool property?
Or was that just a longer term plan?

That was planned for later.


+/*
+ * Rendezvous before taking a scheduling decision.
+ * Called with schedule lock held, so all accesses to the rendezvous counter
+ * can be normal ones (no atomic accesses needed).
+ * The counter is initialized to the number of cpus to rendezvous initially.
+ * Each cpu entering will decrement the counter. In case the counter reaches
+ * zero, do_schedule() is called and the rendezvous counter for leaving
+ * context_switch() is set. All other members will wait until the counter
+ * becomes zero, dropping the schedule lock in between.
+ */

This recurring lock/unlock is liable to cause a massive cache line
ping-pong, especially for socket or node scheduling. Instead of
just a cpu_relax() between the main unlock and re-lock, could there
perhaps be lock-less checks to determine whether there's any point
at all in re-acquiring the lock?

Hmm, this is certainly an idea for improvement.

I will think about that, and if I can come up with something I'll
either send a follow-up patch or include it in the series, depending
on the complexity of the solution.
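
Just to sketch the direction I'd explore (simplified, with a made-up
function name, so explicitly not the actual patch code): the waiters
could spin on the counter without holding the lock and only re-acquire
it once the counter has (most likely) dropped to zero:

static void sched_wait_rendezvous_sketch(struct sched_unit *prev,
                                         spinlock_t **lock, int cpu)
{
    /*
     * Entered with *lock held; prev->rendezvous_in_cnt was initialized
     * to the number of cpus taking part in the rendezvous.
     */
    if ( --prev->rendezvous_in_cnt == 0 )
    {
        /* Last cpu to arrive: take the real scheduling decision here. */
        return;
    }

    while ( prev->rendezvous_in_cnt != 0 )
    {
        pcpu_schedule_unlock_irq(*lock, cpu);

        /*
         * Lock-less wait: don't touch the lock's cache line again until
         * the counter has (probably) reached zero.
         */
        while ( ACCESS_ONCE(prev->rendezvous_in_cnt) != 0 )
            cpu_relax();

        *lock = pcpu_schedule_lock_irq(cpu);
    }
}

The re-check under the lock would of course still be needed (as above);
the lock-less inner loop merely avoids bouncing the lock's cache line
between all waiting cpus.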


+static void schedule(void)
+{
+    struct vcpu          *vnext, *vprev = current;
+    struct sched_unit    *prev = vprev->sched_unit, *next = NULL;
+    s_time_t              now;
+    struct sched_resource *sd;
+    spinlock_t           *lock;
+    int cpu = smp_processor_id();
+
+    ASSERT_NOT_IN_ATOMIC();
+
+    SCHED_STAT_CRANK(sched_run);
+
+    sd = get_sched_res(cpu);
+
+    lock = pcpu_schedule_lock_irq(cpu);
+
+    if ( prev->rendezvous_in_cnt )
+    {
+        /*
+         * We have a race: sched_slave() should be called, so raise a softirq
+         * in order to re-enter schedule() later and call sched_slave() now.
+         */
+        pcpu_schedule_unlock_irq(lock, cpu);
+
+        raise_softirq(SCHEDULE_SOFTIRQ);
+        return sched_slave();
+    }
+
+    now = NOW();
+
+    stop_timer(&sd->s_timer);

Is the order of these two relevant? A while ago there were a couple
of changes moving such NOW() invocations past anything that may take
non-negligible time, to make accounting as accurate as possible.

No, I don't think the order is relevant. I can swap them.
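
I.e. the swapped order would just be:

    stop_timer(&sd->s_timer);

    now = NOW();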


--- a/xen/include/xen/softirq.h
+++ b/xen/include/xen/softirq.h
@@ -4,6 +4,7 @@
  /* Low-latency softirqs come first in the following list. */
  enum {
      TIMER_SOFTIRQ = 0,
+    SCHED_SLAVE_SOFTIRQ,
      SCHEDULE_SOFTIRQ,
      NEW_TLBFLUSH_CLOCK_PERIOD_SOFTIRQ,
      RCU_SOFTIRQ,

Seeing the comment, is the insertion you do as well as the pre-
existing placement of SCHEDULE_SOFTIRQ still appropriate with
the rendezvous-ing you introduce?

Putting SCHED_SLAVE_SOFTIRQ before SCHEDULE_SOFTIRQ is done on purpose,
as I want slave events to have higher priority than normal schedule
events.

Whether both want to be at that place or should be moved is something
which should be considered carefully. Is it okay to postpone that
question?
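
(For context on why the position implies the priority: the softirq
dispatch loop services the lowest-numbered pending softirq first, so an
earlier slot in the enum means higher priority. Roughly, and much
simplified - this is not the actual common/softirq.c code:)

    unsigned long pending;

    while ( (pending = softirq_pending(cpu)) != 0 )
    {
        unsigned int i = find_first_set_bit(pending); /* lowest enum value */

        clear_bit(i, &softirq_pending(cpu));
        (*softirq_handlers[i])();
    }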


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

