[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [RFC 1/9] schedule: Introduce per-pcpu time accounting



Hello Volodymyr,

On 11.09.19 21:01, Volodymyr Babchuk wrote:
Introduce per-pcpu time accounting what includes the following states:

TACC_HYP - the pcpu executes hypervisor code like softirq processing
            (including scheduling), tasklets and context switches
TACC_GUEST - the pcpu executes guests code
TACC_IDLE - the low-power state of the pcpu
Is it really low-power?

It is rather matter of scheduling design. It differs from OS to OS, even from 
arch to arch. See [1].
Me personally tend to treat only low-power state as a true idle.
As a bad (IMO) example I can give the current XEN mainline. Pretty heavy tasks 
could be performed by the idle vcpu, and they are accounted as idle. This may 
mislead, for example, cpufreq governor.

TACC_IRQ - the pcpu performs interrupts processing, without separation to
            guest or hypervisor interrupts
I think, word "distinguishing" would be better than "separation"

Why so?


TACC_GSYNC - the pcpu executes hypervisor code to process synchronous trap
              from the guest. E.g. hypercall processing or io emulation.

Currently, the only reenterant state is TACC_IRQ. It is assumed, no changes
to state other than TACC_IRQ could happen until we return from nested
interrupts. IRQ time is accounted in a distinct way comparing to other states.
It is acumulated between other states transition moments, and is substracted
from the old state on states transion calculation.

Signed-off-by: Andrii Anisov <andrii_anisov@xxxxxxxx>
---
  xen/common/schedule.c   | 81 +++++++++++++++++++++++++++++++++++++++++++++++++
  xen/include/xen/sched.h | 27 +++++++++++++++++
  2 files changed, 108 insertions(+)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 7b71581..6dd6603 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1539,6 +1539,87 @@ static void schedule(void)
      context_switch(prev, next);
  }
+DEFINE_PER_CPU(struct tacc, tacc);
+
+static void tacc_state_change(enum TACC_STATES new_state)
+{
+    s_time_t now, delta;
+    struct tacc* tacc = &this_cpu(tacc);
+    unsigned long flags;
+
+    local_irq_save(flags);
+
+    now = NOW();
+    delta = now - tacc->state_entry_time;
+
+    /* We do not expect states reenterability (at least through this 
function)*/
+    ASSERT(new_state != tacc->state);
+
+    tacc->state_time[tacc->state] += delta - tacc->irq_time;
+    tacc->state_time[TACC_IRQ] += tacc->irq_time;
+    tacc->irq_time = 0;
+    tacc->state = new_state;
+    tacc->state_entry_time = now;
+
+    local_irq_restore(flags);
+}
+
+void tacc_hyp(int place)
I believe, you want some enum for this "place" parameter type
+{
+//    printk("\ttacc_hyp %u, place %d\n", smp_processor_id(), place);
Please, don't push commented-out code. BTW, I think, it is possible to
add some TACC_DEBUG facilities to enable/disable this traces during
compile-time.

Also, looks like you don't use "place" parameter at all.

Since that is the RFC, I've comforted myself with leaving my debug code in 
place. I hope it should not be confusing.


Lastly, I believe that this function (and other similar functions below)
can be defined as "static inline" in a header file

Not this time. They are mostly called from asm (at least now).


+    tacc_state_change(TACC_HYP);
+}
+
+void tacc_guest(int place)
+{
+//    printk("\ttacc_guest %u, place %d\n", smp_processor_id(), place);
+    tacc_state_change(TACC_GUEST);
+}
+
+void tacc_idle(int place)
+{
+//    printk("\tidle cpu %u, place %d\n", smp_processor_id(), place);
+    tacc_state_change(TACC_IDLE);
+}
+
+void tacc_gsync(int place)
+{
+//    printk("\ttacc_gsync %u, place %d\n", smp_processor_id(), place);
+    tacc_state_change(TACC_GSYNC);
+}
+
+void tacc_irq_enter(int place)
+{
+    struct tacc* tacc = &this_cpu(tacc);
+
+//    printk("\ttacc_irq_enter %u, place %d, cnt %d\n", smp_processor_id(), 
place, this_cpu(tacc).irq_cnt);
+    ASSERT(!local_irq_is_enabled());
+    ASSERT(tacc->irq_cnt >= 0);
You can make irq_cnt unsigned and drop this assert.

No. Otherwise one might miss proper call sequence when utilize this for the 
different arch, and have no notice from the debug assertion.


+
+    if ( tacc->irq_cnt == 0 )
+    {
+        tacc->irq_enter_time = NOW();
+    }
Coding style:

---
Braces should be omitted for blocks with a single statement. e.g.,

if ( condition )
     single_statement();
---


OK.

+
+    tacc->irq_cnt++;
+}
+
+void tacc_irq_exit(int place)
+{
+    struct tacc* tacc = &this_cpu(tacc);
+
+//    printk("\ttacc_irq_exit %u, place %d, cnt %d\n", smp_processor_id(), place, 
tacc->irq_cnt);
+    ASSERT(!local_irq_is_enabled());
+    ASSERT(tacc->irq_cnt > 0);
+    if ( tacc->irq_cnt == 1 )
+    {
+        tacc->irq_time = NOW() - tacc->irq_enter_time;
+        tacc->irq_enter_time = 0;
+    }
+
+    tacc->irq_cnt--;
What if, you IRQ will arrive right after this? I believe, you will lose
some of the accumulated time.

See ASSERT(!local_irq_is_enabled()) above.


+}
+
  void context_saved(struct vcpu *prev)
  {
      /* Clear running flag /after/ writing context to memory. */
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index e3601c1..04a8724 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -1002,6 +1002,33 @@ extern void dump_runq(unsigned char key);
void arch_do_physinfo(struct xen_sysctl_physinfo *pi); +enum TACC_STATES {
If I remember correct, enum names should in lower case

Ugh...


+    TACC_HYP = 0,
+    TACC_GUEST = 1,
+    TACC_IDLE = 2,
+    TACC_IRQ = 3,
+    TACC_GSYNC = 4,
+    TACC_STATES_MAX
+};
+
+struct tacc
+{
+    s_time_t state_time[TACC_STATES_MAX];
+    s_time_t state_entry_time;
+    int state;
enum, maybe?

Maybe.


+
+    s_time_t guest_time;
+
+    s_time_t irq_enter_time;
+    s_time_t irq_time;
+    int irq_cnt;
+};
+
+DECLARE_PER_CPU(struct tacc, tacc);
+
+void tacc_hyp(int place);
+void tacc_idle(int place);
What about functions from sched.c? Should they be declared there?

Maybe.


+
  #endif /* __SCHED_H__ */
/*



[1] https://elixir.bootlin.com/linux/latest/source/kernel/sched/cputime.c#L429

--
Sincerely,
Andrii Anisov.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.