[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Xen cpufreq support status: how to notify hypervisor of frequency change?



On Monday 10 April 2006 10:01 am, Keir Fraser wrote:
>
> The TSC needs recalibrating on the affected CPU. This will require
> adapting local_time_calibration() in arch/x86/time.c.
>

Thanks a lot - I followed your suggestion and added this functionality in the 
two patches below (and attached).

The first patch (cpufreq-xen-dom0-func.diff) adds a dom0 "setcpufreq" 
hypercall that sets cpu_khz to the specified value and re-runs 
local_time_calibration() on any CPU(s) the setcpufreq call specifies.

In the second patch (cpufreq-xen-linux-2.6.16, to be applied to the 
linux-2.6.hg tree), the Linux cpufreq drivers running in dom0 use this call 
immediately after writing whichever MSR triggers the frequency/voltage shift, 
but before triggering a cpufreq callback to adjust the dom0 kernel's own 
timer and TSC parameters.

Note that I only implemented this for powernow-k8 right now, since that's the 
only hardware I could test it on. It's very obvious how to adapt it to the 
other cpufreq drivers, by just replacing rdmsr and wrmsr with the Xen wrapper 
versions I provided, and adding in the setcpufreq hypercall at the end.

The patches seem to work correctly - I've had virtually no latency issues or 
temporary freezes after frequency transitions now, although I occasionally 
still get a mouse freeze and "psmouse: lost synchronization, throwing X bytes 
away" messages in the syslog. This is on an Athlon 64 laptop stepping between 
800/1600/2000 MHz.

I didn't test this on an SMP system yet, but I think it's SMP-safe, but only 
if local_time_calibration() is reentrant (since it's scheduled once a second 
and there may be a collision in rare cases). Is this an issue?

I can test it on a dual-core Athlon 64 next week if needed (my SMP Xen test 
machine is unavailable until then). Someone with a Core Duo should patch 
speedstep-centrino.c and test it there too.

Try this out and let me know if there are problems. It doesn't seem to have 
the issues I was having before, but that's a matter of user experience, so I 
might be mistaken or just imagining it.

- Matt Yourst

diff -r 886594fa3aef xen/arch/x86/dom0_ops.c
--- a/xen/arch/x86/dom0_ops.c   Sat Apr  8 12:10:04 2006 +0100
+++ b/xen/arch/x86/dom0_ops.c   Mon Apr 10 19:44:19 2006 -0400
@@ -339,6 +339,13 @@ long arch_do_dom0_op(struct dom0_op *op,
     }
     break;
 
+    case DOM0_SETCPUFREQ:
+    {
+        extern int set_cpu_freq(cpumap_t cpumap, unsigned long khz);
+        ret = set_cpu_freq(op->u.setcpufreq.cpumap, op->u.setcpufreq.khz);
+        break;
+    }
+
     case DOM0_GETMEMLIST:
     {
         int i;
diff -r 886594fa3aef xen/arch/x86/time.c
--- a/xen/arch/x86/time.c       Sat Apr  8 12:10:04 2006 +0100
+++ b/xen/arch/x86/time.c       Mon Apr 10 19:44:19 2006 -0400
@@ -914,6 +914,60 @@ void __init early_time_init(void)
     setup_irq(0, &irq0);
 }
 
+/*
+ * This is called when the hypervisor is notified of a CPU core
+ * frequency change initiated by a driver in dom0.
+ *
+ * This should be called after the new frequency has stabilized.
+ *
+ * The CPUs specified may include any CPU or core (if cores have
+ * independent PLLs). In an SMP or multi-core system, it may
+ * take a while for the recalibration function to be scheduled
+ * on the intended target CPUs; there is no guarantee this will
+ * happen by the time this call returns.
+ *
+ */
+typedef struct percpu_freq_update {
+    cpumap_t cpumap;
+    unsigned long khz;
+} percpu_freq_update_t;
+
+void set_cpu_freq_percpu(void *p) {
+    percpu_freq_update_t *data = (percpu_freq_update_t*)p;
+    int affected;
+
+    if (!data) {
+        printk("  Adjust freq on cpu %d: no data!\n", smp_processor_id());
+        return;
+    }
+
+    affected = ((data->cpumap & (1 << smp_processor_id())) != 0);
+
+    printk("  Frequency change request on cpu %d: cpumap %08llx, khz %ld 
(%s)\n",
+           smp_processor_id(), (unsigned long long)data->cpumap, data->khz,
+           (affected ? "adjusting" : "skipping"));
+
+    if (affected) {
+        cpu_khz = data->khz;
+        local_time_calibration(NULL);
+
+        printk("    Recalibrated timers on cpu %d to %ld khz\n",
+               smp_processor_id(), data->khz);
+    }
+}
+
+int set_cpu_freq(cpumap_t cpumap, unsigned long khz) {
+    percpu_freq_update_t freq_update;
+
+    printk("CPU frequency change request: cpumap %08llx, khz %ld",
+           (unsigned long long)cpumap, khz);
+
+    freq_update.cpumap = cpumap;
+    freq_update.khz = khz;
+
+    return on_each_cpu(set_cpu_freq_percpu, &freq_update, 1, 1);
+}
+
 void send_timer_event(struct vcpu *v)
 {
     send_guest_vcpu_virq(v, VIRQ_TIMER);
diff -r 886594fa3aef xen/include/public/dom0_ops.h
--- a/xen/include/public/dom0_ops.h     Sat Apr  8 12:10:04 2006 +0100
+++ b/xen/include/public/dom0_ops.h     Mon Apr 10 19:44:19 2006 -0400
@@ -283,6 +283,28 @@ typedef struct dom0_getpageframeinfo2 {
     GUEST_HANDLE(ulong) array;
 } dom0_getpageframeinfo2_t;
 DEFINE_GUEST_HANDLE(dom0_getpageframeinfo2_t);
+
+/*
+ * Notify hypervisor of a CPU core frequency change completed
+ * by cpufreq driver in dom0, triggering an internal timer
+ * recalibration.
+ *
+ * This should be called after the new frequency has stabilized.
+ *
+ * The CPUs specified may include any CPU or core (if cores have
+ * independent PLLs). In an SMP or multi-core system, it may
+ * take a while for the recalibration function to be scheduled
+ * on the intended target CPUs; there is no guarantee this will
+ * happen by the time this call returns.
+ *
+ */
+#define DOM0_SETCPUFREQ       30
+typedef struct dom0_setcpufreq {
+    /* IN variables */
+    cpumap_t cpumap;
+    unsigned long khz;
+} dom0_setcpufreq_t;
+DEFINE_GUEST_HANDLE(dom0_setcpufreq_t);
 
 /*
  * Request memory range (@mfn, @mfn+@nr_mfns-1) to have type @type.
@@ -496,6 +518,7 @@ typedef struct dom0_op {
         struct dom0_shadow_control    shadow_control;
         struct dom0_setdomainmaxmem   setdomainmaxmem;
         struct dom0_getpageframeinfo2 getpageframeinfo2;
+        struct dom0_setcpufreq        setcpufreq;
         struct dom0_add_memtype       add_memtype;
         struct dom0_del_memtype       del_memtype;
         struct dom0_read_memtype      read_memtype;

---------------------------------------------------------------

diff -r 640f8b15b9dd arch/i386/kernel/cpu/cpufreq/powernow-k8.c
--- a/arch/i386/kernel/cpu/cpufreq/powernow-k8.c        Fri Apr  7 01:32:54 
2006 
+0100
+++ b/arch/i386/kernel/cpu/cpufreq/powernow-k8.c        Mon Apr 10 19:33:49 
2006 
-0400
@@ -48,6 +48,37 @@
 #define VERSION "version 1.60.0"
 #include "powernow-k8.h"
 
+/* Xen support */
+
+#ifdef CONFIG_XEN_PRIVILEGED_GUEST
+int xen_access_msr(u32 msr, int write, u32* out1, u32* out2, u32 in1, u32 
in2) {
+       dom0_op_t op;
+       op.cmd = DOM0_MSR;
+  op.u.msr.write = write;
+  op.u.msr.cpu_mask = 1; /* only first CPU: not clear how to read multiple 
CPUs */
+  op.u.msr.msr = msr;
+  op.u.msr.in1 = in1;
+  op.u.msr.in2 = in2;
+       BUG_ON(HYPERVISOR_dom0_op(&op));
+
+  if (!write) {
+    *out1 = op.u.msr.out1; /* low 32 bits */
+    *out2 = op.u.msr.out2; /* high 32 bits */
+  }
+
+  return 0;
+}
+
+#define cpu_rdmsr(msr, val1, val2) xen_access_msr((msr), 0, &(val1), &(val2), 
0, 0)
+#define cpu_wrmsr(msr, val1, val2) xen_access_msr((msr), 1, NULL, NULL, 
(val1), (val2))
+
+#else
+
+#define cpu_rdmsr(msr, val1, val2) rdmsr(msr, val1, val2)
+#define cpu_wrmsr(msr, val1, val2) wrmsr(msr, val1, val2)
+
+#endif
+
 /* serialize freq changes  */
 static DECLARE_MUTEX(fidvid_sem);
 
@@ -98,7 +129,7 @@ static int pending_bit_stuck(void)
 {
        u32 lo, hi;
 
-       rdmsr(MSR_FIDVID_STATUS, lo, hi);
+       cpu_rdmsr(MSR_FIDVID_STATUS, lo, hi);
        return lo & MSR_S_LO_CHANGE_PENDING ? 1 : 0;
 }
 
@@ -116,7 +147,7 @@ static int query_current_values_with_pen
                        dprintk("detected change pending stuck\n");
                        return 1;
                }
-               rdmsr(MSR_FIDVID_STATUS, lo, hi);
+               cpu_rdmsr(MSR_FIDVID_STATUS, lo, hi);
        } while (lo & MSR_S_LO_CHANGE_PENDING);
 
        data->currvid = hi & MSR_S_HI_CURRENT_VID;
@@ -145,13 +176,13 @@ static void fidvid_msr_init(void)
        u32 lo, hi;
        u8 fid, vid;
 
-       rdmsr(MSR_FIDVID_STATUS, lo, hi);
+       cpu_rdmsr(MSR_FIDVID_STATUS, lo, hi);
        vid = hi & MSR_S_HI_CURRENT_VID;
        fid = lo & MSR_S_LO_CURRENT_FID;
        lo = fid | (vid << MSR_C_LO_VID_SHIFT);
        hi = MSR_C_HI_STP_GNT_BENIGN;
        dprintk("cpu%d, init lo 0x%x, hi 0x%x\n", smp_processor_id(), lo, hi);
-       wrmsr(MSR_FIDVID_CTL, lo, hi);
+       cpu_wrmsr(MSR_FIDVID_CTL, lo, hi);
 }
 
 
@@ -173,7 +204,7 @@ static int write_new_fid(struct powernow
                fid, lo, data->plllock * PLL_LOCK_CONVERSION);
 
        do {
-               wrmsr(MSR_FIDVID_CTL, lo, data->plllock * PLL_LOCK_CONVERSION);
+               cpu_wrmsr(MSR_FIDVID_CTL, lo, data->plllock * 
PLL_LOCK_CONVERSION);
                if (i++ > 100) {
                        printk(KERN_ERR PFX "internal error - pending bit very 
stuck - no further 
pstate changes possible\n");
                        return 1;
@@ -215,7 +246,7 @@ static int write_new_vid(struct powernow
                vid, lo, STOP_GRANT_5NS);
 
        do {
-               wrmsr(MSR_FIDVID_CTL, lo, STOP_GRANT_5NS);
+               cpu_wrmsr(MSR_FIDVID_CTL, lo, STOP_GRANT_5NS);
                if (i++ > 100) {
                        printk(KERN_ERR PFX "internal error - pending bit very 
stuck - no further 
pstate changes possible\n");
                        return 1;
@@ -294,7 +325,7 @@ static int core_voltage_pre_transition(s
                smp_processor_id(),
                data->currfid, data->currvid, reqvid, data->rvo);
 
-       rdmsr(MSR_FIDVID_STATUS, lo, maxvid);
+       cpu_rdmsr(MSR_FIDVID_STATUS, lo, maxvid);
        maxvid = 0x1f & (maxvid >> 16);
        dprintk("ph1 maxvid=0x%x\n", maxvid);
        if (reqvid < maxvid) /* lower numbers are higher voltages */
@@ -892,6 +923,19 @@ static int transition_frequency(struct p
 
        res = transition_fid_vid(data, fid, vid);
 
+#ifdef CONFIG_XEN_PRIVILEGED_GUEST
+  {
+    dom0_op_t op;
+    int rc;
+    // printk("powernow-k8: notifying Xen of transition to %d khz on cpu 
%d\n", freqs.new, freqs.cpu);
+    op.cmd = DOM0_SETCPUFREQ;
+    op.u.setcpufreq.cpumap = (1 << freqs.cpu);
+    op.u.setcpufreq.khz = freqs.new;
+    rc = HYPERVISOR_dom0_op(&op);
+    // printk("powernow-k8: notified Xen of transition to %d khz on cpu %d 
(rc %d)\n", freqs.new, freqs.cpu, rc);
+  }
+#endif
+
        freqs.new = find_khz_freq_from_fid(data->currfid);
        for_each_cpu_mask(i, cpu_core_map[data->cpu]) {
                freqs.cpu = i;
diff -r 640f8b15b9dd include/xen/interface/dom0_ops.h
--- a/include/xen/interface/dom0_ops.h  Fri Apr  7 01:32:54 2006 +0100
+++ b/include/xen/interface/dom0_ops.h  Mon Apr 10 19:33:49 2006 -0400
@@ -283,6 +283,28 @@ typedef struct dom0_getpageframeinfo2 {
     GUEST_HANDLE(ulong) array;
 } dom0_getpageframeinfo2_t;
 DEFINE_GUEST_HANDLE(dom0_getpageframeinfo2_t);
+
+/*
+ * Notify hypervisor of a CPU core frequency change completed
+ * by cpufreq driver in dom0, triggering an internal timer
+ * recalibration.
+ *
+ * This should be called after the new frequency has stabilized.
+ *
+ * The CPUs specified may include any CPU or core (if cores have
+ * independent PLLs). In an SMP or multi-core system, it may
+ * take a while for the recalibration function to be scheduled
+ * on the intended target CPUs; there is no guarantee this will
+ * happen by the time this call returns.
+ *
+ */
+#define DOM0_SETCPUFREQ       30
+typedef struct dom0_setcpufreq {
+    /* IN variables */
+    cpumap_t cpumap;
+    unsigned long khz;
+} dom0_setcpufreq_t;
+DEFINE_GUEST_HANDLE(dom0_setcpufreq_t);
 
 /*
  * Request memory range (@mfn, @mfn+@nr_mfns-1) to have type @type.
@@ -496,6 +518,7 @@ typedef struct dom0_op {
         struct dom0_shadow_control    shadow_control;
         struct dom0_setdomainmaxmem   setdomainmaxmem;
         struct dom0_getpageframeinfo2 getpageframeinfo2;
+        struct dom0_setcpufreq        setcpufreq;
         struct dom0_add_memtype       add_memtype;
         struct dom0_del_memtype       del_memtype;
         struct dom0_read_memtype      read_memtype;

-------------------------------------------------------
 Matt T. Yourst               yourst@xxxxxxxxxxxxxxxxx
 Binghamton University, Department of Computer Science
-------------------------------------------------------

Attachment: cpufreq-xen-dom0-func.diff
Description: Text Data

Attachment: cpufreq-xen-linux-2.6.16.diff
Description: Text Data

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.