
[Xen-devel] [PATCH v2 6/6] x86/time: implement PVCLOCK_TSC_STABLE_BIT



When using TSC as clocksource we rely solely on the TSC for updating
vcpu time infos (pvti). Right now, each vCPU takes its tsc_timestamp
at a different instant, namely every EPOCH + delta. This delta is
variable: it depends on when the CPU calibrates with CPU 0 (master),
and will likely differ across vCPUs. This means that each vCPU's pvti
doesn't account for its calibration error, which could lead to time
going backwards: a time read on vCPU B immediately after one on vCPU A
can return a smaller value. While this doesn't happen often, I was
able to observe (for clocksource=tsc) around 50 warps of < 100 ns in
an hour.

This patch proposes relying on host TSC synchronization and
passthrough to the guest, when running on a TSC-safe platform. On
time_calibration we retrieve the platform time in ns and the counter
read by the clocksource that was used to compute system time. We
introduce a new rendezvous function which doesn't require
synchronization between master and slave CPUs: it just reads the
calibration_rendezvous struct and writes the stime and stamp to the
cpu_calibration struct, to be used later on. On a platform with a
constant and reliable TSC, we can guarantee that the time read on
vCPU B right after vCPU A is bigger, independently of the CPU
calibration error. Since pvclock time infos are then monotonic as seen
by any vCPU, set PVCLOCK_TSC_STABLE_BIT, which enables usage of the
vDSO on Linux. IIUC, this is similar to how it's implemented on KVM.
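
As a rough illustration of why the flag matters to a guest, the sketch below
shows a reader that trusts the per-vCPU value when the stable bit is set and
otherwise clamps against a global last-seen value. This is an assumption-laden
model, not Linux or Xen code: sketch_clock_read(), sketch_last_ns and
SKETCH_TSC_STABLE_BIT are hypothetical names, and the model is single-threaded
(a real guest would need an atomic update of the shared value).

```c
#include <stdint.h>

/* Hypothetical stand-in for PVCLOCK_TSC_STABLE_BIT. */
#define SKETCH_TSC_STABLE_BIT (1u << 0)

/* Last value handed out; shared by all vCPUs in this model. */
static uint64_t sketch_last_ns;

/*
 * raw_ns is the time extrapolated from the calling vCPU's pvti,
 * flags is the flags field of that same pvti.
 */
static uint64_t sketch_clock_read(uint64_t raw_ns, uint8_t flags)
{
    /* Stable bit set: per-vCPU values are already globally monotonic. */
    if (flags & SKETCH_TSC_STABLE_BIT)
        return raw_ns;

    /* Otherwise never return less than what another vCPU already saw. */
    if (raw_ns < sketch_last_ns)
        return sketch_last_ns;

    sketch_last_ns = raw_ns;
    return raw_ns;
}
```

The shared variable in the slow path is the kind of global state a fast
userspace reader would want to avoid, so skipping it depends on the
hypervisor advertising the stable bit.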

Note that PVCLOCK_TSC_STABLE_BIT is set only when CPU hotplug isn't
meant to be performed on the host, which is either when nr_cpu_ids
and num_present_cpus() are the same or when the "nocpuhotplug" command
line parameter is used. This is because a newly hotplugged CPU may not
satisfy the condition of having all TSCs synchronized.

Signed-off-by: Joao Martins <joao.m.martins@xxxxxxxxxx>
---
Cc: Keir Fraser <keir@xxxxxxx>
Cc: Jan Beulich <jbeulich@xxxxxxxx>
Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Perhaps "cpuhotplugsafe" would be a better name, since potentially
hardware could guarantee TSCs are synchronized on hotplug?

Changes since v1:
 - Change approach to follow Andrew's guideline to skip
 std_rendezvous, by introducing a nop_rendezvous.
 - Change commit message to reflect the change above.
 - Use TSC_STABLE_BIT only if CPU hotplug isn't possible.
 - Add command line option to override it if no CPU hotplug is
 intended.
---
 xen/arch/x86/time.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 123aa42..1dcd4af 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -43,6 +43,10 @@
 static char __initdata opt_clocksource[10];
 string_param("clocksource", opt_clocksource);
 
+/* opt_nocpuhotplug: Set if CPU hotplug isn't meant to be used */
+static bool_t __initdata opt_nocpuhotplug;
+boolean_param("nocpuhotplug", opt_nocpuhotplug);
+
 unsigned long __read_mostly cpu_khz;  /* CPU clock frequency in kHz. */
 DEFINE_SPINLOCK(rtc_lock);
 unsigned long pit0_ticks;
@@ -435,6 +439,7 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
  * PLATFORM TIMER 4: TSC
  */
 static bool_t clocksource_is_tsc;
+static bool_t use_tsc_stable_bit;
 static u64 tsc_freq;
 static unsigned long tsc_max_warp;
 static void tsc_check_reliability(void);
@@ -468,6 +473,11 @@ static int __init init_tsctimer(struct platform_timesource *pts)
 
     pts->frequency = tsc_freq;
     clocksource_is_tsc = tsc_reliable;
+    use_tsc_stable_bit = clocksource_is_tsc &&
+        ((nr_cpu_ids == num_present_cpus()) || opt_nocpuhotplug);
+
+    if ( clocksource_is_tsc && !use_tsc_stable_bit )
+        printk(XENLOG_INFO "TSC: CPU Hotplug intended, not setting stable bit\n");
 
     return tsc_reliable;
 }
@@ -950,6 +960,8 @@ static void __update_vcpu_system_time(struct vcpu *v, int force)
 
     _u.tsc_timestamp = tsc_stamp;
     _u.system_time   = t->stime_local_stamp;
+    if ( use_tsc_stable_bit )
+        _u.flags    |= PVCLOCK_TSC_STABLE_BIT;
 
     if ( is_hvm_domain(d) )
         _u.tsc_timestamp += v->arch.hvm_vcpu.cache_tsc_offset;
@@ -1431,6 +1443,22 @@ static void time_calibration_std_rendezvous(void *_r)
     raise_softirq(TIME_CALIBRATE_SOFTIRQ);
 }
 
+/*
+ * Rendezvous function used when clocksource is TSC and
+ * no CPU hotplug will be performed.
+ */
+static void time_calibration_nop_rendezvous(void *_r)
+{
+    struct cpu_calibration *c = &this_cpu(cpu_calibration);
+    struct calibration_rendezvous *r = _r;
+
+    c->local_tsc_stamp = r->master_tsc_stamp;
+    c->stime_local_stamp = get_s_time();
+    c->stime_master_stamp = r->master_stime;
+
+    raise_softirq(TIME_CALIBRATE_SOFTIRQ);
+}
+
 static void (*time_calibration_rendezvous_fn)(void *) =
     time_calibration_std_rendezvous;
 
@@ -1440,6 +1468,13 @@ static void time_calibration(void *unused)
         .semaphore = ATOMIC_INIT(0)
     };
 
+    if ( use_tsc_stable_bit )
+    {
+        local_irq_disable();
+        r.master_stime = read_platform_stime(&r.master_tsc_stamp);
+        local_irq_enable();
+    }
+
     cpumask_copy(&r.cpu_calibration_map, &cpu_online_map);
 
     /* @wait=1 because we must wait for all cpus before freeing @r. */
@@ -1555,6 +1590,14 @@ static int __init verify_tsc_reliability(void)
 
             init_percpu_time();
 
+            /*
+             * We won't do CPU Hotplug and TSC clocksource is being used which
+             * means we have a reliable TSC, plus we don't sync with any other
+             * clocksource so no need for rendezvous.
+             */
+            if ( use_tsc_stable_bit )
+                time_calibration_rendezvous_fn = time_calibration_nop_rendezvous;
+
             init_timer(&calibration_timer, time_calibration, NULL, 0);
             set_timer(&calibration_timer, NOW() + EPOCH);
         }
-- 
2.1.4

