[Xen-devel] [PATCH v6] new config option vtsc_tolerance_khz to avoid TSC emulation



Add an option to control when vTSC emulation will be activated for a
domU with tsc_mode=default. Without such an option, each TSC access from
the domU will be emulated, which causes a significant performance drop for
workloads that make use of rdtsc.

One way to avoid TSC emulation is to run domUs with tsc_mode=native.
This has the drawback that migrating a domU from a "2.3GHz" class host
to a "2.4GHz" class host may change the rate at which the TSC counter
increases, and the domU may not be prepared for that.

With the new option the host admin can decide how a domU should behave
when it is migrated across systems of the same class. Since there is
always some jitter when Xen calibrates the cpu_khz value, all hosts of
the same class will most likely have slightly different values. As a
result, vTSC emulation is unavoidable without a tolerance. Data collected
during the incident which triggered this change showed a jitter of up to
200 kHz across systems of the same class.
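
The decision described above can be sketched in isolation roughly as
follows. This is only an illustration, not the code from the patch: the
helper name tsc_within_tolerance and its parameters are hypothetical, and
it assumes frequencies in kHz and a 16-bit tolerance as in the commit
message.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of Xen: returns true when the host
 * frequency is close enough to the frequency the domU was started with,
 * so that TSC access could stay native instead of being emulated.
 */
static bool tsc_within_tolerance(uint32_t host_khz, uint32_t guest_khz,
                                 uint16_t tolerance_khz)
{
    /* Widen to a signed 64-bit type so the subtraction cannot wrap. */
    int64_t diff = (int64_t)host_khz - (int64_t)guest_khz;

    if ( diff < 0 )
        diff = -diff;

    /* A tolerance of 0 degenerates to an exact frequency match. */
    return diff <= (int64_t)tolerance_khz;
}
```

With a tolerance of 200, a domU started at 2300000 kHz would stay native
on a host calibrated at 2300100 kHz, but not on a 2400000 kHz host.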

Existing padding fields are reused to store vtsc_tolerance_khz as u16.
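
Why reusing the padding keeps the layout stable can be checked with a
small standalone sketch. The struct names tsc_info_old and tsc_info_new
are hypothetical stand-ins for the domctl structure, and plain uint64_t
is used in place of uint64_aligned_t as a simplification.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout before the change: 32 bits of padding after incarnation. */
struct tsc_info_old {
    uint32_t tsc_mode;
    uint32_t gtsc_khz;
    uint32_t incarnation;
    uint32_t pad;
    uint64_t elapsed_nsec;
};

/* Layout after the change: the padding is split into two u16 fields. */
struct tsc_info_new {
    uint32_t tsc_mode;
    uint32_t gtsc_khz;
    uint32_t incarnation;
    uint16_t vtsc_tolerance_khz;
    uint16_t pad;
    uint64_t elapsed_nsec;
};

/* The overall size and the offsets of the existing fields must not move. */
_Static_assert(sizeof(struct tsc_info_old) == sizeof(struct tsc_info_new),
               "struct size changed");
_Static_assert(offsetof(struct tsc_info_old, elapsed_nsec) ==
               offsetof(struct tsc_info_new, elapsed_nsec),
               "elapsed_nsec moved");
_Static_assert(offsetof(struct tsc_info_old, pad) ==
               offsetof(struct tsc_info_new, vtsc_tolerance_khz),
               "tolerance does not occupy the old padding");
```

If any of the assertions failed, the block would not compile, which is
the point of the check.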

v6:
 - mention default value in xl.cfg
 - tsc_set_info: remove usage of __func__, use %d for domid
 - tsc_set_info: use ABS to calculate khz_diff
v5:
 - reduce functionality to allow setting of the tolerance value
   only at initial domU startup
v4:
 - add missing copyback in XEN_DOMCTL_set_vtsc_tolerance_khz
v3:
 - rename vtsc_khz_tolerance to vtsc_tolerance_khz
 - separate domctls to adjust values
 - more docs
 - update libxl.h
 - update python tests
 - flask check bound to tsc permissions
 - not runtime tested due to dlsym() build errors in staging

Signed-off-by: Olaf Hering <olaf@xxxxxxxxx>
---
 docs/man/xen-tscmode.pod.7               | 16 ++++++++++++++++
 docs/man/xl.cfg.pod.5.in                 | 10 ++++++++++
 docs/specs/libxc-migration-stream.pandoc |  6 ++++--
 tools/libxc/include/xenctrl.h            |  2 ++
 tools/libxc/xc_domain.c                  |  4 ++++
 tools/libxc/xc_sr_common_x86.c           |  6 ++++--
 tools/libxc/xc_sr_stream_format.h        |  3 ++-
 tools/libxl/libxl.h                      |  6 ++++++
 tools/libxl/libxl_types.idl              |  1 +
 tools/libxl/libxl_x86.c                  |  3 ++-
 tools/python/xen/lowlevel/xc/xc.c        |  2 +-
 tools/xl/xl_parse.c                      |  3 +++
 xen/arch/x86/domain.c                    |  2 +-
 xen/arch/x86/domctl.c                    |  2 ++
 xen/arch/x86/time.c                      | 30 +++++++++++++++++++++++++++---
 xen/include/asm-x86/domain.h             |  1 +
 xen/include/asm-x86/time.h               |  6 ++++--
 xen/include/public/domctl.h              |  3 ++-
 18 files changed, 92 insertions(+), 14 deletions(-)

diff --git a/docs/man/xen-tscmode.pod.7 b/docs/man/xen-tscmode.pod.7
index 3bbc96f201..122ae36679 100644
--- a/docs/man/xen-tscmode.pod.7
+++ b/docs/man/xen-tscmode.pod.7
@@ -99,6 +99,9 @@ whether or not the VM has been saved/restored/migrated
 
 =back
 
+If the tsc_mode is set to "default" the decision to emulate TSC can be
+tweaked further with the "vtsc_tolerance_khz" option.
+
 To understand this in more detail, the rest of this document must
 be read.
 
@@ -211,6 +214,19 @@ is emulated.  Note that, though emulated, the "apparent" TSC frequency
 will be the TSC frequency of the initial physical machine, even after
 migration.
 
+Since the calibration of the TSC frequency may not be 100% accurate, the
+exact value of the frequency can change even across reboots. This means
+also several otherwise identical systems can have a slightly different
+TSC frequency. As a result TSC access will be emulated if a domU is
+migrated from one host to another, identical host. To avoid the
+performance impact of TSC emulation a certain tolerance of the measured
+host TSC frequency can be specified with "vtsc_tolerance_khz". If the
+measured "cpu_khz" value is within the tolerance range, TSC access
+remains native. Otherwise it will be emulated. This allows migrating
+domUs between identical hardware. If the domU is migrated to a
+different kind of hardware, say from a "2.3GHz" to a "2.5GHz" system,
+TSC will be emulated to maintain the TSC frequency expected by the domU.
+
 For environments where both TSC-safeness AND highest performance
 even across migration is a requirement, application code can be specially
 modified to use an algorithm explicitly designed into Xen for this purpose.
diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
index 2c1a6e1422..aff16052ef 100644
--- a/docs/man/xl.cfg.pod.5.in
+++ b/docs/man/xl.cfg.pod.5.in
@@ -1891,6 +1891,16 @@ determined in a similar way to that of B<default> TSC mode.
 
 Please see B<xen-tscmode(7)> for more information on this option.
 
+=item B<vtsc_tolerance_khz="KHZ">
+
+B<(x86 only, relevant only for tsc_mode=default)>
+When a domU is started, the CPU frequency of the host is used by the domU for
+TSC related time measurement. Once the domU is either migrated or
+saved/restored on another host that CPU frequency has to be emulated to avoid
+time drift. To avoid the performance penalty of the TSC emulation, allow a
+certain amount of jitter of the measured CPU frequency on the hosts the domU
+is supposed to run on. Default value is 0, i.e. no tolerance.
+
 =item B<localtime=BOOLEAN>
 
 Set the real time clock to local time or to UTC. False (0) by default,
diff --git a/docs/specs/libxc-migration-stream.pandoc b/docs/specs/libxc-migration-stream.pandoc
index 73421ff393..0d0f17edb1 100644
--- a/docs/specs/libxc-migration-stream.pandoc
+++ b/docs/specs/libxc-migration-stream.pandoc
@@ -3,7 +3,7 @@
   Andrew Cooper <<andrew.cooper3@xxxxxxxxxx>>
   Wen Congyang <<wency@xxxxxxxxxxxxxx>>
   Yang Hongyang <<hongyang.yang@xxxxxxxxxxxx>>
-% Revision 2
+% Revision 3
 
 Introduction
 ============
@@ -472,7 +472,7 @@ XEN\_DOMCTL\_{get,set}tscinfo hypercall sub-ops.
     +------------------------+------------------------+
     | nsec                                            |
     +------------------------+------------------------+
-    | incarnation            | (reserved)             |
+    | incarnation            | tolerance | (reserved) |
     +------------------------+------------------------+
 
 --------------------------------------------------------------------
@@ -485,6 +485,8 @@ khz              TSC frequency, in kHz.
 nsec             Elapsed time, in nanoseconds.
 
 incarnation      Incarnation.
+
+tolerance        Amount of Jitter the domU can handle after migration
 --------------------------------------------------------------------
 
 \clearpage
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 058e832c47..96bdd5609d 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1360,6 +1360,7 @@ int xc_domain_set_tsc_info(xc_interface *xch,
                            uint32_t tsc_mode,
                            uint64_t elapsed_nsec,
                            uint32_t gtsc_khz,
+                           uint16_t vtsc_tolerance_khz,
                            uint32_t incarnation);
 
 int xc_domain_get_tsc_info(xc_interface *xch,
@@ -1367,6 +1368,7 @@ int xc_domain_get_tsc_info(xc_interface *xch,
                            uint32_t *tsc_mode,
                            uint64_t *elapsed_nsec,
                            uint32_t *gtsc_khz,
+                           uint16_t *vtsc_tolerance_khz,
                            uint32_t *incarnation);
 
 int xc_domain_disable_migrate(xc_interface *xch, uint32_t domid);
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 26b4b908b9..36acc1c45f 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -852,6 +852,7 @@ int xc_domain_set_tsc_info(xc_interface *xch,
                            uint32_t tsc_mode,
                            uint64_t elapsed_nsec,
                            uint32_t gtsc_khz,
+                           uint16_t vtsc_tolerance_khz,
                            uint32_t incarnation)
 {
     DECLARE_DOMCTL;
@@ -860,6 +861,7 @@ int xc_domain_set_tsc_info(xc_interface *xch,
     domctl.u.tsc_info.tsc_mode = tsc_mode;
     domctl.u.tsc_info.elapsed_nsec = elapsed_nsec;
     domctl.u.tsc_info.gtsc_khz = gtsc_khz;
+    domctl.u.tsc_info.vtsc_tolerance_khz = vtsc_tolerance_khz;
     domctl.u.tsc_info.incarnation = incarnation;
     return do_domctl(xch, &domctl);
 }
@@ -869,6 +871,7 @@ int xc_domain_get_tsc_info(xc_interface *xch,
                            uint32_t *tsc_mode,
                            uint64_t *elapsed_nsec,
                            uint32_t *gtsc_khz,
+                           uint16_t *vtsc_tolerance_khz,
                            uint32_t *incarnation)
 {
     int rc;
@@ -882,6 +885,7 @@ int xc_domain_get_tsc_info(xc_interface *xch,
         *tsc_mode = domctl.u.tsc_info.tsc_mode;
         *elapsed_nsec = domctl.u.tsc_info.elapsed_nsec;
         *gtsc_khz = domctl.u.tsc_info.gtsc_khz;
+        *vtsc_tolerance_khz = domctl.u.tsc_info.vtsc_tolerance_khz;
         *incarnation = domctl.u.tsc_info.incarnation;
     }
     return rc;
diff --git a/tools/libxc/xc_sr_common_x86.c b/tools/libxc/xc_sr_common_x86.c
index 98f1cef30f..ea3e551a83 100644
--- a/tools/libxc/xc_sr_common_x86.c
+++ b/tools/libxc/xc_sr_common_x86.c
@@ -12,7 +12,8 @@ int write_tsc_info(struct xc_sr_context *ctx)
     };
 
     if ( xc_domain_get_tsc_info(xch, ctx->domid, &tsc.mode,
-                                &tsc.nsec, &tsc.khz, &tsc.incarnation) < 0 )
+                                &tsc.nsec, &tsc.khz, &tsc.vtsc_tolerance,
+                                &tsc.incarnation) < 0 )
     {
         PERROR("Unable to obtain TSC information");
         return -1;
@@ -34,7 +35,8 @@ int handle_tsc_info(struct xc_sr_context *ctx, struct xc_sr_record *rec)
     }
 
     if ( xc_domain_set_tsc_info(xch, ctx->domid, tsc->mode,
-                                tsc->nsec, tsc->khz, tsc->incarnation) )
+                                tsc->nsec, tsc->khz, tsc->vtsc_tolerance,
+                                tsc->incarnation) )
     {
         PERROR("Unable to set TSC information");
         return -1;
diff --git a/tools/libxc/xc_sr_stream_format.h b/tools/libxc/xc_sr_stream_format.h
index 15ff1c7efb..9b52f6ace6 100644
--- a/tools/libxc/xc_sr_stream_format.h
+++ b/tools/libxc/xc_sr_stream_format.h
@@ -121,7 +121,8 @@ struct xc_sr_rec_tsc_info
     uint32_t khz;
     uint64_t nsec;
     uint32_t incarnation;
-    uint32_t _res1;
+    uint16_t vtsc_tolerance;
+    uint16_t _res1;
 };
 
 /* HVM_PARAMS */
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index edd244278a..7e2b703251 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -354,6 +354,12 @@
 #define LIBXL_HAVE_BUILDINFO_BOOTLOADER 1
 #define LIBXL_HAVE_BUILDINFO_BOOTLOADER_ARGS 1
 
+/*
+ * LIBXL_HAVE_VTSC_TOLERANCE_KHZ indicates that libxl_domain_build_info
+ * has the vtsc_tolerance_khz field.
+ */
+#define LIBXL_HAVE_VTSC_TOLERANCE_KHZ 1
+
 /*
  * libxl ABI compatibility
  *
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index dbb287d6fe..8b898bb3c9 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -466,6 +466,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
     ("vcpu_soft_affinity", Array(libxl_bitmap, "num_vcpu_soft_affinity")),
     ("numa_placement",  libxl_defbool),
     ("tsc_mode",        libxl_tsc_mode),
+    ("vtsc_tolerance_khz", uint32),
     ("max_memkb",       MemKB),
     ("target_memkb",    MemKB),
     ("video_memkb",     MemKB),
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index 1e9f98961b..ab5ff9aa8b 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -313,7 +313,8 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config,
     default:
         abort();
     }
-    xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, 0, 0);
+    xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, 0,
+                           d_config->b_info.vtsc_tolerance_khz, 0);
     if (libxl_defbool_val(d_config->b_info.disable_migrate))
         xc_domain_disable_migrate(ctx->xch, domid);
     rtc_timeoffset = d_config->b_info.rtc_timeoffset;
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index f501764100..e73e2cafc7 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -1522,7 +1522,7 @@ static PyObject *pyxc_domain_set_tsc_info(XcObject *self, PyObject *args)
     if (!PyArg_ParseTuple(args, "ii", &dom, &tsc_mode))
         return NULL;
 
-    if (xc_domain_set_tsc_info(self->xc_handle, dom, tsc_mode, 0, 0, 0) != 0)
+    if (xc_domain_set_tsc_info(self->xc_handle, dom, tsc_mode, 0, 0, 0, 0) != 0)
         return pyxc_error_to_exception(self->xc_handle);
 
     Py_INCREF(zero);
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 8b999825d2..ddaddd6e65 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -1126,6 +1126,9 @@ void parse_config_data(const char *config_source,
         }
     }
 
+    if (!xlu_cfg_get_long(config, "vtsc_tolerance_khz", &l, 0))
+        b_info->vtsc_tolerance_khz = l < 0 || l > UINT16_MAX ? UINT16_MAX : l;
+
     if (!xlu_cfg_get_long(config, "rtc_timeoffset", &l, 0))
         b_info->rtc_timeoffset = l;
 
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index fbb320da9c..d40b91721e 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -561,7 +561,7 @@ int arch_domain_create(struct domain *d,
         ASSERT_UNREACHABLE(); /* Not HVM and not PV? */
 
     /* initialize default tsc behavior in case tools don't */
-    tsc_set_info(d, TSC_MODE_DEFAULT, 0UL, 0, 0);
+    tsc_set_info(d, TSC_MODE_DEFAULT, 0UL, 0, 0, 0);
 
     /* PV/PVH guests get an emulated PIT too for video BIOSes to use. */
     pit_init(d, cpu_khz);
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 8fbbf3aeb3..d86ff58482 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -939,6 +939,7 @@ long arch_do_domctl(
             tsc_get_info(d, &domctl->u.tsc_info.tsc_mode,
                          &domctl->u.tsc_info.elapsed_nsec,
                          &domctl->u.tsc_info.gtsc_khz,
+                         &domctl->u.tsc_info.vtsc_tolerance_khz,
                          &domctl->u.tsc_info.incarnation);
             domain_unpause(d);
             copyback = true;
@@ -954,6 +955,7 @@ long arch_do_domctl(
             tsc_set_info(d, domctl->u.tsc_info.tsc_mode,
                          domctl->u.tsc_info.elapsed_nsec,
                          domctl->u.tsc_info.gtsc_khz,
+                         domctl->u.tsc_info.vtsc_tolerance_khz,
                          domctl->u.tsc_info.incarnation);
             domain_unpause(d);
         }
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 84c1c0c082..c96d643acb 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -2064,7 +2064,7 @@ int host_tsc_is_safe(void)
  */
 void tsc_get_info(struct domain *d, uint32_t *tsc_mode,
                   uint64_t *elapsed_nsec, uint32_t *gtsc_khz,
-                  uint32_t *incarnation)
+                  uint16_t *vtsc_tolerance_khz, uint32_t *incarnation)
 {
     bool enable_tsc_scaling = is_hvm_domain(d) &&
                               hvm_tsc_scaling_supported && !d->arch.vtsc;
@@ -2080,6 +2080,7 @@ void tsc_get_info(struct domain *d, uint32_t *tsc_mode,
         *elapsed_nsec = *gtsc_khz = 0;
         break;
     case TSC_MODE_DEFAULT:
+        *vtsc_tolerance_khz = d->arch.vtsc_tolerance_khz;
         if ( d->arch.vtsc )
         {
     case TSC_MODE_ALWAYS_EMULATE:
@@ -2122,7 +2123,8 @@ void tsc_get_info(struct domain *d, uint32_t *tsc_mode,
  */
 void tsc_set_info(struct domain *d,
                   uint32_t tsc_mode, uint64_t elapsed_nsec,
-                  uint32_t gtsc_khz, uint32_t incarnation)
+                  uint32_t gtsc_khz, uint16_t vtsc_tolerance_khz,
+                  uint32_t incarnation)
 {
     ASSERT(!is_system_domain(d));
 
@@ -2134,9 +2136,12 @@ void tsc_set_info(struct domain *d,
 
     switch ( d->arch.tsc_mode = tsc_mode )
     {
+        bool disable_vtsc;
         bool enable_tsc_scaling;
 
     case TSC_MODE_DEFAULT:
+        d->arch.vtsc_tolerance_khz = vtsc_tolerance_khz;
+        /* Fallthrough. */
     case TSC_MODE_ALWAYS_EMULATE:
         d->arch.vtsc_offset = get_s_time() - elapsed_nsec;
         d->arch.tsc_khz = gtsc_khz ?: cpu_khz;
@@ -2149,8 +2154,25 @@ void tsc_set_info(struct domain *d,
          * When a guest is created, gtsc_khz is passed in as zero, making
          * d->arch.tsc_khz == cpu_khz. Thus no need to check incarnation.
          */
+        disable_vtsc = d->arch.tsc_khz == cpu_khz;
+
+        if ( tsc_mode == TSC_MODE_DEFAULT && gtsc_khz &&
+             d->arch.vtsc_tolerance_khz )
+        {
+            long khz_diff;
+
+            khz_diff = ABS((long)(cpu_khz - gtsc_khz));
+            disable_vtsc = khz_diff <= d->arch.vtsc_tolerance_khz;
+
+            printk(XENLOG_G_INFO "d%d: host has %lu kHz,"
+                   " domU expects %u kHz,"
+                   " difference of %ld is %s tolerance of %u\n",
+                   d->domain_id, cpu_khz, gtsc_khz, khz_diff,
+                   disable_vtsc ? "within" : "outside",
+                   d->arch.vtsc_tolerance_khz);
+        }
         if ( tsc_mode == TSC_MODE_DEFAULT && host_tsc_is_safe() &&
-             (d->arch.tsc_khz == cpu_khz ||
+             (disable_vtsc ||
               (is_hvm_domain(d) &&
                hvm_get_tsc_scaling_ratio(d->arch.tsc_khz))) )
         {
@@ -2239,6 +2261,8 @@ static void dump_softtsc(unsigned char key)
             printk(",ofs=%#"PRIx64, d->arch.vtsc_offset);
         if ( d->arch.tsc_khz )
             printk(",khz=%"PRIu32, d->arch.tsc_khz);
+        if ( d->arch.vtsc_tolerance_khz )
+            printk(",tol=%"PRIu16, d->arch.vtsc_tolerance_khz);
         if ( d->arch.incarnation )
             printk(",inc=%"PRIu32, d->arch.incarnation);
 #if !defined(NDEBUG) || defined(CONFIG_PERF_COUNTERS)
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index a12ae47f1b..7743995934 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -374,6 +374,7 @@ struct arch_domain
     uint64_t vtsc_offset;    /* adjustment for save/restore/migrate */
     uint32_t tsc_khz;        /* cached guest khz for certain emulated or
                                 hardware TSC scaling cases */
+    uint32_t vtsc_tolerance_khz; /* domU handles that much jitter in cpu_khz */
     struct time_scale vtsc_to_ns; /* scaling for certain emulated or
                                      hardware TSC scaling cases */
     struct time_scale ns_to_vtsc; /* scaling for certain emulated or
diff --git a/xen/include/asm-x86/time.h b/xen/include/asm-x86/time.h
index b3ae832df4..ef9be7a701 100644
--- a/xen/include/asm-x86/time.h
+++ b/xen/include/asm-x86/time.h
@@ -61,10 +61,12 @@ u64 gtime_to_gtsc(struct domain *d, u64 time);
 u64 gtsc_to_gtime(struct domain *d, u64 tsc);
 
 void tsc_set_info(struct domain *d, uint32_t tsc_mode, uint64_t elapsed_nsec,
-                  uint32_t gtsc_khz, uint32_t incarnation);
+                  uint32_t gtsc_khz, uint16_t vtsc_tolerance_khz,
+                  uint32_t incarnation);
    
 void tsc_get_info(struct domain *d, uint32_t *tsc_mode, uint64_t *elapsed_nsec,
-                  uint32_t *gtsc_khz, uint32_t *incarnation);
+                  uint32_t *gtsc_khz, uint16_t *vtsc_tolerance_khz,
+                  uint32_t *incarnation);
    
 
 void force_update_vcpu_system_time(struct vcpu *v);
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index ec7a860afc..70a58ae2e4 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -702,7 +702,8 @@ struct xen_domctl_tsc_info {
     uint32_t tsc_mode;
     uint32_t gtsc_khz;
     uint32_t incarnation;
-    uint32_t pad;
+    uint16_t vtsc_tolerance_khz;
+    uint16_t pad;
     uint64_aligned_t elapsed_nsec;
 };
 
