
[Xen-devel] [PATCH v4] new config option vtsc_tolerance_khz to avoid TSC emulation



Add an option to control when vTSC emulation will be activated for a
domU with tsc_mode=default. Without such an option each TSC access from
domU will be emulated, which causes a significant performance drop for
workloads that make use of rdtsc.

Add a new domctl XEN_DOMCTL_set_vtsc_tolerance_khz to adjust the
tolerance value of a running domU that is supposed to be migrated.

One option to avoid TSC emulation is to run domUs with tsc_mode=native.
This has the drawback that migrating a domU from a "2.3GHz" class host
to a "2.4GHz" class host may change the rate at which the TSC counter
increases, and the domU may not be prepared for that.

With this option the host admin can decide how a domU should behave when
it is migrated across systems of the same class. Since there is always
some jitter when Xen calibrates the cpu_khz value, all hosts of the same
class will most likely have slightly different values. As a result vTSC
emulation is unavoidable without a tolerance. Data collected during the
incident which triggered this change showed a jitter of up to 200 kHz
across systems of the same class.
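
As a rough illustration of the tolerance check described above, here is a
minimal standalone sketch. The helper name tsc_within_tolerance() is
hypothetical and not part of the patch; it only mirrors the comparison of
the host's calibrated cpu_khz against the frequency the domU expects.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): returns true when the host
 * frequency is close enough to the guest's expected frequency that
 * native TSC access could be kept.
 */
static bool tsc_within_tolerance(uint32_t host_khz, uint32_t guest_khz,
                                 uint32_t tolerance_khz)
{
    /* Absolute difference, computed without unsigned wrap-around. */
    uint32_t diff = host_khz > guest_khz ? host_khz - guest_khz
                                         : guest_khz - host_khz;

    return diff <= tolerance_khz;
}
```

With a tolerance of 200, a host calibrated at 2300150 kHz would be accepted
for a domU expecting 2300000 kHz, while a 2400000 kHz host would not.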

A new utility is added which allows adjusting the vtsc_tolerance_khz
value for running domUs. This is useful to avoid emulation for domUs
that are already running and which cannot be restarted.

The ordering of records sent during migration is important. The value of
vtsc_tolerance_khz must be known by the receiving host before
configuring TSC, because this is the place where the decision of vTSC
emulation is made. Therefore the existing write_tsc_info function is
modified to enforce that ordering.
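
To show why the ordering matters, here is a small model of the receiving
side. All names (struct demo_domain, demo_handle_tolerance(),
demo_handle_tsc_info()) are made up for this sketch and do not exist in
libxc or Xen; the point is only that the emulation decision is taken when
the TSC info is applied, using whatever tolerance is known at that moment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-domain state on the receiving host. */
struct demo_domain {
    uint32_t tolerance_khz; /* 0 until a tolerance record was applied */
    bool emulate_tsc;       /* decided when TSC info is applied */
};

/* Models applying a VTSC_TOLERANCE_KHZ record. */
static void demo_handle_tolerance(struct demo_domain *d, uint32_t tol_khz)
{
    d->tolerance_khz = tol_khz;
}

/* Models applying a TSC_INFO record: the decision is made here. */
static void demo_handle_tsc_info(struct demo_domain *d,
                                 uint32_t host_khz, uint32_t guest_khz)
{
    uint32_t diff = host_khz > guest_khz ? host_khz - guest_khz
                                         : guest_khz - host_khz;

    d->emulate_tsc = diff > d->tolerance_khz;
}
```

If the tolerance is applied first, a 150 kHz difference with a tolerance of
200 keeps TSC access native. If the TSC info is applied first, the
tolerance is still 0 at decision time and the domU ends up emulated.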

v4:
 - add missing copyback in XEN_DOMCTL_get_vtsc_tolerance_khz
v3:
 - rename vtsc_khz_tolerance to vtsc_tolerance_khz
 - separate domctls to adjust values
 - more docs
 - update libxl.h
 - update python tests
 - flask check bound to tsc permissions
 - not runtime tested due to dlsym() build errors in staging

Signed-off-by: Olaf Hering <olaf@xxxxxxxxx>
---
 .gitignore                               |   1 +
 docs/man/xen-tscmode.pod.7               |  16 +++++
 docs/man/xl.cfg.pod.5.in                 |  10 +++
 docs/specs/libxc-migration-stream.pandoc |  27 +++++++-
 tools/libxc/include/xenctrl.h            |   8 +++
 tools/libxc/xc_domain.c                  |  25 +++++++
 tools/libxc/xc_sr_common_x86.c           |  56 +++++++++++++++-
 tools/libxc/xc_sr_common_x86.h           |  11 +++-
 tools/libxc/xc_sr_restore_x86_hvm.c      |   3 +
 tools/libxc/xc_sr_restore_x86_pv.c       |   3 +
 tools/libxc/xc_sr_save_x86_hvm.c         |   2 +-
 tools/libxc/xc_sr_save_x86_pv.c          |   2 +-
 tools/libxc/xc_sr_stream_format.h        |   9 +++
 tools/libxl/libxl.h                      |   6 ++
 tools/libxl/libxl_types.idl              |   1 +
 tools/libxl/libxl_x86.c                  |   2 +
 tools/misc/Makefile                      |   4 ++
 tools/misc/xen-vtsc.c                    | 108 +++++++++++++++++++++++++++++++
 tools/python/xen/migration/libxc.py      |  24 +++++++
 tools/python/xen/migration/tests.py      |   1 +
 tools/xl/xl_parse.c                      |   3 +
 xen/arch/x86/domctl.c                    |   9 +++
 xen/arch/x86/time.c                      |  23 ++++++-
 xen/include/asm-x86/domain.h             |   1 +
 xen/include/public/domctl.h              |  10 +++
 xen/xsm/flask/hooks.c                    |   2 +
 26 files changed, 358 insertions(+), 9 deletions(-)
 create mode 100644 tools/misc/xen-vtsc.c

diff --git a/.gitignore b/.gitignore
index b0191bf91f..3c3cbeddfd 100644
--- a/.gitignore
+++ b/.gitignore
@@ -221,6 +221,7 @@ tools/misc/xen-detect
 tools/misc/xen-diag
 tools/misc/xen-tmem-list-parse
 tools/misc/xen-livepatch
+tools/misc/xen-vtsc
 tools/misc/xenperf
 tools/misc/xenpm
 tools/misc/xen-hvmctx
diff --git a/docs/man/xen-tscmode.pod.7 b/docs/man/xen-tscmode.pod.7
index 3bbc96f201..122ae36679 100644
--- a/docs/man/xen-tscmode.pod.7
+++ b/docs/man/xen-tscmode.pod.7
@@ -99,6 +99,9 @@ whether or not the VM has been saved/restored/migrated
 
 =back
 
+If the tsc_mode is set to "default" the decision to emulate TSC can be
+tweaked further with the "vtsc_tolerance_khz" option.
+
 To understand this in more detail, the rest of this document must
 be read.
 
@@ -211,6 +214,19 @@ is emulated.  Note that, though emulated, the "apparent" TSC frequency
 will be the TSC frequency of the initial physical machine, even after
 migration.
 
+Since the calibration of the TSC frequency may not be 100% accurate, the
+exact value of the frequency can change even across reboots. This also
+means that several otherwise identical systems can have slightly different
+TSC frequencies. As a result TSC access will be emulated if a domU is
+migrated from one host to another, identical host. To avoid the
+performance impact of TSC emulation a certain tolerance of the measured
+host TSC frequency can be specified with "vtsc_tolerance_khz". If the
+measured "cpu_khz" value is within the tolerance range, TSC access
+remains native. Otherwise it will be emulated. This allows migrating
+domUs between identical hardware. If the domU is migrated to a
+different kind of hardware, say from a "2.3GHz" to a "2.5GHz" system,
+TSC will be emulated to maintain the TSC frequency expected by the domU.
+
 For environments where both TSC-safeness AND highest performance
 even across migration is a requirement, application code can be specially
 modified to use an algorithm explicitly designed into Xen for this purpose.
diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
index a699367779..784a8a9d26 100644
--- a/docs/man/xl.cfg.pod.5.in
+++ b/docs/man/xl.cfg.pod.5.in
@@ -1891,6 +1891,16 @@ determined in a similar way to that of B<default> TSC mode.
 
 Please see B<xen-tscmode(7)> for more information on this option.
 
+=item B<vtsc_tolerance_khz="KHZ">
+
+B<(x86 only, relevant only for tsc_mode=default)>
+When a domU is started, the CPU frequency of the host is used by the domU for
+TSC related time measurement. Once the domU is either migrated or
+saved/restored on another host, that CPU frequency has to be emulated to avoid
+time drift. To avoid the performance penalty of the TSC emulation, allow a
+certain amount of jitter, in kHz, of the measured CPU frequency on the hosts
+the domU is supposed to run on.
+
 =item B<localtime=BOOLEAN>
 
 Set the real time clock to local time or to UTC. False (0) by default,
diff --git a/docs/specs/libxc-migration-stream.pandoc b/docs/specs/libxc-migration-stream.pandoc
index 73421ff393..85672d9f05 100644
--- a/docs/specs/libxc-migration-stream.pandoc
+++ b/docs/specs/libxc-migration-stream.pandoc
@@ -3,7 +3,7 @@
   Andrew Cooper <<andrew.cooper3@xxxxxxxxxx>>
   Wen Congyang <<wency@xxxxxxxxxxxxxx>>
   Yang Hongyang <<hongyang.yang@xxxxxxxxxxxx>>
-% Revision 2
+% Revision 3
 
 Introduction
 ============
@@ -231,7 +231,9 @@ type         0x00000000: END
 
              0x0000000F: CHECKPOINT_DIRTY_PFN_LIST (Secondary -> Primary)
 
-             0x00000010 - 0x7FFFFFFF: Reserved for future _mandatory_
+             0x00000010: VTSC_TOLERANCE_KHZ
+
+             0x00000011 - 0x7FFFFFFF: Reserved for future _mandatory_
              records.
 
              0x80000000 - 0xFFFFFFFF: Reserved for future _optional_
@@ -624,6 +626,23 @@ The count of pfns is: record->length/sizeof(uint64_t).
 
 \clearpage
 
+VTSC_TOLERANCE_KHZ
+-------------------------
+
+Domain vTSC tolerance, as accessed by the
+XEN\_DOMCTL\_{get,set}\_vtsc\_tolerance\_khz hypercall sub-ops.
+
+     0     1     2     3     4     5     6     7 octet
+    +------------------------+------------------------+
+    | tolerance              | reserved               |
+    +------------------------+------------------------+
+
+Field            Description
+-----------      ---------------------------------------------------
+tolerance        Amount of jitter, in kHz, the domU can handle after migration.
+
+\clearpage
+
 Layout
 ======
 
@@ -675,10 +694,14 @@ A typical save record for an x86 HVM guest image would look like:
 4. TSC\_INFO
 5. HVM\_PARAMS
 6. HVM\_CONTEXT
+7. VTSC\_TOLERANCE\_KHZ
 
 HVM\_PARAMS must precede HVM\_CONTEXT, as certain parameters can affect
 the validity of architectural state in the context.
 
+VTSC\_TOLERANCE\_KHZ must precede TSC\_INFO because the decision to
+emulate TSC depends on the tolerance value.
+
 
 Legacy Images (x86 only)
 ========================
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 543abfcb34..d0c5442f1d 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1355,6 +1355,14 @@ int xc_domain_set_time_offset(xc_interface *xch,
                               uint32_t domid,
                               int32_t time_offset_seconds);
 
+int xc_domain_set_vtsc_tolerance_khz(xc_interface *xch,
+                                     uint32_t domid,
+                                     uint32_t vtsc_tolerance_khz);
+
+int xc_domain_get_vtsc_tolerance_khz(xc_interface *xch,
+                                     uint32_t domid,
+                                     uint32_t *vtsc_tolerance_khz);
+
 int xc_domain_set_tsc_info(xc_interface *xch,
                            uint32_t domid,
                            uint32_t tsc_mode,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index ea3df1ef31..776a81fdc6 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -847,6 +847,31 @@ int xc_domain_disable_migrate(xc_interface *xch, uint32_t domid)
     return do_domctl(xch, &domctl);
 }
 
+int xc_domain_set_vtsc_tolerance_khz(xc_interface *xch,
+                                     uint32_t domid,
+                                     uint32_t vtsc_tolerance_khz)
+{
+    DECLARE_DOMCTL;
+    domctl.cmd = XEN_DOMCTL_set_vtsc_tolerance_khz;
+    domctl.domain = domid;
+    domctl.u.vtsc_tolerance_khz.tolerance = vtsc_tolerance_khz;
+    return do_domctl(xch, &domctl);
+}
+
+int xc_domain_get_vtsc_tolerance_khz(xc_interface *xch,
+                                     uint32_t domid,
+                                     uint32_t *vtsc_tolerance_khz)
+{
+    int rc;
+    DECLARE_DOMCTL;
+    domctl.cmd = XEN_DOMCTL_get_vtsc_tolerance_khz;
+    domctl.domain = domid;
+    rc = do_domctl(xch, &domctl);
+    if ( rc == 0 )
+        *vtsc_tolerance_khz = domctl.u.vtsc_tolerance_khz.tolerance;
+    return rc;
+}
+
 int xc_domain_set_tsc_info(xc_interface *xch,
                            uint32_t domid,
                            uint32_t tsc_mode,
diff --git a/tools/libxc/xc_sr_common_x86.c b/tools/libxc/xc_sr_common_x86.c
index 98f1cef30f..a39ce68cae 100644
--- a/tools/libxc/xc_sr_common_x86.c
+++ b/tools/libxc/xc_sr_common_x86.c
@@ -1,6 +1,26 @@
 #include "xc_sr_common_x86.h"
 
-int write_tsc_info(struct xc_sr_context *ctx)
+static int write_vtsc_tolerance_khz(struct xc_sr_context *ctx)
+{
+    xc_interface *xch = ctx->xch;
+    struct xc_sr_rec_vtsc_tolerance_khz tsc = { 0 };
+    struct xc_sr_record rec =
+    {
+        .type = REC_TYPE_VTSC_TOLERANCE_KHZ,
+        .length = sizeof(tsc),
+        .data = &tsc
+    };
+
+    if ( xc_domain_get_vtsc_tolerance_khz(xch, ctx->domid, &tsc.tolerance) < 0 )
+    {
+        PERROR("Unable to obtain vTSC information");
+        return -1;
+    }
+
+    return write_record(ctx, &rec);
+}
+
+static int write_tsc_info(struct xc_sr_context *ctx)
 {
     xc_interface *xch = ctx->xch;
     struct xc_sr_rec_tsc_info tsc = { 0 };
@@ -21,6 +41,18 @@ int write_tsc_info(struct xc_sr_context *ctx)
     return write_record(ctx, &rec);
 }
 
+int write_tsc(struct xc_sr_context *ctx)
+{
+    int rc;
+
+    /* Send tolerance before tsc_info */
+    rc = write_vtsc_tolerance_khz(ctx);
+    if ( rc )
+        return rc;
+
+    return write_tsc_info(ctx);
+}
+
 int handle_tsc_info(struct xc_sr_context *ctx, struct xc_sr_record *rec)
 {
     xc_interface *xch = ctx->xch;
@@ -43,6 +75,28 @@ int handle_tsc_info(struct xc_sr_context *ctx, struct xc_sr_record *rec)
     return 0;
 }
 
+int handle_vtsc_tolerance_khz(struct xc_sr_context *ctx, struct xc_sr_record *rec)
+{
+    xc_interface *xch = ctx->xch;
+    struct xc_sr_rec_vtsc_tolerance_khz *tsc = rec->data;
+
+    if ( rec->length != sizeof(*tsc) )
+    {
+        ERROR("VTSC_TOLERANCE_KHZ record wrong size: length %u, expected %zu",
+              rec->length, sizeof(*tsc));
+        return -1;
+    }
+
+    if ( xc_domain_set_vtsc_tolerance_khz(xch, ctx->domid, tsc->tolerance) )
+    {
+        PERROR("Unable to set vTSC information");
+        return -1;
+    }
+
+    return 0;
+}
+
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxc/xc_sr_common_x86.h b/tools/libxc/xc_sr_common_x86.h
index 1d42da9af2..c85f328ddc 100644
--- a/tools/libxc/xc_sr_common_x86.h
+++ b/tools/libxc/xc_sr_common_x86.h
@@ -4,10 +4,15 @@
 #include "xc_sr_common.h"
 
 /*
- * Obtains a domains TSC information from Xen and writes a TSC_INFO record
- * into the stream.
+ * Obtains a domain's TSC information from Xen and writes a
+ * VTSC_TOLERANCE_KHZ record followed by a TSC_INFO record into the stream.
  */
-int write_tsc_info(struct xc_sr_context *ctx);
+int write_tsc(struct xc_sr_context *ctx);
+
+/*
+ * Parses a VTSC_TOLERANCE_KHZ record and applies the result to the domain.
+ */
+int handle_vtsc_tolerance_khz(struct xc_sr_context *ctx, struct xc_sr_record *rec);
 
 /*
  * Parses a TSC_INFO record and applies the result to the domain.
diff --git a/tools/libxc/xc_sr_restore_x86_hvm.c b/tools/libxc/xc_sr_restore_x86_hvm.c
index 227c48553e..ea6114d1dd 100644
--- a/tools/libxc/xc_sr_restore_x86_hvm.c
+++ b/tools/libxc/xc_sr_restore_x86_hvm.c
@@ -178,6 +178,9 @@ static int x86_hvm_process_record(struct xc_sr_context *ctx,
     case REC_TYPE_HVM_PARAMS:
         return handle_hvm_params(ctx, rec);
 
+    case REC_TYPE_VTSC_TOLERANCE_KHZ:
+        return handle_vtsc_tolerance_khz(ctx, rec);
+
     default:
         return RECORD_NOT_PROCESSED;
     }
diff --git a/tools/libxc/xc_sr_restore_x86_pv.c b/tools/libxc/xc_sr_restore_x86_pv.c
index d81dfdcca6..76522ba4fa 100644
--- a/tools/libxc/xc_sr_restore_x86_pv.c
+++ b/tools/libxc/xc_sr_restore_x86_pv.c
@@ -1079,6 +1079,9 @@ static int x86_pv_process_record(struct xc_sr_context *ctx,
     case REC_TYPE_TSC_INFO:
         return handle_tsc_info(ctx, rec);
 
+    case REC_TYPE_VTSC_TOLERANCE_KHZ:
+        return handle_vtsc_tolerance_khz(ctx, rec);
+
     default:
         return RECORD_NOT_PROCESSED;
     }
diff --git a/tools/libxc/xc_sr_save_x86_hvm.c b/tools/libxc/xc_sr_save_x86_hvm.c
index 97a8c49807..ebe39338d4 100644
--- a/tools/libxc/xc_sr_save_x86_hvm.c
+++ b/tools/libxc/xc_sr_save_x86_hvm.c
@@ -195,7 +195,7 @@ static int x86_hvm_end_of_checkpoint(struct xc_sr_context *ctx)
     int rc;
 
     /* Write the TSC record. */
-    rc = write_tsc_info(ctx);
+    rc = write_tsc(ctx);
     if ( rc )
         return rc;
 
diff --git a/tools/libxc/xc_sr_save_x86_pv.c b/tools/libxc/xc_sr_save_x86_pv.c
index e4b099cfd0..937f0ec57b 100644
--- a/tools/libxc/xc_sr_save_x86_pv.c
+++ b/tools/libxc/xc_sr_save_x86_pv.c
@@ -1104,7 +1104,7 @@ static int x86_pv_end_of_checkpoint(struct xc_sr_context *ctx)
 {
     int rc;
 
-    rc = write_tsc_info(ctx);
+    rc = write_tsc(ctx);
     if ( rc )
         return rc;
 
diff --git a/tools/libxc/xc_sr_stream_format.h b/tools/libxc/xc_sr_stream_format.h
index 15ff1c7efb..e8249569d1 100644
--- a/tools/libxc/xc_sr_stream_format.h
+++ b/tools/libxc/xc_sr_stream_format.h
@@ -76,6 +76,7 @@ struct xc_sr_rhdr
 #define REC_TYPE_VERIFY                     0x0000000dU
 #define REC_TYPE_CHECKPOINT                 0x0000000eU
 #define REC_TYPE_CHECKPOINT_DIRTY_PFN_LIST  0x0000000fU
+#define REC_TYPE_VTSC_TOLERANCE_KHZ         0x00000010U
 
 #define REC_TYPE_OPTIONAL             0x80000000U
 
@@ -131,6 +132,14 @@ struct xc_sr_rec_hvm_params_entry
     uint64_t value;
 };
 
+/* VTSC_TOLERANCE_KHZ */
+struct xc_sr_rec_vtsc_tolerance_khz
+{
+    uint32_t tolerance;
+    uint32_t _res1;
+};
+
+
 struct xc_sr_rec_hvm_params
 {
     uint32_t count;
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index eca0ea2c50..e553f6fe03 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -347,6 +347,12 @@
 #define LIBXL_HAVE_BUILDINFO_BOOTLOADER 1
 #define LIBXL_HAVE_BUILDINFO_BOOTLOADER_ARGS 1
 
+/*
+ * LIBXL_HAVE_VTSC_TOLERANCE_KHZ indicates that libxl_domain_build_info
+ * has the vtsc_tolerance_khz field.
+ */
+#define LIBXL_HAVE_VTSC_TOLERANCE_KHZ 1
+
 /*
  * libxl ABI compatibility
  *
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 35038120ca..f60cfc61d5 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -466,6 +466,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
     ("vcpu_soft_affinity", Array(libxl_bitmap, "num_vcpu_soft_affinity")),
     ("numa_placement",  libxl_defbool),
     ("tsc_mode",        libxl_tsc_mode),
+    ("vtsc_tolerance_khz", uint32),
     ("max_memkb",       MemKB),
     ("target_memkb",    MemKB),
     ("video_memkb",     MemKB),
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index d82013f6ed..faf82713ff 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -317,6 +317,8 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config,
     default:
         abort();
     }
+    xc_domain_set_vtsc_tolerance_khz(ctx->xch, domid,
+                                     d_config->b_info.vtsc_tolerance_khz);
     xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, 0, 0);
     if (libxl_defbool_val(d_config->b_info.disable_migrate))
         xc_domain_disable_migrate(ctx->xch, domid);
diff --git a/tools/misc/Makefile b/tools/misc/Makefile
index eaa28793ef..2b086e5758 100644
--- a/tools/misc/Makefile
+++ b/tools/misc/Makefile
@@ -36,6 +36,7 @@ INSTALL_SBIN += $(INSTALL_SBIN-y)
 
 # Everything to be installed in a private bin/
 INSTALL_PRIVBIN                += xenpvnetboot
+INSTALL_PRIVBIN                += xen-vtsc
 
 # Everything to be installed
 TARGETS_ALL := $(INSTALL_BIN) $(INSTALL_SBIN) $(INSTALL_PRIVBIN)
@@ -112,6 +113,9 @@ xen-livepatch: xen-livepatch.o
 xen-diag: xen-diag.o
        $(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
 
+xen-vtsc: xen-vtsc.o
+       $(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
+
 xen-lowmemd: xen-lowmemd.o
        $(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenevtchn) $(LDLIBS_libxenctrl) $(LDLIBS_libxenstore) $(APPEND_LDFLAGS)
 
diff --git a/tools/misc/xen-vtsc.c b/tools/misc/xen-vtsc.c
new file mode 100644
index 0000000000..be49e7c7b6
--- /dev/null
+++ b/tools/misc/xen-vtsc.c
@@ -0,0 +1,108 @@
+/* Could be merged into xen-diag.c? */
+
+#include <inttypes.h>
+#include <limits.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <xenctrl.h>
+
+static uint32_t domid;
+static uint32_t tsc_mode;
+static uint64_t elapsed_nsec;
+static uint32_t gtsc_khz;
+static uint32_t vtsc_tolerance_khz;
+static uint32_t incarnation;
+static uint32_t new_vtsc_tolerance_khz;
+static xc_physinfo_t physinfo;
+
+static void show_help(void)
+{
+    fprintf(stderr, "Usage: xen-vtsc <domid> [vtsc_tolerance]\n");
+}
+
+int main(int argc, char *argv[])
+{
+    struct xc_interface_core *xch;
+    int mode_set = 0;
+    int rc;
+
+    if (argc < 2 || argc > 3 || strcmp("-h", argv[1]) == 0)
+    {
+        show_help();
+        return 0;
+    }
+    domid = atol(argv[1]);
+    if (argc == 3)
+    {
+        unsigned long val;
+
+        val = atol(argv[2]);
+        if ( val > UINT32_MAX )
+        {
+            fprintf(stderr,
+                    "Error: value for vtsc_tolerance must be between 0 and %u\n", UINT32_MAX);
+            return 1;
+        }
+        new_vtsc_tolerance_khz = val;
+        if ( domid )
+            mode_set = 1;
+    }
+
+    xch = xc_interface_open(0,0,0);
+    if ( !xch )
+    {
+        fprintf(stderr, "failed to get xch handler\n");
+        return 1;
+    }
+
+
+
+    if (mode_set)
+    {
+        rc = xc_domain_set_vtsc_tolerance_khz(xch, domid, new_vtsc_tolerance_khz);
+        if ( rc )
+        {
+            perror("xc_domain_set_vtsc_tolerance_khz");
+            goto err;
+        }
+    }
+    else
+    {
+        rc =  xc_physinfo(xch, &physinfo);
+        if ( rc )
+        {
+            perror("xc_physinfo");
+            goto err;
+        }
+
+        rc = xc_domain_get_vtsc_tolerance_khz(xch, domid, &vtsc_tolerance_khz);
+        if ( rc )
+        {
+            perror("xc_domain_get_vtsc_tolerance_khz");
+            goto err;
+        }
+        rc = xc_domain_get_tsc_info(xch, domid, &tsc_mode, &elapsed_nsec,
+                                    &gtsc_khz, &incarnation);
+        if ( rc )
+        {
+            perror("xc_domain_get_tsc_info");
+            goto err;
+        }
+
+        printf("domid: %" PRIu32 "\n"
+               "tsc_mode: %" PRIu32 "\n"
+               "elapsed_nsec: %" PRIu64 "\n"
+               "gtsc_khz: %" PRIu32 "\n"
+               "incarnation: %" PRIu32 "\n"
+               "vtsc_tolerance_khz: %" PRIu32 "\n"
+               "cpu_khz: %" PRIu32 "\n",
+               domid, tsc_mode, elapsed_nsec, gtsc_khz, incarnation,
+               vtsc_tolerance_khz, physinfo.cpu_khz);
+    }
+
+err:
+    xc_interface_close(xch);
+
+    return !!rc;
+}
diff --git a/tools/python/xen/migration/libxc.py b/tools/python/xen/migration/libxc.py
index f24448a9ef..459732c2f7 100644
--- a/tools/python/xen/migration/libxc.py
+++ b/tools/python/xen/migration/libxc.py
@@ -65,6 +65,7 @@ REC_TYPE_x86_pv_vcpu_msrs           = 0x0000000c
 REC_TYPE_verify                     = 0x0000000d
 REC_TYPE_checkpoint                 = 0x0000000e
 REC_TYPE_checkpoint_dirty_pfn_list  = 0x0000000f
+REC_TYPE_vtsc_tolerance_khz         = 0x00000010
 
 rec_type_to_str = {
     REC_TYPE_end                        : "End",
@@ -83,6 +84,7 @@ rec_type_to_str = {
     REC_TYPE_verify                     : "Verify",
     REC_TYPE_checkpoint                 : "Checkpoint",
-    REC_TYPE_checkpoint_dirty_pfn_list  : "Checkpoint dirty pfn list"
+    REC_TYPE_checkpoint_dirty_pfn_list  : "Checkpoint dirty pfn list",
+    REC_TYPE_vtsc_tolerance_khz         : "vTSC tolerance",
 }
 
 # page_data
@@ -120,6 +122,9 @@ TSC_INFO_FORMAT           = "IIQII"
 HVM_PARAMS_ENTRY_FORMAT   = "QQ"
 HVM_PARAMS_FORMAT         = "II"
 
+# vtsc_tolerance_khz
+VTSC_TOLERANCE_KHZ_FORMAT = "II"
+
 class VerifyLibxc(VerifyBase):
     """ Verify a Libxc v2 stream """
 
@@ -424,6 +429,23 @@ class VerifyLibxc(VerifyBase):
         """ checkpoint dirty pfn list """
         raise RecordError("Found checkpoint dirty pfn list record in stream")
 
+    def verify_record_vtsc_tolerance_khz(self, content):
+        """ vtsc tolerance khz record """
+
+        sz = calcsize(VTSC_TOLERANCE_KHZ_FORMAT)
+
+        if len(content) != sz:
+            raise RecordError("Length should be %u bytes" % (sz, ))
+
+        tolerance, res1 = unpack(VTSC_TOLERANCE_KHZ_FORMAT, content)
+
+        if res1 != 0:
+            raise StreamError("Reserved bits set in VTSC_TOLERANCE_KHZ: 0x%08x"
+                              % (res1, ))
+
+        self.info("  Tolerance %u" % (tolerance))
+
+
 
 record_verifiers = {
     REC_TYPE_end:
@@ -466,4 +488,6 @@ record_verifiers = {
         VerifyLibxc.verify_record_checkpoint,
     REC_TYPE_checkpoint_dirty_pfn_list:
         VerifyLibxc.verify_record_checkpoint_dirty_pfn_list,
+    REC_TYPE_vtsc_tolerance_khz:
+        VerifyLibxc.verify_record_vtsc_tolerance_khz,
     }
diff --git a/tools/python/xen/migration/tests.py b/tools/python/xen/migration/tests.py
index 026cf38012..1eb7c4aba5 100644
--- a/tools/python/xen/migration/tests.py
+++ b/tools/python/xen/migration/tests.py
@@ -26,6 +26,7 @@ class TestLibxc(unittest.TestCase):
                          (libxc.TSC_INFO_FORMAT, 24),
                          (libxc.HVM_PARAMS_ENTRY_FORMAT, 16),
                          (libxc.HVM_PARAMS_FORMAT, 8),
+                         (libxc.VTSC_TOLERANCE_KHZ_FORMAT, 8),
                          ):
             self.assertEqual(calcsize(fmt), sz)
 
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index f6842540ca..6378dd4b3e 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -1126,6 +1126,9 @@ void parse_config_data(const char *config_source,
         }
     }
 
+    if (!xlu_cfg_get_long(config, "vtsc_tolerance_khz", &l, 0))
+        b_info->vtsc_tolerance_khz = l;
+
     if (!xlu_cfg_get_long(config, "rtc_timeoffset", &l, 0))
         b_info->rtc_timeoffset = l;
 
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 8fbbf3aeb3..96b6a0d1ee 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -959,6 +959,15 @@ long arch_do_domctl(
         }
         break;
 
+    case XEN_DOMCTL_get_vtsc_tolerance_khz:
+        domctl->u.vtsc_tolerance_khz.tolerance = d->arch.vtsc_tolerance_khz;
+        copyback = true;
+        break;
+
+    case XEN_DOMCTL_set_vtsc_tolerance_khz:
+        d->arch.vtsc_tolerance_khz = domctl->u.vtsc_tolerance_khz.tolerance;
+        break;
+
     case XEN_DOMCTL_suppress_spurious_page_faults:
         d->arch.suppress_spurious_page_faults = 1;
         break;
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 1a6fde65dd..f00838949d 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -2132,6 +2132,7 @@ void tsc_set_info(struct domain *d,
 
     switch ( d->arch.tsc_mode = tsc_mode )
     {
+        bool disable_vtsc;
         bool enable_tsc_scaling;
 
     case TSC_MODE_DEFAULT:
@@ -2147,8 +2148,26 @@ void tsc_set_info(struct domain *d,
          * When a guest is created, gtsc_khz is passed in as zero, making
          * d->arch.tsc_khz == cpu_khz. Thus no need to check incarnation.
          */
+        disable_vtsc = d->arch.tsc_khz == cpu_khz;
+
+        if ( tsc_mode == TSC_MODE_DEFAULT && gtsc_khz &&
+             d->arch.vtsc_tolerance_khz )
+        {
+            uint32_t khz_diff;
+
+            khz_diff = cpu_khz > gtsc_khz ?
+                       cpu_khz - gtsc_khz : gtsc_khz - cpu_khz;
+            disable_vtsc = khz_diff <= d->arch.vtsc_tolerance_khz;
+
+            printk(XENLOG_G_INFO "%s: d%u: host has %lu kHz,"
+                   " domU expects %u kHz,"
+                   " difference of %u is %s tolerance of %u\n",
+                   __func__, d->domain_id, cpu_khz, gtsc_khz, khz_diff,
+                   disable_vtsc ? "within" : "outside",
+                   d->arch.vtsc_tolerance_khz);
+        }
         if ( tsc_mode == TSC_MODE_DEFAULT && host_tsc_is_safe() &&
-             (d->arch.tsc_khz == cpu_khz ||
+             (disable_vtsc ||
               (is_hvm_domain(d) &&
                hvm_get_tsc_scaling_ratio(d->arch.tsc_khz))) )
         {
@@ -2237,6 +2256,8 @@ static void dump_softtsc(unsigned char key)
             printk(",ofs=%#"PRIx64, d->arch.vtsc_offset);
         if ( d->arch.tsc_khz )
             printk(",khz=%"PRIu32, d->arch.tsc_khz);
+        if ( d->arch.vtsc_tolerance_khz )
+            printk(",tol=%"PRIu32, d->arch.vtsc_tolerance_khz);
         if ( d->arch.incarnation )
             printk(",inc=%"PRIu32, d->arch.incarnation);
 #if !defined(NDEBUG) || defined(CONFIG_PERF_COUNTERS)
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 47aadc2600..854a6ecabd 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -374,6 +374,7 @@ struct arch_domain
     uint64_t vtsc_offset;    /* adjustment for save/restore/migrate */
     uint32_t tsc_khz;        /* cached guest khz for certain emulated or
                                 hardware TSC scaling cases */
+    uint32_t vtsc_tolerance_khz; /* domU handles that much jitter in khz value */
     struct time_scale vtsc_to_ns; /* scaling for certain emulated or
                                      hardware TSC scaling cases */
     struct time_scale ns_to_vtsc; /* scaling for certain emulated or
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index ec7a860afc..fe325ffe67 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -706,6 +706,13 @@ struct xen_domctl_tsc_info {
     uint64_aligned_t elapsed_nsec;
 };
 
+/* XEN_DOMCTL_get_vtsc_tolerance_khz */
+/* XEN_DOMCTL_set_vtsc_tolerance_khz */
+struct xen_domctl_vtsc_tolerance_khz {
+    /* IN/OUT */
+    uint32_t tolerance;
+};
+
 /* XEN_DOMCTL_gdbsx_guestmemio      guest mem io */
 struct xen_domctl_gdbsx_memio {
     /* IN */
@@ -1172,6 +1179,8 @@ struct xen_domctl {
 #define XEN_DOMCTL_soft_reset                    79
 #define XEN_DOMCTL_set_gnttab_limits             80
 #define XEN_DOMCTL_vuart_op                      81
+#define XEN_DOMCTL_get_vtsc_tolerance_khz        82
+#define XEN_DOMCTL_set_vtsc_tolerance_khz        83
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
@@ -1199,6 +1208,7 @@ struct xen_domctl {
         struct xen_domctl_settimeoffset     settimeoffset;
         struct xen_domctl_disable_migrate   disable_migrate;
         struct xen_domctl_tsc_info          tsc_info;
+        struct xen_domctl_vtsc_tolerance_khz vtsc_tolerance_khz;
         struct xen_domctl_hvmcontext        hvmcontext;
         struct xen_domctl_hvmcontext_partial hvmcontext_partial;
         struct xen_domctl_address_size      address_size;
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 1802d8dfe6..f01d1ce307 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -718,9 +718,11 @@ static int flask_domctl(struct domain *d, int cmd)
         return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__SET_CPUID);
 
     case XEN_DOMCTL_gettscinfo:
+    case XEN_DOMCTL_get_vtsc_tolerance_khz:
         return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__GETTSC);
 
     case XEN_DOMCTL_settscinfo:
+    case XEN_DOMCTL_set_vtsc_tolerance_khz:
         return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__SETTSC);
 
     case XEN_DOMCTL_audit_p2m:
