
[Xen-devel] [PATCH v11] tolerate jitter in cpu_khz calculation to avoid TSC emulation

Improve the decision of when vTSC emulation is activated for a domU with
tsc_mode=default. The current approach is to compare the cpu_khz values
of the two physical hosts. Since this value is only an estimate, it
cannot be used verbatim to decide whether vTSC emulation needs to be
enabled. Without this change each TSC access from the domU is emulated
after migration, which causes a significant performance drop for
workloads that make use of rdtsc.

If a domU uses TSC as clocksource it must also run NTP in some way to
avoid the drift that will most likely happen, independent of any
migration. The calculation of the drift is based on the time
returned by remote servers versus how fast the local clock advances. NTP
can handle a drift of up to 500 PPM. This means the local clocksource can
run up to 500us per second slower or faster. This calculation is based
on the TSC frequency of the host where the domU was started. Once a domU is
migrated to a host of a different class, like from "2.3GHz" to "2.4GHz",
the TSC frequency changes, but the domU kernel may not recalibrate
itself. As a result, the drift will be larger and might be outside of
the 500 PPM range. In addition, the kernel may notice the change of
speed in which the TSC advances and could change the clocksource. All
this depends of course on the type of OS that is running in the domU.
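
To put numbers on the drift described above, the following is a minimal
sketch that expresses a frequency mismatch in PPM. The helper name
drift_ppm() and the sample frequencies are illustrative assumptions and
are not part of this patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: relative frequency error
 * in PPM seen by a guest calibrated for guest_khz when the TSC actually
 * advances at host_khz. The result is truncated toward zero.
 */
static uint64_t drift_ppm(uint64_t guest_khz, uint64_t host_khz)
{
    uint64_t diff = host_khz > guest_khz ? host_khz - guest_khz
                                         : guest_khz - host_khz;

    return diff * 1000000ULL / guest_khz;
}
```

With these assumptions, drift_ppm(2300000, 2400000) yields 43478, far
outside the 500 PPM that NTP can correct, while a 200 kHz difference
between two "2.3GHz" hosts, drift_ppm(2300000, 2300200), yields 86.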

If the domU is migrated to another host of the same class, the two hosts
may have slightly different TSC frequencies. The difference is small
and most likely within the drift range that NTP can handle.

The formula to set the tolerance for this host calculates the ticks
within a timespan of 500 PPM, which is 500us per second. From this
number the assumed jitter in the TSC frequency measurement must be
subtracted because Xen itself cannot know if the estimated value in
cpu_khz is at the edge or in the middle of the range of possible
frequencies. Data collected during the incident which triggered this
change showed a jitter of up to 200 kHz across systems of the same
class. The resulting tolerance is larger than needed, and it is expected
to still cover the possible drift that NTP can handle.
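
The arithmetic described above can be sketched as a standalone function.
This is only an illustration under the assumption that the 500 PPM term
is added to the one-million base before scaling by cpu_khz, as the text
describes. The function name calc_tolerance_khz() is hypothetical, and
the two constants mirror the macros introduced by the patch.

```c
#include <stdint.h>

#define VTSC_NTP_PPM_TOLERANCE 500ULL /* Amount of drift NTP will handle */
#define VTSC_JITTER_RANGE_KHZ  200ULL /* Assumed jitter in cpu_khz */

/* Hypothetical standalone version of the tolerance calculation. */
static unsigned int calc_tolerance_khz(uint64_t cpu_khz)
{
    uint64_t tmp = 1000 * 1000;      /* one million parts */

    tmp += VTSC_NTP_PPM_TOLERANCE;   /* 1000500 */
    tmp *= cpu_khz;
    tmp /= 1000 * 1000;              /* cpu_khz scaled up by 500 PPM */
    tmp -= cpu_khz;                  /* kHz that correspond to 500 PPM */

    /* Subtract the assumed measurement jitter, but never go negative. */
    if (tmp >= VTSC_JITTER_RANGE_KHZ)
        tmp -= VTSC_JITTER_RANGE_KHZ;

    return (unsigned int)tmp;
}
```

Under these assumptions a host with cpu_khz of 2400000 gets a tolerance
of 1000 kHz and a host with 2300000 gets 950 kHz. For a slow host such
as 300000 kHz the 500 PPM window is only 150 kHz, which is below the
jitter range, so the sketch leaves the value at 150 instead of
subtracting.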

Signed-off-by: Olaf Hering <olaf@xxxxxxxxx>

 - trim patch and use calculated tolerance value, no admin interaction
 - rebase to ae01a8e315
 - remove changes for libxl and save/restore protocol, the feature has
   to be per host instead of per guest
 - add newline to tsc_set_info (Andrew)
 - add pointer to xen-tscmode(7) in xl.cfg(5)/vtsc_tolerance_khz (Andrew)
 - mention potential clock drift in the domU (Andrew)
 - reword the newly added paragraph in xen-tscmode(7) (Andrew),
   and also mention that it is about the measured/estimated TSC value
   rather than the real value. The latter is simply unknown.
 - use uint32 for internal representation of 
   and remove padding field
 - add math for real TSC frequency to xen-tscmode
 - extend commit msg, mention potential issues with xc_sr_rec_tsc_info._res1
 - adjust also python stream checker for added tolerance member
 - use uint16 in libxl_types.idl to match type used elsewhere in the patch
 - mention default value in xl.cfg
 - tsc_set_info: remove usage of __func__, use %d for domid
 - tsc_set_info: use ABS to calculate khz_diff
 - reduce functionality to allow setting of the tolerance value
   only at initial domU startup
 - add missing copyback in XEN_DOMCTL_set_vtsc_tolerance_khz
 - rename vtsc_khz_tolerance to vtsc_tolerance_khz
 - separate domctls to adjust values
 - more docs
 - update libxl.h
 - update python tests
 - flask check bound to tsc permissions
 - not runtime tested due to dlsym() build errors in staging
 xen/arch/x86/time.c | 33 ++++++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 24d4c2794b..2ffdc2ea8f 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -43,6 +43,9 @@ static char __initdata opt_clocksource[10];
 string_param("clocksource", opt_clocksource);
 unsigned long __read_mostly cpu_khz;  /* CPU clock frequency in kHz. */
+#define VTSC_NTP_PPM_TOLERANCE 500UL  /* Amount of drift NTP will handle */
+#define VTSC_JITTER_RANGE_KHZ 200UL   /* Assumed jitter in cpu_khz */
+static unsigned int __read_mostly vtsc_tolerance_khz;
 unsigned long pit0_ticks;
@@ -1885,6 +1888,16 @@ void __init early_time_init(void)
     printk("Detected %lu.%03lu MHz processor.\n", 
            cpu_khz / 1000, cpu_khz % 1000);
+    tmp = 1000 * 1000;
+    tmp += VTSC_NTP_PPM_TOLERANCE;
+    tmp *= cpu_khz;
+    tmp /= 1000 * 1000;
+    tmp -= cpu_khz;
+    if (tmp >= VTSC_JITTER_RANGE_KHZ)
+        tmp -= VTSC_JITTER_RANGE_KHZ;
+    vtsc_tolerance_khz = (unsigned int)tmp;
+    printk("Tolerating vtsc jitter for domUs: %u kHz.\n", vtsc_tolerance_khz);
     setup_irq(0, 0, &irq0);
@@ -2208,6 +2221,7 @@ void tsc_set_info(struct domain *d,
     switch ( d->arch.tsc_mode = tsc_mode )
+        bool disable_vtsc;
         bool enable_tsc_scaling;
@@ -2223,8 +2237,25 @@ void tsc_set_info(struct domain *d,
          * When a guest is created, gtsc_khz is passed in as zero, making
          * d->arch.tsc_khz == cpu_khz. Thus no need to check incarnation.
+        disable_vtsc = d->arch.tsc_khz == cpu_khz;
+        if ( tsc_mode == TSC_MODE_DEFAULT && gtsc_khz && vtsc_tolerance_khz )
+        {
+            long khz_diff;
+            khz_diff = ABS((long)(cpu_khz - gtsc_khz));
+            disable_vtsc = khz_diff <= vtsc_tolerance_khz;
+            printk(XENLOG_G_INFO "d%d: host has %lu kHz,"
+                   " domU expects %u kHz,"
+                   " difference of %ld is %s tolerance of %u\n",
+                   d->domain_id, cpu_khz, gtsc_khz, khz_diff,
+                   disable_vtsc ? "within" : "outside",
+                   vtsc_tolerance_khz);
+        }
         if ( tsc_mode == TSC_MODE_DEFAULT && host_tsc_is_safe() &&
-             (d->arch.tsc_khz == cpu_khz ||
+             (disable_vtsc ||
               (is_hvm_domain(d) &&
                hvm_get_tsc_scaling_ratio(d->arch.tsc_khz))) )
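
For readers who want to try the comparison outside of Xen, the decision
made in tsc_set_info() for tsc_mode=default can be approximated as
below. This is a simplified sketch: the function name
can_disable_vtsc() is hypothetical, and the host_tsc_is_safe() and HVM
TSC scaling conditions from the real code are deliberately left out.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the frequency check: returns true if the guest
 * may keep using the native TSC as far as the frequency is concerned.
 */
static bool can_disable_vtsc(unsigned long host_khz, unsigned int guest_khz,
                             unsigned int tolerance_khz)
{
    long khz_diff;

    /* A newly created guest passes zero and inherits the host value. */
    if (guest_khz == 0)
        return true;

    /* Without a tolerance only an exact match is accepted. */
    if (tolerance_khz == 0)
        return guest_khz == host_khz;

    khz_diff = (long)host_khz - (long)guest_khz;
    if (khz_diff < 0)
        khz_diff = -khz_diff;

    return khz_diff <= (long)tolerance_khz;
}
```

With a tolerance of 1000 kHz, a guest that expects 2399100 kHz on a host
measured at 2400000 kHz is within tolerance, while a guest that expects
2300000 kHz is not and would fall back to emulation or scaling.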
