
[Xen-devel] [PATCH v3 2/6] x86/time: implement tsc as clocksource



Recent x86/time changes greatly improved monotonicity in Xen timekeeping,
making it much harder to observe time going backwards. However, the platform
timer can't be expected to be perfectly in sync with the TSC, so get_s_time
isn't guaranteed to always return monotonically increasing values across
CPUs. This is the case on some of the boxes I am testing with, where I
sometimes observe ~100 warps (of very few nanoseconds each) after a few hours.

This patch introduces support for using TSC as platform time source,
which is the highest resolution time source and the cheapest to read
(~20 nsecs). There are several problems associated with its usage,
though, and there isn't a complete (and architecturally defined)
guarantee that all machines will provide a reliable and monotonic TSC
in all cases (I believe Intel to be the only one that can guarantee
that?). For this reason it's given lower priority than HPET unless the
administrator changes the "clocksource" boot option to "tsc".

Initializing the TSC clocksource requires all CPUs to be up so that the
TSC reliability checks can be performed. init_xen_time is called before
all CPUs are up, so we start with another clocksource (HPET, for
example) at boot time and switch to TSC later. The switch happens in
the verify_tsc_reliability initcall, which is invoked when all CPUs are
up. When attempting to initialize TSC we also check for time warps and
whether the TSC is invariant. If these conditions are not met, we keep
the clocksource that was previously initialized in init_xen_time. Note
that while we deem a CONSTANT_TSC with no deep C-states reliable, that
might not always be the case, so we're conservative and allow TSC to be
used as platform timer only with invariant TSC.
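
For illustration only, the selection policy described above can be
sketched as a small standalone C function. This is not code from the
patch: struct tsc_checks, final_clocksource() and its parameters are
hypothetical names, and the sketch assumes the results of the invariant
TSC check and the warp test are already known.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical summary of the boot-time checks; not a Xen type. */
struct tsc_checks {
    bool invariant_tsc;   /* CPU advertises an invariant TSC */
    bool warps_observed;  /* cross-CPU warp test saw time go backwards */
};

/*
 * Decide which clocksource ends up in use once all CPUs are up.
 * 'requested' is the "clocksource=" boot option ("" when unset) and
 * 'early' is whatever was picked before the secondary CPUs came up.
 */
static const char *final_clocksource(const char *requested,
                                     const char *early,
                                     const struct tsc_checks *c)
{
    /* TSC is opt-in: without clocksource=tsc nothing changes. */
    if ( strcmp(requested, "tsc") != 0 )
        return early;

    /* Conservative policy: require invariant TSC and no warps. */
    if ( !c->invariant_tsc || c->warps_observed )
        return early;

    return "tsc";
}
```

With clocksource=tsc on a box that fails either check, the sketch
returns the early choice unchanged, which mirrors the fallback
described above.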

Since b64438c7c ("x86/time: use correct (local) time stamp in
constant-TSC calibration fast path"), updates to CPU time use local
stamps, which means the platform timer is only used to seed the initial
CPU time. With clocksource=tsc there is no need to be in sync with
another clocksource, so we reseed the local/master stamps with TSC
values and update the platform time stamps accordingly. Time
calibration is set to fire 1 sec after we switch to TSC, so reseeding
these stamps also ensures monotonically increasing values right after
the switch and avoids inconsistent readings in this short period (i.e.
until calibration fires).
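
As a rough sketch of why reseeding helps (again, not code from the
patch): once every CPU shares the same (TSC, system time) seed and the
TSC itself is synchronized, each CPU derives the same system time from
the same TSC reading. struct stamp, ticks_to_ns(), stime_at() and
reseed() are hypothetical names, and seeding system time to 0 at the
boot TSC value is a simplifying assumption of the sketch.

```c
#include <stdint.h>

/* Hypothetical per-CPU stamp, loosely modelled on the patch's fields. */
struct stamp {
    uint64_t local_tsc;    /* TSC value when the stamp was taken */
    uint64_t local_stime;  /* system time (ns) at that TSC value */
};

/* Convert TSC ticks to ns; split to avoid overflow. Assumes freq_hz > 0. */
static uint64_t ticks_to_ns(uint64_t ticks, uint64_t freq_hz)
{
    uint64_t secs = ticks / freq_hz;
    uint64_t rem = ticks % freq_hz;

    return secs * 1000000000ull + rem * 1000000000ull / freq_hz;
}

/* System time for a TSC reading, extrapolated from a CPU's stamp. */
static uint64_t stime_at(const struct stamp *s, uint64_t tsc,
                         uint64_t freq_hz)
{
    return s->local_stime + ticks_to_ns(tsc - s->local_tsc, freq_hz);
}

/* Give every CPU the same seed: system time 0 at the boot TSC value. */
static void reseed(struct stamp *cpus, unsigned int nr, uint64_t boot_tsc)
{
    unsigned int i;

    for ( i = 0; i < nr; i++ )
    {
        cpus[i].local_tsc = boot_tsc;
        cpus[i].local_stime = 0;
    }
}
```

After reseed(), two CPUs that previously carried different stamps
return identical values from stime_at() for the same TSC reading, so
any skew inherited from the earlier platform timer is gone before the
first calibration fires.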

Signed-off-by: Joao Martins <joao.m.martins@xxxxxxxxxx>
---
Cc: Jan Beulich <jbeulich@xxxxxxxx>
Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Changes since v2:
 - Suggest "HPET switching to TSC" only as an example as otherwise it
 would be misleading on platforms not having one.
 - Change init_tsctimer to skip all the tests and assume it's called
 only on reliable TSC conditions and no warps observed. Tidy
 initialization on verify_tsc_reliability as suggested by Konrad.
 - CONSTANT_TSC and max_cstate <= 2 case removed and only allow tsc
   clocksource in invariant TSC boxes.
 - Prefer omitting != 0 in init_platform_timer for the tsc case.
 - Change comment on init_platform_timer.
 - Add comment on plt_tsc declaration.
 - Reinit CPU time for all online cpus instead of just CPU 0.
 - Use rdtsc_ordered() as opposed to rdtsc()
 - Remove tsc_freq variable and set plt_tsc clocksource frequency
 with the refined tsc calibration.
 - Rework a bit the commit message.

Changes since v1:
 - s/printk/printk(XENLOG_INFO
 - Remove extra space on inner brackets
 - Add missing space around brackets
 - Defer TSC initialization when all CPUs are up.

Changes since RFC:
 - Spelling fixes in the commit message.
 - Remove unused clocksource_is_tsc variable and introduce it instead
 on the patch that uses it.
 - Move plt_tsc from second to last in the available clocksources.
---
 xen/arch/x86/time.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 81 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 6750e46..b2a11a8 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -475,6 +475,30 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
 }
 
 /************************************************************
+ * PLATFORM TIMER 4: TSC
+ */
+
+static s64 __init init_tsctimer(struct platform_timesource *pts)
+{
+    return pts->frequency;
+}
+
+static u64 read_tsc(void)
+{
+    return rdtsc_ordered();
+}
+
+static struct platform_timesource __initdata plt_tsc =
+{
+    .id = "tsc",
+    .name = "TSC",
+    .read_counter = read_tsc,
+    .counter_bits = 64,
+    /* Not called by init_platform_timer as it is not on the plt_timers array */
+    .init = init_tsctimer,
+};
+
+/************************************************************
  * GENERIC PLATFORM TIMER INFRASTRUCTURE
  */
 
@@ -576,6 +600,21 @@ static void resume_platform_timer(void)
     plt_stamp = plt_src.read_counter();
 }
 
+static void __init reset_platform_timer(void)
+{
+    /* Deactivate any timers running */
+    kill_timer(&plt_overflow_timer);
+    kill_timer(&calibration_timer);
+
+    /* Reset counters and stamps */
+    spin_lock_irq(&platform_timer_lock);
+    plt_stamp = 0;
+    plt_stamp64 = 0;
+    platform_timer_stamp = 0;
+    stime_platform_stamp = 0;
+    spin_unlock_irq(&platform_timer_lock);
+}
+
 static s64 __init try_platform_timer(struct platform_timesource *pts)
 {
     s64 rc = pts->init(pts);
@@ -583,6 +622,10 @@ static s64 __init try_platform_timer(struct platform_timesource *pts)
     if ( rc <= 0 )
         return rc;
 
+    /* We have a platform timesource already so reset it */
+    if ( plt_src.counter_bits != 0 )
+        reset_platform_timer();
+
     plt_mask = (u64)~0ull >> (64 - pts->counter_bits);
 
     set_time_scale(&plt_scale, pts->frequency);
@@ -604,7 +647,9 @@ static u64 __init init_platform_timer(void)
     unsigned int i;
     s64 rc = -1;
 
-    if ( opt_clocksource[0] != '\0' )
+    /* clocksource=tsc is initialized via __initcalls (when CPUs are up). */
+    if ( (opt_clocksource[0] != '\0') &&
+         (strcmp(opt_clocksource, "tsc")) )
     {
         for ( i = 0; i < ARRAY_SIZE(plt_timers); i++ )
         {
@@ -1481,6 +1526,40 @@ static int __init verify_tsc_reliability(void)
                    __func__);
             setup_clear_cpu_cap(X86_FEATURE_TSC_RELIABLE);
         }
+        else if ( !strcmp(opt_clocksource, "tsc") )
+        {
+            int cpu;
+
+            if ( try_platform_timer(&plt_tsc) <= 0 )
+                return 0;
+
+            /*
+             * Platform timer has changed and CPU time will only be updated
+             * after we set again the calibration timer, which means we need to
+             * seed again each local CPU time. At this stage TSC is known to be
+             * reliable i.e. monotonically increasing across all CPUs so this
+             * lets us remove the skew between platform timer and TSC, since
+             * these are now effectively the same.
+             */
+            for_each_online_cpu( cpu )
+            {
+                struct cpu_time *t = &per_cpu(cpu_time, cpu);
+
+                t->stamp.local_tsc = boot_tsc_stamp;
+                t->stamp.local_stime = 0;
+                t->stamp.local_stime = get_s_time_fixed(boot_tsc_stamp);
+                t->stamp.master_stime = t->stamp.local_stime;
+            }
+
+            platform_timer_stamp = plt_stamp64;
+            stime_platform_stamp = get_s_time_fixed(plt_stamp64);
+
+            printk(XENLOG_INFO "Switched to Platform timer %s TSC\n",
+                   freq_string(plt_src.frequency));
+
+            init_timer(&calibration_timer, time_calibration, NULL, 0);
+            set_timer(&calibration_timer, NOW() + EPOCH);
+        }
     }
 
     return 0;
@@ -1528,6 +1607,7 @@ void __init early_time_init(void)
 
     preinit_pit();
     tmp = init_platform_timer();
+    plt_tsc.frequency = tmp;
 
     set_time_scale(&t->tsc_scale, tmp);
     t->stamp.local_tsc = boot_tsc_stamp;
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

