[Xen-changelog] [xen master] x86/time: implement tsc as clocksource



commit 15c3ef8b5e177d208ed24ccc7ea0d032aa25b16d
Author:     Joao Martins <joao.m.martins@xxxxxxxxxx>
AuthorDate: Fri Sep 23 18:25:19 2016 +0200
Commit:     Jan Beulich <jbeulich@xxxxxxxx>
CommitDate: Fri Sep 23 18:25:19 2016 +0200

    x86/time: implement tsc as clocksource
    
    Recent x86/time changes improved monotonicity in Xen timekeeping
    considerably, making it much harder to observe time going backwards.
    However, the platform timer can't be expected to be perfectly in
    sync with the TSC, so get_s_time isn't guaranteed to always return
    monotonically increasing values across CPUs. This is the case on
    some of the boxes I am testing with, where I sometimes observe ~100
    warps (of a few nanoseconds each) after a few hours.
    
    This patch introduces support for using the TSC as platform time
    source, which is the highest-resolution time source available and
    the cheapest one to read. There are several problems associated
    with its usage, though, and there is no complete (architecturally
    defined) guarantee that all machines will provide a reliable and
    monotonic TSC in all cases (I believe Intel is the only vendor that
    can guarantee that?). For this reason it is not used unless the
    administrator changes the "clocksource" boot option to "tsc".
    Initializing the TSC clocksource requires all CPUs to be up, so
    that the TSC reliability checks can be performed. init_xen_time is
    called before all CPUs are up, so we start with HPET (or ACPI, PIT)
    at boot time and switch to the TSC later, in the
    verify_tsc_reliability initcall, which is invoked once all CPUs are
    up. When attempting to initialize the TSC we also check for time
    warps and for an invariant TSC. Note that while we deem a
    CONSTANT_TSC with no deep C-states reliable, that might not always
    be the case, so we are conservative and allow the TSC to be used as
    platform timer only with an invariant TSC.
    
    Additionally, we check that no CPU hotplug is intended on the host,
    which is the case when nr_cpu_ids and num_present_cpus() are the
    same. This is because a newly hotplugged CPU may not satisfy the
    condition of having all TSCs synchronized - so with the tsc
    clocksource in use we allow offlining CPUs but not onlining any
    back. Finally, we prevent the TSC from being used as clocksource
    across multiple sockets, because there it isn't guaranteed to be
    invariant. A separate patch relaxes this last requirement, allowing
    vendors that do provide such a guarantee to use the TSC as
    clocksource. In case any of these conditions is not met, we keep
    the clocksource that was previously initialized in init_xen_time.
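
    As an illustration, a hedged sketch of a GRUB2 boot entry that opts
    in to the TSC clocksource (the kernel paths, root device and the
    maxcpus value here are hypothetical and machine-specific):

        multiboot2 /boot/xen.gz clocksource=tsc maxcpus=8
        module2 /boot/vmlinuz root=/dev/sda1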
    
    Since b64438c7c ("x86/time: use correct (local) time stamp in
    constant-TSC calibration fast path") updates to cpu time use local
    stamps, which means platform timer is only used to seed the initial
    cpu time. We further introduce a new rendezvous function
    (nop_rendezvous) which doesn't require synchronization between master
    and slave CPUS and just reads calibration_rendezvous struct and writes
    it down the stime and stamp to the cpu_calibration struct to be used
    later on. With clocksource=tsc there is no need to be in sync with
    another clocksource, so we reseed the local/master stamps to be values
    of TSC and update the platform time stamps accordingly. Time
    calibration is set to 1sec after we switch to TSC, thus these stamps
    are reseeded to also ensure monotonic returning values right after the
    point we switch to TSC. This is to remove the possibility of having
    inconsistent readings in this short period (i.e. until calibration
    fires).
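
    To make the reseeding rationale concrete, here is a minimal sketch
    of how per-CPU system time is derived (simplified from
    get_s_time_fixed() in xen/arch/x86/time.c; scale_delta() converts
    TSC ticks to nanoseconds):

        static s_time_t get_s_time_sketch(void)
        {
            const struct cpu_time *t = &this_cpu(cpu_time);
            /* Last calibration stamp plus the scaled TSC delta since. */
            u64 delta = rdtsc_ordered() - t->stamp.local_tsc;

            return t->stamp.local_stime + scale_delta(delta, &t->tsc_scale);
        }

    Reseeding local_tsc/local_stime from the TSC thus removes whatever
    skew the previous platform timer had introduced.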
    
    Signed-off-by: Joao Martins <joao.m.martins@xxxxxxxxxx>
    Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
---
 docs/misc/xen-command-line.markdown |   6 +-
 xen/arch/x86/platform_hypercall.c   |   3 +-
 xen/arch/x86/time.c                 | 161 +++++++++++++++++++++++++++++++++---
 xen/include/asm-x86/time.h          |   1 +
 4 files changed, 159 insertions(+), 12 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 3a250cb..07ecd5e 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -264,9 +264,13 @@ minimum of 32M, subject to a suitably aligned and sized contiguous
 region of memory being available.
 
 ### clocksource
-> `= pit | hpet | acpi`
+> `= pit | hpet | acpi | tsc`
 
 If set, override Xen's default choice for the platform timer.
+Using the TSC as the platform timer requires it to be explicitly selected.
+This is because the TSC can only be used safely if no CPU hotplug is
+performed on the system. On some platforms, the "maxcpus" option may need
+to be used to further adjust the number of allowed CPUs.
 
 ### cmci-threshold
 > `= <integer>`
diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
index 780f22d..0879e19 100644
--- a/xen/arch/x86/platform_hypercall.c
+++ b/xen/arch/x86/platform_hypercall.c
@@ -631,7 +631,8 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
         if ( ret )
             break;
 
-        if ( cpu >= nr_cpu_ids || !cpu_present(cpu) )
+        if ( cpu >= nr_cpu_ids || !cpu_present(cpu) ||
+             clocksource_is_tsc() )
         {
             ret = -EINVAL;
             break;
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 6305a84..128e653 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -475,6 +475,55 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
 }
 
 /************************************************************
+ * PLATFORM TIMER 4: TSC
+ */
+
+/*
+ * Called in verify_tsc_reliability() under reliable TSC conditions,
+ * thus reusing all the checks already performed there.
+ */
+static s64 __init init_tsc(struct platform_timesource *pts)
+{
+    u64 ret = pts->frequency;
+
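+    /* A newly hotplugged CPU's TSC may be out of sync with the rest. */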
+    if ( nr_cpu_ids != num_present_cpus() )
+    {
+        printk(XENLOG_WARNING "TSC: CPU Hotplug intended\n");
+        ret = 0;
+    }
+
+    if ( nr_sockets > 1 )
+    {
+        printk(XENLOG_WARNING "TSC: Not invariant across sockets\n");
+        ret = 0;
+    }
+
+    if ( !ret )
+        printk(XENLOG_DEBUG "TSC: Not setting it as clocksource\n");
+
+    return ret;
+}
+
+static u64 read_tsc(void)
+{
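+    /* Serializing read; see rdtsc_ordered() for the barrier details. */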
+    return rdtsc_ordered();
+}
+
+static struct platform_timesource __initdata plt_tsc =
+{
+    .id = "tsc",
+    .name = "TSC",
+    .read_counter = read_tsc,
+    /*
+     * Calculations for platform timer overflow assume u64 boundary.
+     * Hence we set to less than 64, such that the TSC wraparound is
+     * correctly checked and handled.
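+     * (At 3 GHz, for instance, a 63-bit counter wraps after ~97 years.)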
+     */
+    .counter_bits = 63,
+    .init = init_tsc,
+};
+
+/************************************************************
  * GENERIC PLATFORM TIMER INFRASTRUCTURE
  */
 
@@ -580,6 +629,21 @@ static void resume_platform_timer(void)
     plt_stamp = plt_src.read_counter();
 }
 
+static void __init reset_platform_timer(void)
+{
+    /* Deactivate any timers running */
+    kill_timer(&plt_overflow_timer);
+    kill_timer(&calibration_timer);
+
+    /* Reset counters and stamps */
+    spin_lock_irq(&platform_timer_lock);
+    plt_stamp = 0;
+    plt_stamp64 = 0;
+    platform_timer_stamp = 0;
+    stime_platform_stamp = 0;
+    spin_unlock_irq(&platform_timer_lock);
+}
+
 static s64 __init try_platform_timer(struct platform_timesource *pts)
 {
     s64 rc = pts->init(pts);
@@ -587,6 +651,10 @@ static s64 __init try_platform_timer(struct platform_timesource *pts)
     if ( rc <= 0 )
         return rc;
 
+    /* We have a platform timesource already so reset it */
+    if ( plt_src.counter_bits != 0 )
+        reset_platform_timer();
+
     plt_mask = (u64)~0ull >> (64 - pts->counter_bits);
 
     set_time_scale(&plt_scale, pts->frequency);
@@ -608,7 +676,8 @@ static u64 __init init_platform_timer(void)
     unsigned int i;
     s64 rc = -1;
 
-    if ( opt_clocksource[0] != '\0' )
+    /* clocksource=tsc is initialized via __initcalls (when CPUs are up). */
+    if ( (opt_clocksource[0] != '\0') && strcmp(opt_clocksource, "tsc") )
     {
         for ( i = 0; i < ARRAY_SIZE(plt_timers); i++ )
         {
@@ -1344,6 +1413,22 @@ static void time_calibration_std_rendezvous(void *_r)
     time_calibration_rendezvous_tail(r);
 }
 
+/*
+ * Rendezvous function used when clocksource is TSC and
+ * no CPU hotplug will be performed.
+ */
+static void time_calibration_nop_rendezvous(void *rv)
+{
+    const struct calibration_rendezvous *r = rv;
+    struct cpu_time_stamp *c = &this_cpu(cpu_calibration);
+
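+    /* With a reliable TSC, the master's stamps are valid on every CPU. */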
+    c->local_tsc    = r->master_tsc_stamp;
+    c->local_stime  = r->master_stime;
+    c->master_stime = r->master_stime;
+
+    raise_softirq(TIME_CALIBRATE_SOFTIRQ);
+}
+
 static void (*time_calibration_rendezvous_fn)(void *) =
     time_calibration_std_rendezvous;
 
@@ -1353,6 +1438,13 @@ static void time_calibration(void *unused)
         .semaphore = ATOMIC_INIT(0)
     };
 
+    if ( clocksource_is_tsc() )
+    {
+        local_irq_disable();
+        r.master_stime = read_platform_stime(&r.master_tsc_stamp);
+        local_irq_enable();
+    }
+
     cpumask_copy(&r.cpu_calibration_map, &cpu_online_map);
 
     /* @wait=1 because we must wait for all cpus before freeing @r. */
@@ -1467,6 +1559,31 @@ static void __init tsc_check_writability(void)
     disable_tsc_sync = 1;
 }
 
+static void __init reset_percpu_time(void *unused)
+{
+    struct cpu_time *t = &this_cpu(cpu_time);
+
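+    /* Reseed: system time now counts scaled TSC ticks since boot. */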
+    t->stamp.local_tsc = boot_tsc_stamp;
+    t->stamp.local_stime = 0;
+    t->stamp.local_stime = get_s_time_fixed(boot_tsc_stamp);
+    t->stamp.master_stime = t->stamp.local_stime;
+}
+
+static void __init try_platform_timer_tail(bool late)
+{
+    init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
+    plt_overflow(NULL);
+
+    platform_timer_stamp = plt_stamp64;
+    stime_platform_stamp = NOW();
+
+    if ( !late )
+        init_percpu_time();
+
+    init_timer(&calibration_timer, time_calibration, NULL, 0);
+    set_timer(&calibration_timer, NOW() + EPOCH);
+}
+
 /* Late init function, after all cpus have booted */
 static int __init verify_tsc_reliability(void)
 {
@@ -1484,6 +1601,32 @@ static int __init verify_tsc_reliability(void)
             printk("TSC warp detected, disabling TSC_RELIABLE\n");
             setup_clear_cpu_cap(X86_FEATURE_TSC_RELIABLE);
         }
+        else if ( !strcmp(opt_clocksource, "tsc") &&
+                  (try_platform_timer(&plt_tsc) > 0) )
+        {
+            /*
+             * The platform timer has changed, and CPU time will only be
+             * updated once we set the calibration timer again, which means
+             * we need to reseed each local CPU time. At this stage the TSC
+             * is known to be reliable, i.e. monotonically increasing across
+             * all CPUs, so this lets us remove the skew between platform
+             * timer and TSC, since these are now effectively the same.
+             */
+            on_selected_cpus(&cpu_online_map, reset_percpu_time, NULL, 1);
+
+            /*
+             * No CPU hotplug will be performed and the TSC clocksource is in
+             * use, which means we have a reliable TSC; and since we don't
+             * sync with any other clocksource, no rendezvous is needed.
+             */
+            time_calibration_rendezvous_fn = time_calibration_nop_rendezvous;
+
+            /* Finish platform timer switch. */
+            try_platform_timer_tail(true);
+
+            printk("Switched to Platform timer %s TSC\n",
+                   freq_string(plt_src.frequency));
+        }
     }
 
     return 0;
@@ -1509,15 +1652,7 @@ int __init init_xen_time(void)
     do_settime(get_cmos_time(), 0, NOW());
 
     /* Finish platform timer initialization. */
-    init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
-    plt_overflow(NULL);
-    platform_timer_stamp = plt_stamp64;
-    stime_platform_stamp = NOW();
-
-    init_percpu_time();
-
-    init_timer(&calibration_timer, time_calibration, NULL, 0);
-    set_timer(&calibration_timer, NOW() + EPOCH);
+    try_platform_timer_tail(false);
 
     return 0;
 }
@@ -1531,6 +1666,7 @@ void __init early_time_init(void)
 
     preinit_pit();
     tmp = init_platform_timer();
+    plt_tsc.frequency = tmp;
 
     set_time_scale(&t->tsc_scale, tmp);
     t->stamp.local_tsc = boot_tsc_stamp;
@@ -1779,6 +1915,11 @@ void pv_soft_rdtsc(struct vcpu *v, struct cpu_user_regs *regs, int rdtscp)
              (d->arch.tsc_mode == TSC_MODE_PVRDTSCP) ? d->arch.incarnation : 0;
 }
 
+bool clocksource_is_tsc(void)
+{
+    return plt_src.read_counter == read_tsc;
+}
+
 int host_tsc_is_safe(void)
 {
     return boot_cpu_has(X86_FEATURE_TSC_RELIABLE);
diff --git a/xen/include/asm-x86/time.h b/xen/include/asm-x86/time.h
index 971883a..6d704b4 100644
--- a/xen/include/asm-x86/time.h
+++ b/xen/include/asm-x86/time.h
@@ -69,6 +69,7 @@ void tsc_get_info(struct domain *d, uint32_t *tsc_mode, uint64_t *elapsed_nsec,
 
 void force_update_vcpu_system_time(struct vcpu *v);
 
+bool clocksource_is_tsc(void);
 int host_tsc_is_safe(void);
 void cpuid_time_leaf(uint32_t sub_idx, uint32_t *eax, uint32_t *ebx,
                      uint32_t *ecx, uint32_t *edx);
--
generated by git-patchbot for /home/xen/git/xen.git#master
