[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[xen master] x86/time: latch to-be-written TSC value early in rendezvous loop



commit 6ca022a1e16a3ad46ddb3f5fff48b69a434b1c21
Author:     Jan Beulich <jbeulich@xxxxxxxx>
AuthorDate: Thu Apr 22 13:25:53 2021 +0200
Commit:     Jan Beulich <jbeulich@xxxxxxxx>
CommitDate: Thu Apr 22 13:25:53 2021 +0200

    x86/time: latch to-be-written TSC value early in rendezvous loop
    
    To reduce latency on time_calibration_tsc_rendezvous()'s last loop
    iteration, read the value to be written on the last iteration at the end
    of the loop body (i.e. in particular at the end of the second to last
    iteration).
    
    On my single-socket 18-core Skylake system this reduces the average loop
    exit time on CPU0 (from the TSC write on the last iteration to until
    after the main loop) from around 32k cycles to around 29k (albeit the
    values measured on separate runs vary quite significantly).
    
    Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
    Reviewed-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
---
 xen/arch/x86/time.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 6bc1fd11d6..e92365f5bd 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1683,7 +1683,7 @@ static void time_calibration_tsc_rendezvous(void *_r)
     int i;
     struct calibration_rendezvous *r = _r;
     unsigned int total_cpus = cpumask_weight(&r->cpu_calibration_map);
-    uint64_t tsc = 0;
+    uint64_t tsc = 0, master_tsc = 0;
 
     /* Loop to get rid of cache effects on TSC skew. */
     for ( i = 4; i >= 0; i-- )
@@ -1708,7 +1708,7 @@ static void time_calibration_tsc_rendezvous(void *_r)
             atomic_inc(&r->semaphore);
 
             if ( i == 0 )
-                write_tsc(r->master_tsc_stamp);
+                write_tsc(master_tsc);
 
             while ( atomic_read(&r->semaphore) != (2*total_cpus - 1) )
                 cpu_relax();
@@ -1730,7 +1730,7 @@ static void time_calibration_tsc_rendezvous(void *_r)
             }
 
             if ( i == 0 )
-                write_tsc(r->master_tsc_stamp);
+                write_tsc(master_tsc);
 
             atomic_inc(&r->semaphore);
             while ( atomic_read(&r->semaphore) > total_cpus )
@@ -1739,9 +1739,17 @@ static void time_calibration_tsc_rendezvous(void *_r)
 
         /* Just in case a read above ended up reading zero. */
         tsc += !tsc;
+
+        /*
+         * To reduce latency of the TSC write on the last iteration,
+         * fetch the value to be written into a local variable. To avoid
+         * introducing yet another conditional branch (which the CPU may
+         * have difficulty predicting well) do this on all iterations.
+         */
+        master_tsc = r->master_tsc_stamp;
     }
 
-    time_calibration_rendezvous_tail(r, tsc, r->master_tsc_stamp);
+    time_calibration_rendezvous_tail(r, tsc, master_tsc);
 }
 
 /* Ordinary rendezvous function which does not modify TSC values. */
--
generated by git-patchbot for /home/xen/git/xen.git#master



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.