[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 1/2] xen/x86: don't send IPI to sync TSC when it is reliable



On Thu, 10 Jul 2025, Jan Beulich wrote:
> On 08.07.2025 20:32, Stefano Stabellini wrote:
> > --- a/xen/arch/x86/time.c
> > +++ b/xen/arch/x86/time.c
> > @@ -2297,11 +2297,7 @@ static void cf_check time_calibration(void *unused)
> >      };
> >  
> >      if ( clocksource_is_tsc() )
> > -    {
> > -        local_irq_disable();
> > -        r.master_stime = read_platform_stime(&r.master_tsc_stamp);
> > -        local_irq_enable();
> > -    }
> > +        return;
> 
> Assuming the rendezvous can indeed be entirely skipped, I agree that there's
> no point calling read_platform_stime() here. Yet to yield a consistent
> result, more changes are then necessary imo:
> - as indicated before, the invocation of this function from
>   verify_tsc_reliability() when plt_tsc was chosen is then entirely
>   pointless,
> - time_calibration_nop_rendezvous() would then apparently want purging, not
>   the least to make clear that TIME_CALIBRATE_SOFTIRQ is never raised in
>   this mode (one of your goals after all, aiui),

Good suggestions.


> - the function being a timer handler, it would be preferable if the timer
>   wasn't ever activated in this mode (at which point rather than returning
>   early, the code above could simply be purged, maybe replaced by e.g. an
>   assertion),

I see your point about the timer not being activated in the first place.

But if we want to make the code more reliable we should keep the if
(clocksource_is_tsc()) return; in time_calibration. That way, in case of
mistakes elsewhere, still the desired behavior is obtained.

I'll add the changes to cpu_frequency_change and local_time_calibration.
I'll append an incremental patch to clarify my intent.


> - the above in particular requires dealing with cpu_frequency_change() (the
>   other of the two places where the timer is actually activated).
>
> Some care may be needed in all of this taking into consideration that the
> platform timer change to TSC happens late. Albeit commit f954a1bf5f74
> ("x86/time: change initiation of the calibration timer") has imo eliminated
> the main concern here.
> 
> As to skipping the rendezvous: Besides invoking the calibration softirq,
> time_calibration_nop_rendezvous() also updates the per-CPU cpu_calibration
> fields. There would thus need to be a pretty formal proof that calculations
> involving ->local_stime or ->local_tsc can't possibly degrade or even
> degenerate when they remain at their boot-time values. (As to
> ->master_stime, afaict the field simply isn't used at all in that mode,
> which is a fair part of the reason why the code change above is okay _if_
> the rendezvous itself can be eliminated. The justification for that could
> also do with extending some, considering that much of the involved code is
> pretty subtle.) Alternatively, if such a proof turned out impossible,
> another way of updating the fields every once in a while would need adding.

Do you mean a formal proof that the TSC is actually stable from a
hardware perspective? The software algorithm is the same no matter the
number of updates.


> Finally, what you do here isn't entirely reliable as to your apparent end
> goal: "clocksource=tsc" is respected only when tsc_check_reliability()
> completes with an acceptable outcome. There's certainly some variability in
> this across multiple runs, i.e. if things went extremely bad, once in blue
> moon you may end up with the TSC being rejected for use as platform timer.
 
That is interesting! One option is to change the code so that
clocksource=tsc is always respected. I have appended the change on top
of this patch. Please let me know if you have other suggestions.


diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index d72e640f72..d29266086d 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1877,7 +1877,7 @@ int cpu_frequency_change(u64 freq)
     update_vcpu_system_time(current);
 
     /* A full epoch should pass before we check for deviation. */
-    if ( smp_processor_id() == 0 )
+    if ( smp_processor_id() == 0 && !clocksource_is_tsc() )
     {
         set_timer(&calibration_timer, NOW() + EPOCH);
         platform_time_calibration();
@@ -2024,7 +2024,7 @@ static void cf_check local_time_calibration(void)
     update_vcpu_system_time(current);
 
  out:
-    if ( smp_processor_id() == 0 )
+    if ( smp_processor_id() == 0 && !clocksource_is_tsc() )
     {
         set_timer(&calibration_timer, NOW() + EPOCH);
         platform_time_calibration();
@@ -2271,22 +2271,6 @@ static void cf_check 
time_calibration_std_rendezvous(void *_r)
     time_calibration_rendezvous_tail(r, 0, rdtsc_ordered());
 }
 
-/*
- * Rendezvous function used when clocksource is TSC and
- * no CPU hotplug will be performed.
- */
-static void cf_check time_calibration_nop_rendezvous(void *rv)
-{
-    const struct calibration_rendezvous *r = rv;
-    struct cpu_time_stamp *c = &this_cpu(cpu_calibration);
-
-    c->local_tsc    = r->master_tsc_stamp;
-    c->local_stime  = r->master_stime;
-    c->master_stime = r->master_stime;
-
-    raise_softirq(TIME_CALIBRATE_SOFTIRQ);
-}
-
 static void (*time_calibration_rendezvous_fn)(void *) =
     time_calibration_std_rendezvous;
 
@@ -2488,7 +2472,7 @@ static int __init cf_check verify_tsc_reliability(void)
          * CPUs are booted.
          */
         tsc_check_reliability();
-        if ( tsc_max_warp )
+        if ( tsc_max_warp && strcmp(opt_clocksource, "tsc") )
         {
             printk("TSC warp detected, disabling TSC_RELIABLE\n");
             setup_clear_cpu_cap(X86_FEATURE_TSC_RELIABLE);
@@ -2506,21 +2490,12 @@ static int __init cf_check verify_tsc_reliability(void)
              */
             on_selected_cpus(&cpu_online_map, reset_percpu_time, NULL, 1);
 
-            /*
-             * We won't do CPU Hotplug and TSC clocksource is being used which
-             * means we have a reliable TSC, plus we don't sync with any other
-             * clocksource so no need for rendezvous.
-             */
-            time_calibration_rendezvous_fn = time_calibration_nop_rendezvous;
-
             /* Finish platform timer switch. */
             try_platform_timer_tail();
 
             printk("Switched to Platform timer %s TSC\n",
                    freq_string(plt_src.frequency));
 
-            time_calibration(NULL);
-
             return 0;
         }
     }



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.