[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4 2/5] x86/time: implement tsc as clocksource



>>> On 19.09.16 at 19:54, <joao.m.martins@xxxxxxxxxx> wrote:
> On 09/19/2016 05:25 PM, Jan Beulich wrote:
>>>>> On 19.09.16 at 18:11, <joao.m.martins@xxxxxxxxxx> wrote:
>>> On 09/19/2016 11:13 AM, Jan Beulich wrote:
>>>>>>> On 14.09.16 at 19:37, <joao.m.martins@xxxxxxxxxx> wrote:
>>>>> Since b64438c7c ("x86/time: use correct (local) time stamp in
>>>>> constant-TSC calibration fast path") updates to cpu time use local
>>>>> stamps, which means platform timer is only used to seed the initial
>>>>> cpu time.  With clocksource=tsc there is no need to be in sync with
>>>>> another clocksource, so we reseed the local/master stamps to be values
>>>>> of TSC and update the platform time stamps accordingly. Time
>>>>> calibration is set to 1sec after we switch to TSC, thus these stamps
>>>>> are reseeded to also ensure monotonic returning values right after the
>>>>> point we switch to TSC. This is also to avoid the possibility of
>>>>> having inconsistent readings in this short period (i.e. until
>>>>> calibration fires).
>>>>
>>>> And within this one second, which may cover some of Dom0's
>>>> booting up, it is okay to have inconsistencies?
>>> It's not okay which is why I am removing this possibility when switching to 
>>> TSC.
>>> The inconsistencies in those readings (if I wasn't adjusting) would be 
>>> because
>>> we would be using (in that 1-sec) those cpu time tuples calculated by the
>>> previous calibration or platform time initialization (while still was HPET,
>>> ACPI, etc as clocksource). Would you prefer me removing the "avoid" and 
>>> instead
>>> change it to "remove the possibility" in this last sentence?
>> 
>> Let's not do the 2nd step before the 1st, which is the question of
>> what happens prior to and what actually changes at this first
>> calibration (after 1 sec).
> The first calibration won't change much - this 1-sec was meant when having
> nop_rendezvous which is the first time platform timer would be used to set 
> local
> cpu_time (will adjust the mention above as it's misleading for the reader as 
> it
> doesn't refer to this patch).

So what makes it that it actually _is_ nop_rendezvous after that one
second? (And yes, part of this may indeed be just bad placement of
the description, as iirc nop_rendezvous gets introduced only in a later
patch.)

> Though reseeding the cpu times to boot_tsc_stamp
> prior to calibration (*without* the time latch values from previous platform
> timer) we can ensure NOW/get_s_time or calls to update_vcpu_system_time() will
> see monotonically increasing values. Otherwise keeping the previous ones,
> calibration would just add up local TSC delta and any existing divergence
> wouldn't be solved. On a later patch when we set the stable bit (with nop
> rendezvous), if these cpu times weren't adjusted guests/xen would still see
> small divergence between CPUs until the first calibration was fired (after we
> switched to TSC).

Right. But my concern is regarding the time window _between_
switching to TSC and running calibration the first time afterwards.
Part of this indeed gets taken care of by the re-seeding (which,
other than one could imply from what you say above, doesn't get
done [immediately] prior to calibration, but upon switching to TSC).
But is re-seeding without switching to nop_rendezvous really
sufficient (or in other words, can the patches really be broken up
like this)?

>>>>> @@ -1480,6 +1570,25 @@ static int __init verify_tsc_reliability(void)
>>>>>              printk("TSC warp detected, disabling TSC_RELIABLE\n");
>>>>>              setup_clear_cpu_cap(X86_FEATURE_TSC_RELIABLE);
>>>>>          }
>>>>> +        else if ( !strcmp(opt_clocksource, "tsc") &&
>>>>> +                  (try_platform_timer(&plt_tsc) > 0) )
>>>>> +        {
>>>>> +            /*
>>>>> +             * Platform timer has changed and CPU time will only be 
>>>>> updated
>>>>> +             * after we set again the calibration timer, which means we 
>>>>> need to
>>>>> +             * seed again each local CPU time. At this stage TSC is 
>>>>> known to be
>>>>> +             * reliable i.e. monotonically increasing across all CPUs so 
>>>>> this
>>>>> +             * lets us remove the skew between platform timer and TSC, 
>>>>> since
>>>>> +             * these are now effectively the same.
>>>>> +             */
>>>>> +            on_selected_cpus(&cpu_online_map, reset_percpu_time, NULL, 
>>>>> 1);
>>>>> +
>>>>> +            /* Finish platform timer switch. */
>>>>> +            try_platform_timer_tail();
>>>>> +
>>>>> +            printk(XENLOG_INFO "Switched to Platform timer %s TSC\n",
>>>>> +                   freq_string(plt_src.frequency));
>>>>
>>>> This message should have the same log level as the one at the end
>>>> of init_platform_timer().
>>> Agreed, but at the end of init_platform_timer there is a plain printk with 
>>> an
>>> omitted log level. Or do you mean to remove XENLOG_INFO from this printk 
>>> above
>>> or, instead add XENLOG_INFO to one printk at the end of 
>>> init_platform_timer() ?
>> 
>> Well, info would again be too low a level for my taste. Hence either
>> remove the level here (slightly preferred from my pov), or make both
>> warning.
> As your preference goes towards without the log level, I will re-introduce 
> back
> without it. Although I would find clearer to use printk with a log level as it
> was advised in earlier reviews.
> 
> NB: My suggestion of info as level is because my usual line of thought is to 
> see
> warning as something potentially erroneous that user should be warned about, 
> and
> error as being an actual error.

I understand and mostly agree. There are a few cases though (and
we're dealing with one here imo) where the absence of a log level is
better: We want these messages present in the log by default (which
they wouldn't be if they were info), but they're also not really
warnings. Perhaps simply for documentation purposes we could add
XENLOG_DEFAULT (expanding to an empty string), but that's not
something to be done in this patch.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.