[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v13] tolerate jitter in cpu_khz calculation to avoid TSC emulation



>>> On 13.03.19 at 09:28, <olaf@xxxxxxxxx> wrote:
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -43,6 +43,23 @@ static char __initdata opt_clocksource[10];
>  string_param("clocksource", opt_clocksource);
>  
>  unsigned long __read_mostly cpu_khz;  /* CPU clock frequency in kHz. */
> +/*
> + * NTP implementations running within the domU can handle a certain
> + * difference of the system clockspeed, compared to an external
> + * clocksource. This is ususally described as "drift". How much drift an
> + * OS can handle is described in its documentation. For NTP in Linux the
> + * value is 500 PPM, which is the lowest compared to other OS.
> + */
> +#define VTSC_NTP_PPM_TOLERANCE 500UL
> +/*
> + * The measurement of cpu_khz is not accurate. Its accuracy depends on the
> + * hardware. A bunch of systems with supposedly identical frequencies will
> + * measure different frequencies, which will also vary accross reboots.
> + * This variable tries to cover a range of frequencies seen in the wild.
> + * The range is substracted from the PPM value above.

subtracted

I also don't think the inaccuracy is only because of measurement
being imprecise. I don't think any two crystals will ever provide the
exact same frequencies.

> @@ -1885,6 +1902,27 @@ void __init early_time_init(void)
>      printk("Detected %lu.%03lu MHz processor.\n", 
>             cpu_khz / 1000, cpu_khz % 1000);
>  
> +    /*
> +     * How many kHz (in other words: drift) is ntpd in domU expected to 
> handle?
> +     *    freq    tolerated freq difference
> +     *  ------- = -------------------------
> +     *  Million         Million + PPM      
> +     */
> +    tmp = 1000 * 1000;
> +    tmp += VTSC_NTP_PPM_TOLERANCE;
> +    tmp *= cpu_khz;
> +    tmp /= 1000 * 1000;
> +
> +    tmp -= cpu_khz;
> +
> +    /*
> +     * Reduce the theoretical upper limit by the assumed measuring 
> inaccuracy.
> +     */
> +    if ( tmp >= VTSC_MEASUREMENT_INACCURACY_RANGE_KHZ )
> +        tmp -= VTSC_MEASUREMENT_INACCURACY_RANGE_KHZ;

The discontinuity is still there, and so far you've failed to explain
why a discontinuity is what you want here.

> @@ -2193,6 +2231,8 @@ int tsc_set_info(struct domain *d,
>  
>      switch ( tsc_mode )
>      {
> +        bool disable_vtsc;
> +
>      case TSC_MODE_DEFAULT:
>      case TSC_MODE_ALWAYS_EMULATE:
>          d->arch.vtsc_offset = get_s_time() - elapsed_nsec;
> @@ -2201,13 +2241,30 @@ int tsc_set_info(struct domain *d,
>  
>          /*
>           * In default mode use native TSC if the host has safe TSC and
> -         * host and guest frequencies are the same (either "naturally" or
> -         * - for HVM/PVH - via TSC scaling).
> +         * host and guest frequencies are (almost) the same (either 
> "naturally"
> +         * or - for HVM/PVH - via TSC scaling).
>           * When a guest is created, gtsc_khz is passed in as zero, making
>           * d->arch.tsc_khz == cpu_khz. Thus no need to check incarnation.
>           */
> +        disable_vtsc = d->arch.tsc_khz == cpu_khz;
> +
> +        if ( tsc_mode == TSC_MODE_DEFAULT && gtsc_khz && vtsc_tolerance_khz )
> +        {
> +            long khz_diff;
> +
> +            khz_diff = ABS(((long)cpu_khz - gtsc_khz));

This could easily be the initializer of the variable. And there's one
too many pair of parentheses.

I also don't see why the variable needs to be of a signed type.
Then again maybe the ABS() would better move ...

> +            disable_vtsc = khz_diff <= vtsc_tolerance_khz;

... here anyway, such that ...

> +            printk(XENLOG_G_INFO "d%d: host has %lu kHz,"
> +                   " domU expects %u kHz,"
> +                   " difference of %ld is %s tolerance of %u\n",
> +                   d->domain_id, cpu_khz, gtsc_khz, khz_diff,
> +                   disable_vtsc ? "within" : "outside",
> +                   vtsc_tolerance_khz);

... this logs the properly signed value?

> +        }
> +
>          if ( tsc_mode == TSC_MODE_DEFAULT && host_tsc_is_safe() &&
> -             (d->arch.tsc_khz == cpu_khz ||
> +             (disable_vtsc ||
>                (is_hvm_domain(d) &&
>                 hvm_get_tsc_scaling_ratio(d->arch.tsc_khz))) )
>          {

I'm sorry, but I continue to object to this adjustment getting done
both by default _and_ not in a per-guest manner. As said before,
you can't demand guests to run NTP, and hence you can't expect
them to get along with a few hundred kHz jump in observed TSC
frequency. Whether the performance drop due to vTSC use is
better or worse is a policy decision, which we should leave to the
admin. Hence the feature needs to be off by default, and there
needs to be at least a host-wide control to enable it; a per-guest
control would be better. IOW I explicitly do not agree with the
last sentence of the commit message.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.