[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 1/1] x86/arch: VM resume: avoid RDTSC emulation due to host clock drift



On 02.09.2019 20:27, Edwin Török  wrote:
> On a Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz the host frequency drifts:
> ```
> (XEN) [    6.607693] Detected 2600.004 MHz processor.
> (XEN) [ 2674.213081] dom1(hvm): 
> mode=0,ofs=0xfffee6f70b7faa48,khz=2600018,inc=3
> (XEN) [ 2674.213087] dom2(hvm): 
> mode=0,ofs=0xfffee6fd499835c0,khz=2600018,inc=2
> ```
> 
> The 2 domains were suspended prior to rebooting the host and applying a
> xen/microcode patch. After the reboot the frequency of the host was deemed to
> be slightly different, and therefore switching on RDTSC emulation for the 
> Linux
> HVM guest, even though the difference was only 5 ppm. This CPU doesn't support
> TSC scaling.
> 
> Therefore we should either measure the standard deviation of our calibration
> and have a range of acceptable frequencies as "same", or have a static
> tolerance value.
> 
> The platform timer's clock frequency accuracy is:
> * IA-PC HPET Specification 1.0a sections 2.2 and 2.4.1: 500 ppm or better
> * ACPI PM timer, and PIT timer do not have defined accuracies
> * Intel 300 Series datasheet section 25.6: 24 MHz crystal 100 ppm or better
> * NTP FAQ section 3.3 Clock Quality: 11 ppm drift due to temperature
> * section 2.2.2 claims that PIT/ACPI PM timer share the same crystal as HPET 
> and
> thus 500 ppm as an upper bound, "the real drift is usually smaller than 30ppm"
> 
> For simplicity and determinism opted for a static tolerance value of 100 ppm
> here, such that the any error would be well within the error you would get 
> with
> HPET/Linux's calibration. NTP can cope with a drift < 500 ppm.
> Most importantly this should stop Xen from claiming that the clock frequency 
> on
> the same host is different across reboots. Specifications do not currently
> mandate an accuracy higher than 100 ppm, therefore OSes should already be able
> to cope with such drift on real hardware. Any improvements in accuracy from
> future specifications/motherboards wouldn't be applicable, because they would
> also come with newer CPUs that support TSC scaling.
> 
> If the CPU does support TSC scaling Xen will of course still attempt to match
> the exact frequency value it thinks the guest had when it was suspended.
> See below for `if ( hvm_tsc_scaling_supported && !d->arch.vtsc )` (not visible
> in patch context).
> 
> llabs() doesn't appear to be available when building xen, hence the 2 
> comparisons.
> 
> After this patch when suspending a VM, and rebooting the host I get this 
> output:
> ```
> (XEN) [    6.614703] Detected 2600.010 MHz processor.
> (XEN) [  138.924342] TSC marked as reliable, warp = 0 (count=2)
> (XEN) [  138.924346] dom1(hvm): 
> mode=0,ofs=0xfffed01901016d18,khz=2600012,inc=2
> ```
> 
> Signed-off-by: Edwin Török <edvin.torok@xxxxxxxxxx>

First of all - are you aware that there had been multiple iterations
of a patch (by Olaf Hering) making this a command line and/or guest
config controllable setting? If so, it would have been nice if at
least the cover letter identified the correlation. If not, please
take a look. After all it hasn't gone in so far because of objections
by Andrew.

Using a hardcoded tolerance value in any event raises the question
of how you know whether a particular guest OS can actually cope.

> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -2171,6 +2171,12 @@ void tsc_get_info(struct domain *d, uint32_t *tsc_mode,
>          *elapsed_nsec = 0;
>  }
>  
> +static inline int frequency_same_with_tolerance(int64_t khz1, int64_t khz2)

The return type wants to be bool. And there wants to be an explaining
comment ahead of the function, (re-)stating some of what you have in
the description.

> +{
> +    int64_t ppm = (khz2 - khz1) * 1000000 / khz2;
> +    return -100 < ppm && ppm < 100;

While we have no llabs(), you should imo use either ABS() or
__builtin_labs() / __builtin_llabs().

Furthermore, could we limit this behavior to the case of there not
being TSC scaling available (due to running on old hardware, or due
to it being a PV guest)?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.