
RE: [Xen-devel] Re: [PATCH] CPUIDLE: revise tsc-save/restore to avoid big tsc skew between cpus

>From: Keir Fraser
>Sent: Friday, December 05, 2008 4:54 PM
>Synchronising to a start-of-day timestamp and extrapolating with cpu_khz
>is not a good idea because of the inherent accumulating inaccuracy in
>cpu_khz (it only has a fixed number of bits of precision). So, for
>example, if certain CPUs have not deep-C-slept for a long while, they
>will wander from your 'true' TSC values and then you will see TSC
>mismatches when the CPU does eventually C-sleep, or compared with other
>CPUs when they do so. More significantly, cpu_khz is no good as a fixed
>static estimate of CPU clock speed when we start playing with P states.
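The accumulating-inaccuracy point above can be illustrated numerically. This is a toy sketch, not Xen code, and the frequencies in it are made-up example values:

```c
#include <assert.h>
#include <stdint.h>

/* cpu_khz stores the clock rate rounded to whole kHz, so extrapolating
 * with it drifts by the rounding error multiplied by the elapsed time.
 * Example values only; real calibration errors vary by platform. */
static int64_t drift_ticks(double true_hz, uint64_t cpu_khz, double secs)
{
    double true_ticks = true_hz * secs;                  /* what the TSC counts  */
    double est_ticks = (double)cpu_khz * 1000.0 * secs;  /* what we extrapolate  */
    return (int64_t)(est_ticks - true_ticks);
}
```

For instance, a nominally 2 GHz part whose real clock is 400 Hz above the rounded cpu_khz value wanders by 400,000 ticks over 1000 s of no deep-C sleep.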

This is not a big issue, since most processors today have a constant
TSC that is immune to P-state changes. And when a frequency change does
affect the TSC count, I don't think time calibration helps much either:
if the scaling happens over a small interval, there is still a high
probability that multiple vcpus of one domain observe time weirdness,
or that a vcpu migrating between cpus does.

>I think your new code structure is correct. That is, work out 
>wakeup TSC
>value from read_platform_stime() rather than some saved TSC 
>value. However,
>you should extrapolate from that CPU's own t->stime_local_stamp,
>t->tsc_scale, and t->local_tsc_stamp. It's probably a pretty 
>simple change
>to your new cstate_restore_tsc().
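For reference, the suggestion above could be sketched roughly as follows. This is a simplified illustration, not the actual Xen implementation: the field names come from the quoted text, but tsc_scale is reduced here to a plain kHz rate rather than Xen's fixed-point scale structure, and the real code would write the result back with WRMSR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-CPU calibration record, using the field names from
 * the thread (local_tsc_stamp, stime_local_stamp); tsc_scale is
 * simplified to a plain kHz rate. */
struct calib {
    uint64_t local_tsc_stamp;   /* TSC read at last calibration     */
    uint64_t stime_local_stamp; /* system time (ns) at that reading */
    uint64_t tsc_khz;           /* this CPU's estimated TSC rate    */
};

/* Sketch of the suggested cstate_restore_tsc(): extrapolate what this
 * CPU's TSC would read "now" from its own calibration stamps. */
static uint64_t wakeup_tsc(const struct calib *t, uint64_t now_ns)
{
    uint64_t elapsed_ns = now_ns - t->stime_local_stamp;
    /* ticks = ns * kHz / 1e6; widen to 128 bits to avoid overflow */
    return t->local_tsc_stamp +
           (uint64_t)(((__uint128_t)elapsed_ns * t->tsc_khz) / 1000000u);
}
```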

Recovering the TSC based on t->local_tsc_stamp can still accumulate
errors, though at a far slower pace. The problem is that a software
calculation is always inaccurate compared to what the TSC would have
counted had it not stopped. We measured the calculation error for each
TSC save/restore: enter C2 (where the TSC keeps running), apply the
restore logic, and compare the to-be-written value against the real TSC
count from rdtsc. A drift of 2k-3k ticks per entry/exit can be observed
when chipset link ASPM is enabled, and hundreds of ticks when ASPM is
disabled. Since the idle percentage on every processor differs, the net
effect is not only a TSC slower than the real one (had it not stopped),
but also an increasing drift among cpus. The first normally just makes
TOD run faster or slower, but the latter can cause severe guest
problems like softlockup warnings or DMA timeouts when a vcpu is
migrated between cpus with a big TSC/now skew. Recovering the TSC based
on the local calibration stamp reduces the error-accumulation pace from
per-cstate-entry/exit to per-calibration (ms -> s), but after a long
run multiple cpus can still observe big skews.

To avoid accumulating errors, the best way is to align to an absolute
platform timer without carrying over the last stamp. One drawback, as
you said, is intermittent skew when a cpu doesn't enter idle for a long
time, but that is equally true of the offset-counting approach above.

Then, if we agree to always align the TSC to the absolute platform
timer counter, it makes no difference whether we use cpu_khz or the
local tsc_scale, since both are scale factors calculated over a small
period to represent the underlying crystal frequency.
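Sketched out, absolute alignment would look something like this. Again a simplification, not the actual patch: the real restore path would read the platform timer and write the MSR, and the scale factor could be either cpu_khz or the per-CPU tsc_scale.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of aligning to an absolute platform timer: on every C-state
 * exit, derive the TSC value purely from platform time and a scale
 * factor (cpu_khz here), never from a saved TSC.  Each restore
 * re-derives from the same reference, so per-entry errors cannot
 * accumulate; only the fixed scale-factor error remains. */
static uint64_t tsc_from_platform(uint64_t platform_ns, uint64_t cpu_khz)
{
    /* ticks = ns * kHz / 1e6, widened to avoid 64-bit overflow */
    return (uint64_t)(((__uint128_t)platform_ns * cpu_khz) / 1000000u);
}
```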

Xen-devel mailing list


