Xen project Mailing List

RE: [Xen-devel] Re: [PATCH] CPUIDLE: revise tsc-save/restore to avoid big tsc skew between cpus

To: 'Keir Fraser' <keir.fraser@xxxxxxxxxxxxx>, "Wei, Gang" <gang.wei@xxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>

From: "Tian, Kevin" <kevin.tian@xxxxxxxxx>

Date: Fri, 5 Dec 2008 17:59:30 +0800

Accept-language: en-US

Cc:

Delivery-date: Fri, 05 Dec 2008 02:00:09 -0800

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Thread-index: AclWoeM2xc+X6mj6QOaQHsyxDmpzagAFRoNpAAEP6tA=

Thread-topic: [Xen-devel] Re: [PATCH] CPUIDLE: revise tsc-save/restore to avoid big tsc skew between cpus

>From: Keir Fraser
>Sent: Friday, December 05, 2008 4:54 PM
>
>Synchronising to a start-of-day timestamp and extrapolating with cpu_khz is
>not a good idea because of the inherent accumulating inaccuracy in cpu_khz
>(it only has a fixed number of bits of precision). So, for example, if
>certain CPUs have not deep-C-slept for a long while, they will wander from
>your 'true' TSC values and then you will see TSC mismatches when the CPU
>does eventually C-sleep, or compared with other CPUs when they do so.
>
>More significantly, cpu_khz is no good as a fixed static estimate of CPU
>clock speed when we start playing with P states.

This is not a big issue, since most processors today have a constant TSC that is immune to P-state changes. And when a frequency change does affect the TSC count, I don't think time calibration actually helps if the scaling happens at small intervals: there is still a high possibility that multiple vcpus of one domain observe time weirdness, or one vcpu migrating...

>
>I think your new code structure is correct. That is, work out wakeup TSC
>value from read_platform_stime() rather than some saved TSC value. However,
>you should extrapolate from that CPU's own t->stime_local_stamp,
>t->tsc_scale, and t->local_tsc_stamp. It's probably a pretty simple change
>to your new cstate_restore_tsc().

Recovering the TSC based on t->local_tsc_stamp can still accumulate errors, though at a far slower pace. The problem is that a software calculation is always inaccurate compared to what the TSC would have counted had it not stopped. We tried to measure the calculation error for each TSC save/restore. The approach is to use C2 (where the TSC is not stopped), apply the restore logic, and compare the to-be-written value against the real TSC count read by rdtsc. We observed a drift of 2k-3k when chipset link ASPM is enabled, and a drift in the hundreds when ASPM is disabled.
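As a rough illustration of why a fixed per-restore error matters, here is a minimal sketch in C. It is a toy model, not Xen code: CPU_KHZ, RESTORE_ERR, ns_to_tsc(), chained_error() and absolute_error() are hypothetical names and values chosen for the example, and the "true" TSC is assumed to tick at exactly CPU_KHZ so that scale-factor error is left out. It compares a restore that builds on the previously restored value with one that is recomputed from absolute platform time.

```c
#include <stdint.h>

/* Hypothetical model parameters, not taken from Xen. */
#define CPU_KHZ     2000000ULL  /* assumed 2 GHz TSC */
#define RESTORE_ERR 2500ULL     /* assumed ticks lost per software restore */

/* Convert platform time in ns to TSC ticks with a fixed kHz estimate. */
static uint64_t ns_to_tsc(uint64_t ns)
{
    return ns * CPU_KHZ / 1000000ULL;
}

/*
 * Chained restore: every wakeup adds the elapsed ticks to the value
 * written at the previous wakeup, so each restore's error is carried
 * forward into the next one.
 */
static int64_t chained_error(unsigned restores, uint64_t sleep_ns)
{
    uint64_t true_tsc = 0, soft_tsc = 0;

    for (unsigned i = 0; i < restores; i++) {
        true_tsc += ns_to_tsc(sleep_ns);
        soft_tsc += ns_to_tsc(sleep_ns) - RESTORE_ERR;
    }
    return (int64_t)(true_tsc - soft_tsc);
}

/*
 * Absolute restore: every wakeup recomputes the TSC from the platform
 * time since start of day, so earlier restore errors are discarded and
 * only the most recent one remains.
 */
static int64_t absolute_error(unsigned restores, uint64_t sleep_ns)
{
    uint64_t true_tsc = 0, soft_tsc = 0, now_ns = 0;

    for (unsigned i = 0; i < restores; i++) {
        now_ns += sleep_ns;
        true_tsc += ns_to_tsc(sleep_ns);
        soft_tsc = ns_to_tsc(now_ns) - RESTORE_ERR;
    }
    return (int64_t)(true_tsc - soft_tsc);
}
```

Under these assumptions, 1000 restores of 1 ms each give chained_error(1000, 1000000) == 2500000 ticks, while absolute_error(1000, 1000000) stays at 2500 ticks. A cpu that idles more often would diverge faster in the chained model, which is one way skew between cpus could build up.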
As the idle percentage on every processor is different, the end effect is not only a TSC that is slower than the real one (had it not stopped), but also increasing drift among cpus. The first normally just makes TOD run faster or slower, but the latter can cause severe guest problems such as softlockup warnings or DMA timeouts when a vcpu is migrated between cpus with a big TSC/now skew. Recovering the TSC based on the local calibration stamp reduces the error accumulation pace from per-cstate-entry/exit to per-calibration (ms->s), but after a long run multiple cpus can still observe big skews.

To avoid accumulating errors, the best way is to align to an absolute platform timer without counting from the last stamp. One drawback, as you said, is intermittent skew when a cpu doesn't enter idle for a long time. But this is also true for the above offset-counting approach.

So if we agree to always align the TSC to the absolute platform timer counter, it makes no difference whether we use cpu_khz or the local tsc_scale, since both use a scale factor calculated within a small period to represent the underlying crystal frequency.

Thanks,
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
