Xen project Mailing List

Re: [PATCH v2 3/3] x86/time: don't move TSC backwards in time_calibration_tsc_rendezvous()

From: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Mon, 8 Feb 2021 17:33:53 +0100

Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none

Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=sLypsmW5q1e6Ayo48trqI8SIWhmV4lmtJWvBXiMikZg=; b=CnEZ1RnoeALfM/M0UNx/SOQ83wVQxGZ5zb+cR9IjuXfgefnQXTNBg+sYuJ5F3MMQGCYFGphhZaGJUQsUSFxP+xrHRlAVRWC6UE4uATQpB6ggUjEfktPrXVHTb0eGyuMwcmcgUG4ODvesn2/xqIzJQTNAhpOirp/Dim0CRUgX38u8Ar0vziRxd90htf3usWdx2H8oaCkMb2Bd8wSebS6UNtjLkO8cMkr2Jh9XNmDQrZooJn4elLlwLD7Z/IgWMpOg103wXv+N2aHlIF6kA6oN8OObwnwge/M90XvU5Fyu5RfQJSTHrljTI46vzTczprkGuku7ZzlMB5jOM3DKlDbOdQ==

Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=amSSSO763ZmBPu4zYSBLKjao6pbErrWas+OXuQyYsWG3ezqBUn4zJINov4dtRlfUZ0VWlGBdFum/kkJ2aEc8xcuKNqTkVO7pP0urGDKe7Pf6rsQq/JUA0n5qhYtf7SSwtINuFCUaNRa0U3MGYO1tWgjUh/ebdKuSUW4Cy7Bi3s5xJcIbbgMzU3nenV7Sa757HFyP0cKQDSnAUa6Amx/Tjnmz7Lu3sPul8cX7DnTZhaxXJW5fy8RnIMIjz0njw+L5fDBnZ5uPba2ohoIDSiNLUP5Z/+ulqSrNgMhB4MWBSW9gFJVCOG3LA9w8J+bR8nyfLdhkQrzGZz6RzlHgViac1A==

Authentication-results: esa1.hc3370-68.iphmx.com; dkim=pass (signature verified) header.i=@citrix.onmicrosoft.com

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Claudemir Todo Bom <claudemir@xxxxxxxxxxx>

Delivery-date: Mon, 08 Feb 2021 16:34:12 +0000

Ironport-sdr: te60TKIPCi4bvwd05rkVTextz6J+Gb36KbzIn78tXkB8j0nCEFZmJdcfgXqaCYIcm+3nOHv873 IVPcXt3H4AsZmzNgBo71KvX0/N9rhcQwd6rhbsWD1k73fpV+EBy76GpiWmVMsRUqbkDQ2iKmXY JgBfImg0KNSkBGWUxjNHb1dxb+zVUb51PwO+W9JQOclHNrQ2lLb5aKACPEG4WXl8EhKEW8hD6M 2GuBnofMEmUcbpy/G0ETK5mmK8jbkrVWtZ+6iOq8l+fCYtcOncHoNPKDdhr2J7Oup7+IG4lUHf KfA=

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Mon, Feb 08, 2021 at 02:59:55PM +0100, Jan Beulich wrote: > On 08.02.2021 14:19, Roger Pau Monné wrote: > > On Mon, Feb 08, 2021 at 12:22:25PM +0100, Jan Beulich wrote: > >> On 08.02.2021 10:38, Roger Pau Monné wrote: > >>> On Mon, Feb 01, 2021 at 01:43:28PM +0100, Jan Beulich wrote: > >>>> --- > >>>> Since CPU0 reads its TSC last on the first iteration, if TSCs were > >>>> perfectly sync-ed there shouldn't ever be a need to update. However, > >>>> even on the TSC-reliable system I first tested this on (using > >>>> "tsc=skewed" to get this rendezvous function into use in the first > >>>> place) updates by up to several thousand clocks did happen. I wonder > >>>> whether this points at some problem with the approach that I'm not (yet) > >>>> seeing. > >>> > >>> I'm confused by this, so on a system that had reliable TSCs, which > >>> you forced to remove the reliable flag, and then you saw big > >>> differences when doing the rendezvous? > >>> > >>> That would seem to indicate that such system doesn't really have > >>> reliable TSCs? > >> > >> I don't think so, no. This can easily be a timing effect from the > >> heavy cache line bouncing involved here. > >> > >> What I'm worried here seeing these updates is that I might still > >> be moving TSCs backwards in ways observable to the rest of the > >> system (i.e. beyond the inherent property of the approach), and > >> this then getting corrected by a subsequent rendezvous. But as > >> said - I can't see what this could result from, and hence I'm > >> inclined to assume these are merely effects I've not found a > >> good explanation for so far. > > > > I'm slightly worried by this, maybe because I'm misunderstanding part > > of the TSC sync stuff. > > > > So you forced a system that Xen would otherwise consider to have a > > reliable TSC (one that doesn't need a calibration rendezvous) into > > doing the calibration rendezvous, and then the skew seen is quite big. > > I would expect such skew to be minimal? As we would otherwise consider > > the system to not need calibration at all. > > > > This makes me wonder if the system does indeed need such calibration > > (which I don't think so), or if the calibration that we actually try > > to do is quite bogus? > > I wouldn't call it bogus, but it's not very precise. Hence me > saying that if we wanted to make the problematic system seen as > TSC_RELIABLE (and hence be able to switch from "tsc" to "std" > rendezvous), we'd first need to improve accuracy here quite a bit. > (I suspect sufficient accuracy can only be achieved by making use > of TSC_ADJUST, but that's not available on the reporter's hardware, > so of no immediate interest here.) Right, TSC_ADJUST does indeed seem to be a better way to adjust TSC, and to notice if firmware has skewed them. > > >>>> @@ -1719,9 +1737,12 @@ static void time_calibration_tsc_rendezv > >>>> while ( atomic_read(&r->semaphore) > total_cpus ) > >>>> cpu_relax(); > >>>> } > >>>> + > >>>> + /* Just in case a read above ended up reading zero. */ > >>>> + tsc += !tsc; > >>> > >>> Won't that be worthy of an ASSERT_UNREACHABLE? I'm not sure I see how > >>> tsc could be 0 on a healthy system after the loop above. > >> > >> It's not forbidden for the firmware to set the TSCs to some > >> huge negative value. Considering the effect TSC_ADJUST has on > >> the actual value read by RDTSC, I think I did actually observe > >> a system coming up this way, because of (not very helpful) > >> TSC_ADJUST setting by firmware. So no, no ASSERT_UNREACHABLE() > >> here. > > > > But then the code here will loop 5 times, and it's not possible for > > those 5 loops to read a TSC value of 0? I could see it reading 0 on > > one of the iterations but not in all of them. > > Sure, we can read zero at most once here. Yet the "if ( tsc == 0 )" > conditionals get executed on every iteration, while they must yield > "true" only on the first (from the variable's initializer); we > absolutely need to go the "else if()" path on CPU0 on the 2nd > iteration, and we also need to skip that part on later iterations > on the other CPUs (for CPU0 to then take the 2nd "else if()" path > from no later than the 3rd iteration onwards; the body of this of > course will only be executed on the last iteration). Oh, I see. Then I think I have no further comments. If you agree to adjust the cmpxchg please add by R-b. Thanks, Roger.

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.