[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH RFC 0/6] x86/time: PVCLOCK_TSC_STABLE_BIT support

On 28/12/2015 16:59, Joao Martins wrote:

I've been working on pvclock vdso support on Linux guests, and came
across Xen lacking support for PVCLOCK_TSC_STABLE_BIT flag which is
required for vdso as of the latest pvclock ABI shared with KVM.

, and originally borrowed from Xen.

Please be aware that all of this was originally the Xen ABI (c/s 1271b793, 2005) and was added to Linux (c/s 7af192c, 2008) for join use with KVM. In particular, Linux c/s 424c32f1a (which introduces 'flags') and every subsequent change in pvclock-abi.h violates the comment at the top, reminding people that the hypervisors must be kept in sync.

By the looks of things, the structures are still compatible, and having the two in sync is in everyones best interest. The first steps here need to be Linux upstreaming their local modifications, and further efforts made to ensuring that ABI changes don't go unnoticed as far as Xen is concerned (entry in the maintainers file with xen-devel listed?)

In addition, I've found some problems which aren't necessarily visible
to kernel as the pvclock algorithm in linux keeps the highest pvti
time read among all cpus. But as is, a process using vdso gettimeofday
observes a significant amount of warps (i.e. time going backwards) and
it could be due to 1) time calibration skew in xen rendezvous
algorithm, 2) xen clocksource not in sync with TSC and 3) in
situations when guests unaware of VCPUS moving to a different PCPU.
The warps are seen more frequently on PV guests (potentially because
vcpu time infos are only updated when guest is in kernel mode, and
perhaps lack of tsc_offset?), and in rare ocasions on HVM guests. And
it is worth noting that with guests VCPUs pinned, only PV guests see
these warps. But on HVM guests specifically: such warps only occur
when one of guest VCPUs is pinned to CPU0.

These are all concerning findings (especially the pinning on cpu0). Which version of Xen have you been developing on?

Haozhong Zhang (CC'd) found and fixed several timing related bugs as part of his VMX TSC Scaling support series (Message root at <1449435529-12989-1-git-send-email-haozhong.zhang@xxxxxxxxx>). I would be surprised if your identified bugs and his identified bugs didn't at least partially overlap. (Note that most of the series has yet to be applied).

As to the specific options you identify, the time calibration rendezvous is (or should be) a nop on modern boxes with constant/invarient/etc TSCs. I have a low priority TODO item to investigating conditionally disabling it, as it is a scalability concern on larger boxes. Option 2 seems more likely.

As for option 3, on a modern system it shouldn't make any difference. On an older system, it can presumably only be fixed by a guest performing its rdtsc between the two version checks, to ensure that it sees a consistent timestamp and scale, along with the hypervisor bumping version on deschedule, and against on schedule. However, this might be contrary to proposed plan to have one global pv wallclock.

PVCLOCK_TSC_STABLE_BIT is the flag telling the guest that the
vcpu_time_info (pvti) are monotonic as seen by any CPU,
and supporting it could help fixing the issue mentioned above.

Surely fixing some of these bugs are prerequisites to supporting TSC_STABLE_BIT ? Either way, we should see about doing both.

series aims to propose a solution to that and it's divided as

        * Patch 1: Adds the missing flag field to vcpu_time_info.
        * Patch 2: Adds a new clocksource based on TSC
        * Patch 3, 4: Cleanups for patch 5
        * Patch 5: Implements the PVCLOCK_TSC_STABLE_BIT.
        * Patch 6: Same as 5 before but for other platform timers

I have some doubts on the correctness of Patch 6 but was the only
solution I found for supporting PVCLOCK_TSC_STABLE_BIT when using
other clocksources (i.e. != tsc). The test was running time-warp-test,
that constantly calls clock_gettime/gettimeofday on every CPU. It
measures a delta with the previous returned value and mark a warp if
it's negative. I measured it during periods of 1h and 6h and check how
many warps and their values (alongside the amount of time skew).
Measurements are included in individual patches.

Note that most of the testing has been done with Linux 4.4 in which
these warps/skew could be easily observed as vdso would use each vCPU
pvti. Though Linux 4.5 will change this behaviour and use only the
vCPU0 pvti though still requiring PVCLOCK_TSC_STABLE_BIT usage.

I've been using it for a while in the past weeks with no issues so far
though still requires testing on bad TSC machines. But I would like to
get folks comments/suggestions if this is the correct approach in
implementing this flag.

If you have some test instructions, I can probably find some bad TSC machines amongst the selection of older hardware in the XenServer test pool.


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.