[Xen-devel] [PATCH v2 6/5] tools/docs: Remove PVRDTSCP remnants
PVRDTSCP is believed-unused, and its implementation has adverse consequences
on unrelated functionality in the hypervisor. As a result, support has been
removed.

Modify libxl to provide a slightly more helpful error message if it encounters
PVRDTSCP being selected.

While adjusting TSC handling, make libxl check for errors from the set_tsc
hypercall.

Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
---
CC: Jan Beulich <JBeulich@xxxxxxxx>
CC: Wei Liu <wei.liu2@xxxxxxxxxx>
CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
---
 docs/man/xen-tscmode.pod.7        |  94 +-----------
 docs/man/xl.cfg.pod.5.in          |   9 +-
 docs/misc/pvrdtscp.c              | 307 --------------------------------------
 tools/libxl/libxl_x86.c           |  13 +-
 tools/python/xen/lowlevel/xc/xc.c |   2 +-
 5 files changed, 19 insertions(+), 406 deletions(-)
 delete mode 100644 docs/misc/pvrdtscp.c

diff --git a/docs/man/xen-tscmode.pod.7 b/docs/man/xen-tscmode.pod.7
index 819c61d..1d81a3f 100644
--- a/docs/man/xen-tscmode.pod.7
+++ b/docs/man/xen-tscmode.pod.7
@@ -77,9 +77,7 @@ highest performance is required.
 
 =item * B<tsc_mode=3> (PVRDTSCP).
 
-High-TSC-frequency apps may be paravirtualized (modified) to
-obtain both correctness and highest performance; any unmodified
-apps must be TSC-resilient.
+This mode has been removed.
 
 =back
 
@@ -215,30 +213,6 @@ is emulated. Note that, though emulated, the "apparent" TSC frequency
 will be the TSC frequency of the initial physical machine, even after
 migration.
 
-For environments where both TSC-safeness AND highest performance
-even across migration is a requirement, application code can be specially
-modified to use an algorithm explicitly designed into Xen for this purpose.
-This mode (tsc_mode==3) is called PVRDTSCP, because it requires
-app paravirtualization (awareness by the app that it may be running
-on top of Xen), and utilizes a variation of the rdtsc instruction
-called rdtscp that is available on most recent generation processors.
-(The rdtscp instruction differs from the rdtsc instruction in that it
-reads not only the TSC but an additional register set by system software.)
-When a pvrdtscp-modified app is running on a processor that is both TSC-safe
-and supports the rdtscp instruction, information can be obtained
-about migration and TSC frequency/offset adjustment to allow the
-vast majority of timestamps to be obtained at top performance; when
-running on a TSC-unsafe processor or a processor that doesn't support
-the rdtscp instruction, rdtscp is emulated.
-
-PVRDTSCP (tsc_mode==3) has two limitations. First, it applies to
-all apps running in this virtual machine. This means that all
-apps must either be TSC-resilient or pvrdtscp-modified. Second,
-highest performance is only obtained on TSC-safe machines that
-support the rdtscp instruction; when running on older machines,
-rdtscp is emulated and thus slower. For more information on PVRDTSCP,
-see below.
-
 Finally, tsc_mode==1 always enables TSC emulation, regardless of
 the underlying physical hardware. The "apparent" TSC frequency will
 be the TSC frequency of the initial physical machine, even after migration.
@@ -287,56 +261,7 @@ have been replaced by a paravirtualized equivalent of the cpuid instruction
 ("pvcpuid") and also trap to Xen. But apps in a PV guest that use a
 cpuid instruction execute it directly, without a trap to Xen. As a result,
 an app may directly examine the physical TSC Invariant cpuid bit and make
-decisions based on that bit. This is still an unsolved problem, though
-a workaround exists as part of the PVRDTSCP tsc_mode for apps that
-can be modified.
-
-=head1 MORE ON PVRDTSCP
-
-Paravirtualized OS's use the "pvclock" algorithm to manage the passing
-of time. This sophisticated algorithm obtains information from a memory
-page shared between Xen and the OS and selects information from this
-page based on the current virtual CPU (vcpu) in order to properly adapt to
-TSC-unsafe systems and changes that occur across migration. Neither
-this shared page nor the vcpu information is available to a userland
-app so the pvclock algorithm cannot be directly used by an app, at least
-without performance degradation roughly equal to the cost of just
-emulating an rdtsc.
-
-As a result, as of 4.0, Xen provides capabilities for a userland app
-to obtain key time values similar to the information accessible
-to the PV OS pvclock algorithm. The app uses the rdtscp instruction
-which is defined in recent processors to obtain both the TSC and an
-auxiliary value called TSC_AUX. Xen is responsible for setting TSC_AUX
-to the same value on all vcpus running any domain with tsc_mode==3;
-further, Xen tools are responsible for monotonically incrementing TSC_AUX
-anytime the domain is restored/migrated (thus changing key time values);
-and, when the domain is running on a physical machine that either
-is not TSC-safe or does not support the rdtscp instruction, Xen
-is responsible for emulating the rdtscp instruction and for setting
-TSC_AUX to zero on all processors.
-
-Xen also provides pvclock information via a "pvcpuid" instruction.
-While this results in a slow trap, the information changes
-(and thus must be reobtained via pvcpuid) ONLY when TSC_AUX
-has changed, which should be very rare relative to a high
-frequency of rdtscp instructions.
-
-Finally, Xen provides additional time-related information via
-other pvcpuid instructions. First, an app is capable of
-determining if it is currently running on Xen, next whether
-the tsc_mode setting of the domain in which it is running,
-and finally whether the underlying hardware is TSC-safe and
-supports the rdtscp instruction.
-
-As a result, a pvrdtscp-modified app has sufficient information
-to compute the pvclock "elapsed nanoseconds" which can
-be used as a timestamp. And this can be done nearly as
-fast as a native rdtsc instruction, much faster than emulation,
-and also much faster than nearly all OS-provided time mechanisms.
-While pvrtscp is too complex for most apps, certain enterprise
-TSC-sensitive high-TSC-frequency apps may find it useful to
-obtain a significant performance gain.
+decisions based on that bit.
 
 =head1 HARDWARE TSC SCALING
 
@@ -344,21 +269,16 @@ Intel VMX TSC scaling and AMD SVM TSC ratio allow the guest TSC read
 by guest rdtsc/p increasing in a different frequency than the host
 TSC frequency.
 
-If a HVM container in default TSC mode (tsc_mode=0) or PVRDTSCP mode
-(tsc_mode=3) is created on a host that provides constant TSC, its
-guest TSC frequency will be the same as the host. If it is later
-migrated to another host that provides constant TSC and supports Intel
-VMX TSC scaling/AMD SVM TSC ratio, its guest TSC frequency will be the
-same before and after migration.
+If a HVM container in default TSC mode (tsc_mode=0) is created on a host
+that provides constant TSC, its guest TSC frequency will be the same as
+the host. If it is later migrated to another host that provides constant
+TSC and supports Intel VMX TSC scaling/AMD SVM TSC ratio, its guest TSC
+frequency will be the same before and after migration.
 
 For above HVM container in default TSC mode (tsc_mode=0), if above
 hosts support rdtscp, both guest rdtsc and rdtscp instructions will be
 executed natively before and after migration.
 
-For above HVM container in PVRDTSCP mode (tsc_mode=3), if the
-destination host does not support rdtscp, the guest rdtscp instruction
-will be emulated with the guest TSC frequency.
-
 =head1 AUTHORS
 
 Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
index b1c0be1..3b92f39 100644
--- a/docs/man/xl.cfg.pod.5.in
+++ b/docs/man/xl.cfg.pod.5.in
@@ -2099,14 +2099,7 @@ by h/w, else executed natively.
 
 =item B<native_paravirt>
 
-Same as B<native>, except Xen manages the TSC_AUX register so the guest can
-determine when a restore/migration has occurred and assumes guest
-obtains/uses a pvclock-like mechanism to adjust for monotonicity and
-frequency changes.
-
-If a HVM container in B<native_paravirt> TSC mode can execute both guest
-rdtsc and guest rdtscp natively, then the guest TSC frequency will be
-determined in a similar way to that of B<default> TSC mode.
+This mode has been removed.
 
 =back
 
diff --git a/docs/misc/pvrdtscp.c b/docs/misc/pvrdtscp.c
deleted file mode 100644
index 8d25843..0000000
--- a/docs/misc/pvrdtscp.c
+++ /dev/null
@@ -1,307 +0,0 @@
-/* pvrdtscp algorithm
- *
- * This sample code demonstrates the use of the paravirtualized rdtscp
- * algorithm. Using this algorithm, an application may communicate with
- * the Xen hypervisor (version 4.0+) to obtain timestamp information which
- * is both monotonically increasing and has a fixed 1 GHz rate, even across
- * migrations between machines with different TSC rates and offsets.
- * Further,the algorithm provides performance near the performance of a
- * native rdtsc/rdtscp instruction -- much faster than emulation PROVIDED
- * the application is running on a machine on which the rdtscp instruction
- * is supported and TSC is "safe". The application must also be running in a
- * PV domain. (HVM domains may be supported at a later time.) On machines
- * where TSC is unsafe or the rdtscp instruction is not supported, Xen
- * (v4.0+) provides emulation which is slower but consistent with the pvrdtscp
- * algorithm, thus providing support for the algorithm for live migration
- * across all machines.
- *
- * More information can be found within the Xen (4.0+) source tree at
- * docs/misc/tscmode.txt
- *
- * Copyright (c) 2009 Oracle Corporation and/or its affiliates.
- * All rights reserved
- * Written by: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
- *
- * This code is derived from code licensed under the GNU
- * General Public License ("GPL") version 2 and is therefore itself
- * also licensed under the GPL version 2.
- *
- * This code is known to compile and run on Oracle Enterprise Linux 5 Update 2
- * using gcc version 4.1.2, but its purpose is to describe the pvrdtscp
- * algorithm and its ABI to Xen version 4.0+
- */
-
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-#include <sys/wait.h>
-
-#ifdef __LP64__
-#define __X86_64__
-typedef unsigned short u16;
-typedef unsigned int u32;
-typedef unsigned long u64;
-typedef int i32;
-typedef long i64;
-#define NSEC_PER_SEC 1000000000
-#else
-#define __X86_32__
-typedef unsigned int u16;
-typedef unsigned long u32;
-typedef unsigned long long u64;
-typedef long i32;
-typedef long long i64;
-#define NSEC_PER_SEC 1000000000L
-#endif
-
-static inline void hvm_cpuid(u32 idx, u32 sub,
-                             u32 *eax, u32 *ebx, u32 *ecx, u32 *edx)
-{
-    *eax = idx, *ecx = sub;
-    asm("cpuid" : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx)
-        : "0" (*eax), "2" (*ecx));
-}
-
-static inline void pv_cpuid(u32 idx, u32 sub,
-                            u32 *eax, u32 *ebx, u32 *ecx, u32 *edx)
-{
-    *eax = idx, *ecx = sub;
-    asm volatile ( "ud2a ; .ascii \"xen\"; cpuid" : "=a" (*eax),
-        "=b" (*ebx), "=c" (*ecx), "=d" (*edx) : "0" (*eax), "2" (*ecx));
-}
-
-static inline u64 do_rdtscp(u32 *aux)
-{
-static u64 last = 0;
-    u32 lo32, hi32;
-    u64 val;
-
-    asm volatile(".byte 0x0f,0x01,0xf9":"=a"(lo32),"=d"(hi32),"=c" (*aux));
-    val = lo32 | ((u64)hi32 << 32);
-    return val;
-}
-
-static inline int get_xen_tsc_mode(void)
-{
-    u32 val, dummy1, dummy2, dummy3;
-    pv_cpuid(0x40000003,0,&dummy1,&val,&dummy2,&dummy3);
-    return val;
-}
-
-static inline int get_xen_vtsc(void)
-{
-    u32 val, dummy1, dummy2, dummy3;
-    pv_cpuid(0x40000003,0,&val,&dummy1,&dummy2,&dummy3);
-    return val & 1;
-}
-
-static inline int get_xen_vtsc_khz(void)
-{
-    u32 val, dummy1, dummy2, dummy3;
-    pv_cpuid(0x40000003,0,&dummy1,&dummy2,&val,&dummy3);
-    return val;
-}
-
-static inline u32 get_xen_cpu_khz(void)
-{
-    u32 cpu_khz, dummy1, dummy2, dummy3;
-    pv_cpuid(0x40000003,2,&cpu_khz,&dummy1,&dummy2,&dummy3);
-    return cpu_khz;
-}
-
-static inline u32 get_xen_incarnation(void)
-{
-    u32 incarn, dummy1, dummy2, dummy3;
-    pv_cpuid(0x40000003,0,&dummy1,&dummy2,&dummy3,&incarn);
-    return incarn;
-}
-
-static inline void get_xen_time_values(u64 *offset, u32 *mul_frac, u32 *shift)
-{
-    u32 off_lo, off_hi, sys_lo, sys_hi, dummy;
-
-    pv_cpuid(0x40000003,1,&off_lo,&off_hi,mul_frac,shift);
-    *offset = off_lo | ((u64)off_hi << 32);
-}
-
-static inline u64 scale_delta(u64 delta, u32 tsc_mul_frac, i32 tsc_shift)
-{
-    u64 product;
-#ifdef __X86_32__
-    u32 tmp1, tmp2;
-#endif
-
-    if ( tsc_shift < 0 )
-        delta >>= -tsc_shift;
-    else
-        delta <<= tsc_shift;
-
-#ifdef __X86_32__
-    asm (
-        "mul %5 ; "
-        "mov %4,%%eax ; "
-        "mov %%edx,%4 ; "
-        "mul %5 ; "
-        "xor %5,%5 ; "
-        "add %4,%%eax ; "
-        "adc %5,%%edx ; "
-        : "=A" (product), "=r" (tmp1), "=r" (tmp2)
-        : "a" ((u32)delta), "1" ((u32)(delta >> 32)), "2" (tsc_mul_frac) );
-#else
-    asm (
-        "mul %%rdx ; shrd $32,%%rdx,%%rax"
-        : "=a" (product) : "0" (delta), "d" ((u64)tsc_mul_frac) );
-#endif
-
-    return product;
-}
-
-static inline u64 get_pvrdtscp_timestamp(int *discontinuity)
-{
-    static int firsttime = 1;
-    static u64 last_pvrdtscp_timestamp = 0;
-    static u32 last_tsc_aux;
-    static u64 xen_ns_offset;
-    static u32 xen_tsc_to_ns_mul_frac, xen_tsc_to_ns_shift;
-    u32 this_tsc_aux;
-    u64 timestamp, cur_tsc, cur_ns;
-
-    if (firsttime) {
-        cur_tsc = do_rdtscp(&last_tsc_aux);
-        get_xen_time_values(&xen_ns_offset, &xen_tsc_to_ns_mul_frac,
-                            &xen_tsc_to_ns_shift);
-        cur_ns = scale_delta(cur_tsc, xen_tsc_to_ns_mul_frac,
-                             xen_tsc_to_ns_shift);
-        timestamp = cur_ns - xen_ns_offset;
-        last_pvrdtscp_timestamp = timestamp;
-        firsttime = 0;
-    }
-    cur_tsc = do_rdtscp(&this_tsc_aux);
-    *discontinuity = 0;
-    while (this_tsc_aux != last_tsc_aux) {
-        /* if tsc_aux changed, try again */
-        last_tsc_aux = this_tsc_aux;
-        get_xen_time_values(&xen_ns_offset, &xen_tsc_to_ns_mul_frac,
-                            &xen_tsc_to_ns_shift);
-        cur_tsc = do_rdtscp(&this_tsc_aux);
-        *discontinuity = 1;
-    }
-
-    /* compute nsec from TSC and Xen time values */
-    cur_ns = scale_delta(cur_tsc, xen_tsc_to_ns_mul_frac,
-                         xen_tsc_to_ns_shift);
-    timestamp = cur_ns - xen_ns_offset;
-
-    /* enforce monotonicity just in case */
-    if ((i64)(timestamp - last_pvrdtscp_timestamp) > 0)
-        last_pvrdtscp_timestamp = timestamp;
-    else {
-        /* this should never happen but we'll check it anyway in
-         * case of some strange combination of scaling errors
-         * occurs across a very fast migration */
-        printf("Time went backwards by %lluns\n",
-               (unsigned long long)(last_pvrdtscp_timestamp-timestamp));
-        timestamp = ++last_pvrdtscp_timestamp;
-    }
-    return timestamp;
-}
-
-#define HVM 1
-#define PVM 0
-
-static int running_on_xen(int hvm, u16 *version_major, u16 *version_minor)
-{
-    u32 eax, ebx, ecx, edx, base;
-    union { char csig[16]; u32 u[4]; } sig;
-
-    for (base=0x40000000; base < 0x40010000; base += 0x100) {
-        if (hvm==HVM)
-            hvm_cpuid(base,0,&eax,&ebx,&ecx,&edx);
-        else
-            pv_cpuid(base,0,&eax,&ebx,&ecx,&edx);
-        sig.u[0] = ebx; sig.u[1] = ecx; sig.u[2] = edx;
-        sig.csig[12] = '\0';
-        if (!strcmp("XenVMMXenVMM",&sig.csig[0]) && (eax >= (base+2))) {
-            if (hvm==HVM)
-                hvm_cpuid(base+1,0,&eax,&ebx,&ecx,&edx);
-            else
-                pv_cpuid(base+1,0,&eax,&ebx,&ecx,&edx);
-            *version_major = (eax >> 16) & 0xffff;
-            *version_minor = eax & 0xffff;
-            return 1;
-        }
-    }
-    return 0;
-}
-
-main(int ac, char **av)
-{
-    u32 dummy;
-    u16 version_hi, version_lo;
-    u64 ts, last_ts;
-    int status, discontinuity = 0;
-    pid_t pid;
-
-    if (running_on_xen(HVM,&version_hi,&version_lo)) {
-        printf("running on Xen v%d.%d as an HVM domain, "
-               "pvrdtsc not supported, exiting\n",
-               (int)version_hi, (int)version_lo);
-        exit(0);
-    }
-    pid = fork();
-    if (pid == -1) {
-        fprintf(stderr,"Huh? Fork failed\n");
-        return 0;
-    }
-    else if (pid == 0) { /* child */
-        pv_cpuid(0x40000000,0,&dummy,&dummy,&dummy,&dummy);
-        exit(0);
-    }
-    waitpid(pid,&status,0);
-    if (!WIFEXITED(status))
-        exit(0);
-    if (!running_on_xen(PVM,&version_hi,&version_lo)) {
-        printf("not running on Xen, exiting\n");
-        exit(0);
-    }
-    printf("running on Xen v%d.%d as a PV domain\n",
-           (int)version_hi, (int)version_lo);
-    if ( version_hi <= 3 ) {
-        printf("pvrdtscp requires Xen version 4.0 or greater\n");
-        /* exit(0); FIXME after xen-unstable is officially v4.0 */
-    }
-    if ( get_xen_tsc_mode() != 3 )
-        printf("tsc_mode not pvrdtscp, set tsc_mode=3, exiting\n");
-
-    /* OK, we are on Xen, now loop forever checking timestamps */
-    ts = get_pvrdtscp_timestamp(&discontinuity);
-    printf("Starting with ts=%lluns 0x%llx (%llusec)\n",ts,ts,ts/NSEC_PER_SEC);
-    printf("incarn=%d: vtsc=%d, vtsc_khz=%lu, phys cpu_khz=%lu\n",
-           (unsigned long)get_xen_incarnation(),
-           (unsigned long)get_xen_vtsc(),
-           (unsigned long)get_xen_vtsc_khz(),
-           (unsigned long)get_xen_cpu_khz());
-    ts = get_pvrdtscp_timestamp(&discontinuity);
-    last_ts = ts;
-    while (1) {
-        ts = get_pvrdtscp_timestamp(&discontinuity);
-        if (discontinuity)
-            printf("migrated/restored, incarn=%d: "
-                   "vtsc now %d, vtsc_khz=%lu, phys cpu_khz=%lu\n",
-                   (unsigned long)get_xen_incarnation(),
-                   (unsigned long)get_xen_vtsc(),
-                   (unsigned long)get_xen_vtsc_khz(),
-                   (unsigned long)get_xen_cpu_khz());
-        if (ts < last_ts)
-            /* this should NEVER happen, especially since there
-             * is a check for it in get_pvrdtscp_timestamp() */
-            printf("Time went backwards: %lluns (%llusec)\n",
-                   last_ts-ts,(last_ts-ts)/NSEC_PER_SEC);
-        if (ts > last_ts + 200000000LL)
-            /* this is OK, usually about 2sec for save/restore
-             * and a fraction of a second for live migrate */
-            printf("Time jumped forward %lluns (%llusec)\n",
-                   ts-last_ts,(ts-last_ts)/NSEC_PER_SEC);
-        last_ts = ts;
-    }
-}
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index c04fd75..c0f88a7 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -309,12 +309,19 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config,
         tsc_mode = 2;
         break;
     case LIBXL_TSC_MODE_NATIVE_PARAVIRT:
-        tsc_mode = 3;
-        break;
+        LOGD(ERROR, domid, "TSC Mode native_paravirt (a.k.a PVRDTSCP) has been removed");
+        ret = ERROR_FEATURE_REMOVED;
+        goto out;
     default:
         abort();
     }
-    xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, 0, 0);
+
+    if (xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, 0, 0)) {
+        LOGE(ERROR, "xc_domain_set_tsc_info() failed");
+        ret = ERROR_FAIL;
+        goto out;
+    }
+
     if (libxl_defbool_val(d_config->b_info.disable_migrate))
         xc_domain_disable_migrate(ctx->xch, domid);
     rtc_timeoffset = d_config->b_info.rtc_timeoffset;
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index 484b790..cc8175a 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -2439,7 +2439,7 @@ static PyMethodDef pyxc_methods[] = {
       "Set a domain's TSC mode\n"
       " dom [int]: Domain whose TSC mode is being set.\n"
       " tsc_mode [int]: 0=default (monotonic, but native where possible)\n"
-      " 1=always emulate 2=never emulate 3=pvrdtscp\n"
+      " 1=always emulate 2=never emulate\n"
       "Returns: [int] 0 on success; -1 on error.\n" },
 
     { "domain_disable_migrate",
--
2.1.4

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel