
[Xen-devel] [PATCH v2 6/5] tools/docs: Remove PVRDTSCP remnants



PVRDTSCP is believed to be unused, and its implementation has adverse
consequences for unrelated functionality in the hypervisor.  As a result,
support has been removed.

Modify libxl to provide a slightly more helpful error message if it encounters
PVRDTSCP being selected.  While adjusting TSC handling, make libxl check for
errors from the set_tsc hypercall.

Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
---
CC: Jan Beulich <JBeulich@xxxxxxxx>
CC: Wei Liu <wei.liu2@xxxxxxxxxx>
CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
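
For readers skimming the libxl hunk, the control flow described in the commit
message can be sketched in isolation as below.  This is only an illustrative
sketch, not the libxl code: configure_tsc(), stub_set_tsc(), the enum names
and the error values are hypothetical stand-ins, and the stub replaces the
real set_tsc hypercall wrapper.  The mode numbers (0, 1, 2) follow the pyxc
docstring touched by this patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the libxl TSC mode names and error codes. */
enum tsc_mode_cfg {
    TSC_DEFAULT,
    TSC_ALWAYS_EMULATE,
    TSC_NATIVE,
    TSC_NATIVE_PARAVIRT,
};
enum { ERR_FAIL = -3, ERR_FEATURE_REMOVED = -4 }; /* illustrative values */

/* Stub in place of the set_tsc hypercall wrapper; fails on request. */
static int stub_set_tsc(int domid, int tsc_mode, int fail)
{
    (void)domid;
    (void)tsc_mode;
    return fail ? -1 : 0;
}

/*
 * Map the configured mode to the hypervisor's integer, refuse the removed
 * mode with a dedicated error, and propagate a failing hypercall instead
 * of ignoring it.  On success, *applied holds the integer that was set.
 */
static int configure_tsc(int domid, enum tsc_mode_cfg cfg, int fail,
                         int *applied)
{
    int tsc_mode;

    assert(applied != NULL);

    switch (cfg) {
    case TSC_DEFAULT:
        tsc_mode = 0;
        break;
    case TSC_ALWAYS_EMULATE:
        tsc_mode = 1;
        break;
    case TSC_NATIVE:
        tsc_mode = 2;
        break;
    case TSC_NATIVE_PARAVIRT:
        fprintf(stderr, "d%d: TSC mode native_paravirt has been removed\n",
                domid);
        return ERR_FEATURE_REMOVED;
    default:
        return ERR_FAIL;
    }

    if (stub_set_tsc(domid, tsc_mode, fail)) {
        fprintf(stderr, "d%d: setting TSC info failed\n", domid);
        return ERR_FAIL;
    }

    *applied = tsc_mode;
    return 0;
}
```

The point of the dedicated error for the removed mode is that a stale
configuration fails early with a message naming the cause, rather than
surfacing as a generic hypercall failure.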
---
 docs/man/xen-tscmode.pod.7        |  94 +-----------
 docs/man/xl.cfg.pod.5.in          |   9 +-
 docs/misc/pvrdtscp.c              | 307 --------------------------------------
 tools/libxl/libxl_x86.c           |  13 +-
 tools/python/xen/lowlevel/xc/xc.c |   2 +-
 5 files changed, 19 insertions(+), 406 deletions(-)
 delete mode 100644 docs/misc/pvrdtscp.c

diff --git a/docs/man/xen-tscmode.pod.7 b/docs/man/xen-tscmode.pod.7
index 819c61d..1d81a3f 100644
--- a/docs/man/xen-tscmode.pod.7
+++ b/docs/man/xen-tscmode.pod.7
@@ -77,9 +77,7 @@ highest performance is required.
 
 =item * B<tsc_mode=3> (PVRDTSCP).
 
-High-TSC-frequency apps may be paravirtualized (modified) to
-obtain both correctness and highest performance; any unmodified
-apps must be TSC-resilient.
+This mode has been removed.
 
 =back
 
@@ -215,30 +213,6 @@ is emulated.  Note that, though emulated, the "apparent" TSC frequency
 will be the TSC frequency of the initial physical machine, even after
 migration.
 
-For environments where both TSC-safeness AND highest performance
-even across migration is a requirement, application code can be specially
-modified to use an algorithm explicitly designed into Xen for this purpose.
-This mode (tsc_mode==3) is called PVRDTSCP, because it requires
-app paravirtualization (awareness by the app that it may be running
-on top of Xen), and utilizes a variation of the rdtsc instruction
-called rdtscp that is available on most recent generation processors.
-(The rdtscp instruction differs from the rdtsc instruction in that it
-reads not only the TSC but an additional register set by system software.)
-When a pvrdtscp-modified app is running on a processor that is both TSC-safe
-and supports the rdtscp instruction, information can be obtained
-about migration and TSC frequency/offset adjustment to allow the
-vast majority of timestamps to be obtained at top performance; when
-running on a TSC-unsafe processor or a processor that doesn't support
-the rdtscp instruction, rdtscp is emulated.
-
-PVRDTSCP (tsc_mode==3) has two limitations.  First, it applies to
-all apps running in this virtual machine.  This means that all
-apps must either be TSC-resilient or pvrdtscp-modified.  Second,
-highest performance is only obtained on TSC-safe machines that
-support the rdtscp instruction; when running on older machines,
-rdtscp is emulated and thus slower.  For more information on PVRDTSCP,
-see below.
-
 Finally, tsc_mode==1 always enables TSC emulation, regardless of
 the underlying physical hardware. The "apparent" TSC frequency will
 be the TSC frequency of the initial physical machine, even after migration.
@@ -287,56 +261,7 @@ have been replaced by a paravirtualized equivalent of the cpuid instruction
 ("pvcpuid") and also trap to Xen.  But apps in a PV guest that use a
 cpuid instruction execute it directly, without a trap to Xen.  As a result,
 an app may directly examine the physical TSC Invariant cpuid bit and make
-decisions based on that bit.  This is still an unsolved problem, though
-a workaround exists as part of the PVRDTSCP tsc_mode for apps that
-can be modified.
-
-=head1 MORE ON PVRDTSCP
-
-Paravirtualized OS's use the "pvclock" algorithm to manage the passing
-of time.  This sophisticated algorithm obtains information from a memory
-page shared between Xen and the OS and selects information from this
-page based on the current virtual CPU (vcpu) in order to properly adapt to
-TSC-unsafe systems and changes that occur across migration.  Neither
-this shared page nor the vcpu information is available to a userland
-app so the pvclock algorithm cannot be directly used by an app, at least
-without performance degradation roughly equal to the cost of just
-emulating an rdtsc.
-
-As a result, as of 4.0, Xen provides capabilities for a userland app
-to obtain key time values similar to the information accessible
-to the PV OS pvclock algorithm.  The app uses the rdtscp instruction
-which is defined in recent processors to obtain both the TSC and an
-auxiliary value called TSC_AUX.  Xen is responsible for setting TSC_AUX
-to the same value on all vcpus running any domain with tsc_mode==3;
-further, Xen tools are responsible for monotonically incrementing TSC_AUX
-anytime the domain is restored/migrated (thus changing key time values);
-and, when the domain is running on a physical machine that either
-is not TSC-safe or does not support the rdtscp instruction, Xen
-is responsible for emulating the rdtscp instruction and for setting
-TSC_AUX to zero on all processors.
-
-Xen also provides pvclock information via a "pvcpuid" instruction.
-While this results in a slow trap, the information changes
-(and thus must be reobtained via pvcpuid) ONLY when TSC_AUX
-has changed, which should be very rare relative to a high
-frequency of rdtscp instructions.
-
-Finally, Xen provides additional time-related information via
-other pvcpuid instructions.  First, an app is capable of
-determining if it is currently running on Xen, next whether
-the tsc_mode setting of the domain in which it is running,
-and finally whether the underlying hardware is TSC-safe and
-supports the rdtscp instruction.
-
-As a result, a pvrdtscp-modified app has sufficient information
-to compute the pvclock "elapsed nanoseconds" which can
-be used as a timestamp.  And this can be done nearly as
-fast as a native rdtsc instruction, much faster than emulation,
-and also much faster than nearly all OS-provided time mechanisms.
-While pvrtscp is too complex for most apps, certain enterprise
-TSC-sensitive high-TSC-frequency apps may find it useful to
-obtain a significant performance gain.
+decisions based on that bit.
 
 =head1 HARDWARE TSC SCALING
 
@@ -344,21 +269,16 @@ Intel VMX TSC scaling and AMD SVM TSC ratio allow the guest TSC read
 by guest rdtsc/p increasing in a different frequency than the host
 TSC frequency.
 
-If a HVM container in default TSC mode (tsc_mode=0) or PVRDTSCP mode
-(tsc_mode=3) is created on a host that provides constant TSC, its
-guest TSC frequency will be the same as the host. If it is later
-migrated to another host that provides constant TSC and supports Intel
-VMX TSC scaling/AMD SVM TSC ratio, its guest TSC frequency will be the
-same before and after migration.
+If a HVM container in default TSC mode (tsc_mode=0) is created on a host
+that provides constant TSC, its guest TSC frequency will be the same as
+the host. If it is later migrated to another host that provides constant
+TSC and supports Intel VMX TSC scaling/AMD SVM TSC ratio, its guest TSC
+frequency will be the same before and after migration.
 
 For above HVM container in default TSC mode (tsc_mode=0), if above
 hosts support rdtscp, both guest rdtsc and rdtscp instructions will be
 executed natively before and after migration.
 
-For above HVM container in PVRDTSCP mode (tsc_mode=3), if the
-destination host does not support rdtscp, the guest rdtscp instruction
-will be emulated with the guest TSC frequency.
-
 =head1 AUTHORS
 
 Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
index b1c0be1..3b92f39 100644
--- a/docs/man/xl.cfg.pod.5.in
+++ b/docs/man/xl.cfg.pod.5.in
@@ -2099,14 +2099,7 @@ by h/w, else executed natively.
 
 =item B<native_paravirt>
 
-Same as B<native>, except Xen manages the TSC_AUX register so the guest can
-determine when a restore/migration has occurred and assumes guest
-obtains/uses a pvclock-like mechanism to adjust for monotonicity and
-frequency changes.
-
-If a HVM container in B<native_paravirt> TSC mode can execute both guest
-rdtsc and guest rdtscp natively, then the guest TSC frequency will be
-determined in a similar way to that of B<default> TSC mode.
+This mode has been removed.
 
 =back
 
diff --git a/docs/misc/pvrdtscp.c b/docs/misc/pvrdtscp.c
deleted file mode 100644
index 8d25843..0000000
--- a/docs/misc/pvrdtscp.c
+++ /dev/null
@@ -1,307 +0,0 @@
-/* pvrdtscp algorithm
- *
- * This sample code demonstrates the use of the paravirtualized rdtscp
- * algorithm.  Using this algorithm, an application may communicate with
- * the Xen hypervisor (version 4.0+) to obtain timestamp information which
- * is both monotonically increasing and has a fixed 1 GHz rate, even across
- * migrations between machines with different TSC rates and offsets.
- * Further,the algorithm provides performance near the performance of a
- * native rdtsc/rdtscp instruction -- much faster than emulation PROVIDED
- * the application is running on a machine on which the rdtscp instruction
- * is supported and TSC is "safe". The application must also be running in a
- * PV domain.  (HVM domains may be supported at a later time.) On machines
- * where TSC is unsafe or the rdtscp instruction is not supported, Xen
- * (v4.0+) provides emulation which is slower but consistent with the pvrdtscp
- * algorithm, thus providing support for the algorithm for live migration
- * across all machines.
- *
- * More information can be found within the Xen (4.0+) source tree at
- *  docs/misc/tscmode.txt
- *
- * Copyright (c) 2009 Oracle Corporation and/or its affiliates.
- * All rights reserved
- * Written by: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
- * 
- * This code is derived from code licensed under the GNU
- * General Public License ("GPL") version 2 and is therefore itself
- * also licensed under the GPL version 2.
- *
- * This code is known to compile and run on Oracle Enterprise Linux 5 Update 2
- * using gcc version 4.1.2, but its purpose is to describe the pvrdtscp
- * algorithm and its ABI to Xen version 4.0+ 
- */
-
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-#include <sys/wait.h>
-
-#ifdef __LP64__
-#define __X86_64__
-typedef unsigned short u16;
-typedef unsigned int u32;
-typedef unsigned long u64;
-typedef int i32;
-typedef long i64;
-#define NSEC_PER_SEC 1000000000
-#else
-#define __X86_32__
-typedef unsigned int u16;
-typedef unsigned long u32;
-typedef unsigned long long u64;
-typedef long i32;
-typedef long long i64;
-#define NSEC_PER_SEC 1000000000L
-#endif
-
-static inline void hvm_cpuid(u32 idx, u32 sub,
-                               u32 *eax, u32 *ebx, u32 *ecx, u32 *edx)
-{
-       *eax = idx, *ecx = sub;
-       asm("cpuid" : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx)
-           : "0" (*eax), "2" (*ecx));
-}
-
-static inline void pv_cpuid(u32 idx, u32 sub,
-                               u32 *eax, u32 *ebx, u32 *ecx, u32 *edx)
-{
-       *eax = idx, *ecx = sub;
-       asm volatile ( "ud2a ; .ascii \"xen\"; cpuid" : "=a" (*eax),
-            "=b" (*ebx), "=c" (*ecx), "=d" (*edx) : "0" (*eax), "2" (*ecx));
-}
-
-static inline u64 do_rdtscp(u32 *aux)
-{
-static u64 last = 0;
-       u32 lo32, hi32;
-       u64 val;
-
-       asm volatile(".byte 0x0f,0x01,0xf9":"=a"(lo32),"=d"(hi32),"=c" (*aux));
-       val = lo32 | ((u64)hi32 << 32);
-       return val;
-}
-
-static inline int get_xen_tsc_mode(void)
-{
-       u32 val, dummy1, dummy2, dummy3;
-       pv_cpuid(0x40000003,0,&dummy1,&val,&dummy2,&dummy3);
-       return val;
-}
-
-static inline int get_xen_vtsc(void)
-{
-       u32 val, dummy1, dummy2, dummy3;
-       pv_cpuid(0x40000003,0,&val,&dummy1,&dummy2,&dummy3);
-       return val & 1;
-}
-
-static inline int get_xen_vtsc_khz(void)
-{
-       u32 val, dummy1, dummy2, dummy3;
-       pv_cpuid(0x40000003,0,&dummy1,&dummy2,&val,&dummy3);
-       return val;
-}
-
-static inline u32 get_xen_cpu_khz(void)
-{
-       u32 cpu_khz, dummy1, dummy2, dummy3;
-       pv_cpuid(0x40000003,2,&cpu_khz,&dummy1,&dummy2,&dummy3);
-       return cpu_khz;
-}
-
-static inline u32 get_xen_incarnation(void)
-{
-       u32 incarn, dummy1, dummy2, dummy3;
-       pv_cpuid(0x40000003,0,&dummy1,&dummy2,&dummy3,&incarn);
-       return incarn;
-}
-
-static inline void get_xen_time_values(u64 *offset, u32 *mul_frac, u32 *shift)
-{
-       u32 off_lo, off_hi, sys_lo, sys_hi, dummy;
-
-       pv_cpuid(0x40000003,1,&off_lo,&off_hi,mul_frac,shift);
-       *offset = off_lo | ((u64)off_hi << 32);
-}
-
-static inline u64 scale_delta(u64 delta, u32 tsc_mul_frac, i32 tsc_shift)
-{
-    u64 product;
-#ifdef __X86_32__
-    u32 tmp1, tmp2;
-#endif
-
-    if ( tsc_shift < 0 )
-        delta >>= -tsc_shift;
-    else
-        delta <<= tsc_shift;
-
-#ifdef __X86_32__
-    asm (
-        "mul  %5       ; "
-        "mov  %4,%%eax ; "
-        "mov  %%edx,%4 ; "
-        "mul  %5       ; "
-        "xor  %5,%5    ; "
-        "add  %4,%%eax ; "
-        "adc  %5,%%edx ; "
-        : "=A" (product), "=r" (tmp1), "=r" (tmp2)
-        : "a" ((u32)delta), "1" ((u32)(delta >> 32)), "2" (tsc_mul_frac) );
-#else
-    asm (
-        "mul %%rdx ; shrd $32,%%rdx,%%rax"
-        : "=a" (product) : "0" (delta), "d" ((u64)tsc_mul_frac) );
-#endif
-
-    return product;
-}
-
-static inline u64 get_pvrdtscp_timestamp(int *discontinuity)
-{
-       static int firsttime = 1;
-       static u64 last_pvrdtscp_timestamp = 0;
-       static u32 last_tsc_aux;
-       static u64 xen_ns_offset;
-       static u32 xen_tsc_to_ns_mul_frac, xen_tsc_to_ns_shift;
-       u32 this_tsc_aux;
-       u64 timestamp, cur_tsc, cur_ns;
-
-       if (firsttime) {
-               cur_tsc = do_rdtscp(&last_tsc_aux);
-               get_xen_time_values(&xen_ns_offset, &xen_tsc_to_ns_mul_frac,
-                                       &xen_tsc_to_ns_shift);
-               cur_ns = scale_delta(cur_tsc, xen_tsc_to_ns_mul_frac,
-                                       xen_tsc_to_ns_shift);
-               timestamp = cur_ns - xen_ns_offset;
-               last_pvrdtscp_timestamp = timestamp;
-               firsttime = 0;
-       }
-       cur_tsc = do_rdtscp(&this_tsc_aux);
-       *discontinuity = 0;
-       while (this_tsc_aux != last_tsc_aux) {
-               /* if tsc_aux changed, try again */
-               last_tsc_aux = this_tsc_aux;
-               get_xen_time_values(&xen_ns_offset, &xen_tsc_to_ns_mul_frac,
-                                       &xen_tsc_to_ns_shift);
-               cur_tsc = do_rdtscp(&this_tsc_aux);
-               *discontinuity = 1;
-       }
-
-       /* compute nsec from TSC and Xen time values */
-       cur_ns = scale_delta(cur_tsc, xen_tsc_to_ns_mul_frac,
-                                       xen_tsc_to_ns_shift);
-       timestamp = cur_ns - xen_ns_offset;
-
-       /* enforce monotonicity just in case */
-       if ((i64)(timestamp - last_pvrdtscp_timestamp) > 0)
-               last_pvrdtscp_timestamp = timestamp;
-       else {
-               /* this should never happen but we'll check it anyway in
-                * case of some strange combination of scaling errors
-                * occurs across a very fast migration */
-               printf("Time went backwards by %lluns\n",
-                   (unsigned long long)(last_pvrdtscp_timestamp-timestamp));
-               timestamp = ++last_pvrdtscp_timestamp;
-       }
-       return timestamp;
-}
-
-#define HVM 1
-#define PVM 0
-
-static int running_on_xen(int hvm, u16 *version_major, u16 *version_minor)
-{
-       u32 eax, ebx, ecx, edx, base;
-       union { char csig[16]; u32 u[4]; } sig;
-
-       for (base=0x40000000; base < 0x40010000; base += 0x100) {
-               if (hvm==HVM)
-                       hvm_cpuid(base,0,&eax,&ebx,&ecx,&edx);
-               else
-                       pv_cpuid(base,0,&eax,&ebx,&ecx,&edx);
-               sig.u[0] = ebx; sig.u[1] = ecx; sig.u[2] = edx;
-               sig.csig[12] = '\0';
-               if (!strcmp("XenVMMXenVMM",&sig.csig[0]) && (eax >= (base+2))) {
-                               if (hvm==HVM)
-                                       hvm_cpuid(base+1,0,&eax,&ebx,&ecx,&edx);
-                               else
-                                       pv_cpuid(base+1,0,&eax,&ebx,&ecx,&edx);
-                               *version_major = (eax >> 16) & 0xffff;
-                               *version_minor = eax & 0xffff;
-                               return 1;
-               }
-       }
-       return 0;
-}
-
-main(int ac, char **av)
-{
-       u32 dummy;
-       u16 version_hi, version_lo;
-       u64 ts, last_ts;
-       int status, discontinuity = 0;
-       pid_t pid;
-
-       if (running_on_xen(HVM,&version_hi,&version_lo)) {
-               printf("running on Xen v%d.%d as an HVM domain, "
-                       "pvrdtsc not supported, exiting\n",
-                       (int)version_hi, (int)version_lo);
-               exit(0);
-       }
-       pid = fork();
-       if (pid == -1) {
-               fprintf(stderr,"Huh? Fork failed\n");
-               return 0;
-       }
-       else if (pid == 0) { /* child */
-               pv_cpuid(0x40000000,0,&dummy,&dummy,&dummy,&dummy);
-               exit(0);
-       }
-       waitpid(pid,&status,0);
-       if (!WIFEXITED(status))
-               exit(0);
-       if (!running_on_xen(PVM,&version_hi,&version_lo)) {
-               printf("not running on Xen, exiting\n");
-               exit(0);
-       }
-       printf("running on Xen v%d.%d as a PV domain\n",
-               (int)version_hi, (int)version_lo);
-       if ( version_hi <= 3 ) {
-               printf("pvrdtscp requires Xen version 4.0 or greater\n");
-               /* exit(0); FIXME after xen-unstable is officially v4.0 */
-       }
-       if ( get_xen_tsc_mode() != 3 )
-               printf("tsc_mode not pvrdtscp, set tsc_mode=3, exiting\n");
-
-       /* OK, we are on Xen, now loop forever checking timestamps */
-       ts = get_pvrdtscp_timestamp(&discontinuity);
-       printf("Starting with ts=%lluns 0x%llx (%llusec)\n",ts,ts,ts/NSEC_PER_SEC);
-       printf("incarn=%d: vtsc=%d, vtsc_khz=%lu, phys cpu_khz=%lu\n",
-                               (unsigned long)get_xen_incarnation(),
-                               (unsigned long)get_xen_vtsc(),
-                               (unsigned long)get_xen_vtsc_khz(),
-                               (unsigned long)get_xen_cpu_khz());
-       ts = get_pvrdtscp_timestamp(&discontinuity);
-       last_ts = ts;
-       while (1) {
-               ts = get_pvrdtscp_timestamp(&discontinuity);
-               if (discontinuity)
-                       printf("migrated/restored, incarn=%d: "
-                               "vtsc now %d, vtsc_khz=%lu, phys cpu_khz=%lu\n",
-                               (unsigned long)get_xen_incarnation(),
-                               (unsigned long)get_xen_vtsc(),
-                               (unsigned long)get_xen_vtsc_khz(),
-                               (unsigned long)get_xen_cpu_khz());
-               if (ts < last_ts)
-                       /* this should NEVER happen, especially since there
-                        * is a check for it in get_pvrdtscp_timestamp() */
-                       printf("Time went backwards: %lluns (%llusec)\n",
-                               last_ts-ts,(last_ts-ts)/NSEC_PER_SEC);
-               if (ts > last_ts + 200000000LL)
-                       /* this is OK, usually about 2sec for save/restore
-                        * and a fraction of a second for live migrate */
-                       printf("Time jumped forward %lluns (%llusec)\n",
-                               ts-last_ts,(ts-last_ts)/NSEC_PER_SEC);
-               last_ts = ts;
-       }
-}
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index c04fd75..c0f88a7 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -309,12 +309,19 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config,
         tsc_mode = 2;
         break;
     case LIBXL_TSC_MODE_NATIVE_PARAVIRT:
-        tsc_mode = 3;
-        break;
+        LOGD(ERROR, domid, "TSC Mode native_paravirt (a.k.a PVRDTSCP) has been removed");
+        ret = ERROR_FEATURE_REMOVED;
+        goto out;
     default:
         abort();
     }
-    xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, 0, 0);
+
+    if (xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, 0, 0)) {
+        LOGE(ERROR, "xc_domain_set_tsc_info() failed");
+        ret = ERROR_FAIL;
+        goto out;
+    }
+
     if (libxl_defbool_val(d_config->b_info.disable_migrate))
         xc_domain_disable_migrate(ctx->xch, domid);
     rtc_timeoffset = d_config->b_info.rtc_timeoffset;
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index 484b790..cc8175a 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -2439,7 +2439,7 @@ static PyMethodDef pyxc_methods[] = {
       "Set a domain's TSC mode\n"
       " dom        [int]: Domain whose TSC mode is being set.\n"
       " tsc_mode   [int]: 0=default (monotonic, but native where possible)\n"
-      "                   1=always emulate 2=never emulate 3=pvrdtscp\n"
+      "                   1=always emulate 2=never emulate\n"
       "Returns: [int] 0 on success; -1 on error.\n" },
 
     { "domain_disable_migrate",
-- 
2.1.4

