[Xen-changelog] [xen master] docs: convert tscmode.txt into man page
commit dc562334db2b1fc232dda884f84bb0172e1d1480 Author: Cédric Bosdonnat <cbosdonnat@xxxxxxxx> AuthorDate: Fri Dec 9 17:07:31 2016 +0100 Commit: Wei Liu <wei.liu2@xxxxxxxxxx> CommitDate: Mon Jan 9 11:05:47 2017 +0000 docs: convert tscmode.txt into man page tscmode.txt is referenced in xl.cfg(5). Convert it into a pod formatted man page. Signed-off-by: Cédric Bosdonnat <cbosdonnat@xxxxxxxx> Acked-by: Wei Liu <wei.liu2@xxxxxxxxxx> --- docs/INDEX | 1 - docs/man/xen-tscmode.pod.7 | 359 +++++++++++++++++++++++++++++++++++++++++++++ docs/man/xl.cfg.pod.5.in | 4 +- docs/misc/tscmode.txt | 320 ---------------------------------------- 4 files changed, 361 insertions(+), 323 deletions(-) diff --git a/docs/INDEX b/docs/INDEX index 66cc82b..868ab1f 100644 --- a/docs/INDEX +++ b/docs/INDEX @@ -12,7 +12,6 @@ misc/xen-command-line Xen Hypervisor Command Line Options misc/crashdb Xen crash debugger notes misc/grant-tables A Rough Introduction to Using Grant Tables misc/kexec_and_kdump Kexec and Kdump for Xen -misc/tscmode TSC Mode HOWTO misc/xenstore Xenstore protocol specification misc/xenstore-paths Xenstore path documentation misc/distro_mapping Distro Directory Layouts diff --git a/docs/man/xen-tscmode.pod.7 b/docs/man/xen-tscmode.pod.7 new file mode 100644 index 0000000..0da57e5 --- /dev/null +++ b/docs/man/xen-tscmode.pod.7 @@ -0,0 +1,359 @@ +=head1 OVERVIEW + +As of Xen 4.0, a new config option called tsc_mode may be specified +for each domain. The default for tsc_mode handles the vast majority +of hardware and software environments. This document is targeted +for Xen users and administrators that may need to select a non-default +tsc_mode. + +Proper selection of tsc_mode depends on an understanding not only of +the guest operating system (OS), but also of the application set that will +ever run on this guest OS. This is because tsc_mode applies +equally to both the OS and ALL apps that are running on this +domain, now or in the future. 
+ +Key questions to be answered for the OS and/or each application are: + +=over 4 + +=item * + +Does the OS/app use the rdtsc instruction at all? +(We will explain below how to determine this.) + +=item * + +At what frequency is the rdtsc instruction executed by either the OS +or any running apps? If the sum exceeds about 10,000 rdtsc instructions +per second per processor, we call this a "high-TSC-frequency" +OS/app/environment. (This is relatively rare, and developers of OS's +and apps that are high-TSC-frequency are usually aware of it.) + +=item * + +If the OS/app does use rdtsc, will it behave incorrectly if "time goes +backwards" or if the frequency of the TSC suddenly changes? If so, +we call this a "TSC-sensitive" app or OS; otherwise it is "TSC-resilient". + +=back + +This last is the US$64,000 question as it may be very difficult +(or, for legacy apps, even impossible) to predict all possible +failure cases. As a result, unless proven otherwise, any app +that uses rdtsc must be assumed to be TSC-sensitive and, as we +will see, this is the default starting in Xen 4.0. + +Xen's new tsc_mode parameter determines the circumstances under which +the family of rdtsc instructions are executed "natively" vs emulated. +Roughly speaking, native means rdtsc is fast but TSC-sensitive apps +may, under unpredictable circumstances, run incorrectly; emulated means +there is some performance degradation (unobservable in most cases), +but TSC-sensitive apps will always run correctly. Prior to Xen 4.0, +all rdtsc instructions were native: "fast but potentially incorrect." +Starting at Xen 4.0, the default is that all rdtsc instructions are +"correct but potentially slow". The tsc_mode parameter in 4.0 provides +an intelligent default but allows system administrator's to adjust +how rdtsc instructions are executed differently for different domains. + +The non-default choices for tsc_mode are: + +=over 4 + +=item * B<tsc_mode=1> (always emulate). 
+ +All rdtsc instructions are emulated; this is the best choice when +TSC-sensitive apps are running and it is necessary to understand +worst-case performance degradation for a specific hardware environment. + +=item * B<tsc_mode=2> (never emulate). + +This is the same as prior to Xen 4.0 and is the best choice if it +is certain that all apps running in this VM are TSC-resilient and +highest performance is required. + +=item * B<tsc_mode=3> (PVRDTSCP). + +High-TSC-frequency apps may be paravirtualized (modified) to +obtain both correctness and highest performance; any unmodified +apps must be TSC-resilient. + +=back + +If tsc_mode is left unspecified (or set to B<tsc_mode=0>), a hybrid +algorithm is utilized to ensure correctness while providing the +best performance possible given: + +=over 4 + +=item * + +the requirement of correctness, + +=item * + +the underlying hardware, and + +=item * + +whether or not the VM has been saved/restored/migrated +To understand this in more detail, the rest of this document must +be read. + +=back + +=head1 DETERMINING RDTSC FREQUENCY + +To determine the frequency of rdtsc instructions that are emulated, +an "xm" command can be used by a privileged user of domain0. The +command: + + # xm debug-key s; xm dmesg | tail + +provides information about TSC usage in each domain where TSC +emulation is currently enabled. + +=head1 TSC HISTORY + +To understand tsc_mode completely, some background on TSC is required: + +The x86 "timestamp counter", or TSC, is a 64-bit register on each +processor that increases monotonically. Historically, TSC incremented +every processor cycle, but on recent processors, it increases +at a constant rate even if the processor changes frequency (for example, +to reduce processor power usage). TSC is known by x86 programmers +as the fastest, highest-precision measurement of the passage of time +so it is often used as a foundation for performance monitoring. 
+And since it is guaranteed to be monotonically increasing and, at +64 bits, is guaranteed to not wraparound within 10 years, it is +sometimes used as a random number or a unique sequence identifier, +such as to stamp transactions so they can be replayed in a specific +order. + +On most older SMP and early multi-core machines, TSC was not synchronized +between processors. Thus if an application were to read the TSC on +one processor, then was moved by the OS to another processor, then read +TSC again, it might appear that "time went backwards". This loss of +monotonicity resulted in many obscure application bugs when TSC-sensitive +apps were ported from a uniprocessor to an SMP environment; as a result, +many applications -- especially in the Windows world -- removed their +dependency on TSC and replaced their timestamp needs with OS-specific +functions, losing both performance and precision. On some more recent +generations of multi-core machines, especially multi-socket multi-core +machines, the TSC was synchronized but if one processor were to enter +certain low-power states, its TSC would stop, destroying the synchrony +and again causing obscure bugs. This reinforced decisions to avoid use +of TSC altogether. On the most recent generations of multi-core +machines, however, synchronization is provided across all processors +in all power states, even on multi-socket machines, and provide a +flag that indicates that TSC is synchronized and "invariant". Thus +TSC is once again useful for applications, and even newer operating +systems are using and depending upon TSC for critical timekeeping +tasks when running on these recent machines. + +We will refer to hardware that ensures TSC is both synchronized and +invariant as "TSC-safe" and any hardware on which TSC is not (or +may not remain) synchronized as "TSC-unsafe". 
+ +As a result of TSC's sordid history, two classes of applications use +TSC: old applications designed for single processors, and the most recent +enterprise applications which require high-frequency high-precision +timestamping. + +We will refer to apps that might break if running on a TSC-unsafe +machine as "TSC-sensitive"; apps that don't use TSC, or do use +TSC but use it in a way that monotonicity and frequency invariance +are unimportant as "TSC-resilient". + +The emergence of virtualization once again complicates the usage of +TSC. When features such as save/restore or live migration are employed, +a guest OS and all its currently running applications may be invisibly +transported to an entirely different physical machine. While TSC +may be "safe" on one machine, it is essentially impossible to precisely +synchronize TSC across a data center or even a pool of machines. As +a result, when run in a virtualized environment, rare and obscure +"time going backwards" problems might once again occur for those +TSC-sensitive applications. Worse, if a guest OS moves from, for +example, a 3GHz +machine to a 1.5GHz machine, attempts by an OS/app to measure time +intervals with TSC may without notice be incorrect by a factor of two. + +The rdtsc (read timestamp counter) instruction is used to read the +TSC register. The rdtscp instruction is a variant of rdtsc on recent +processors. We refer to these together as the rdtsc family of instructions, +or just "rdtsc". Instructions in the rdtsc family are non-privileged, but +privileged software may set a cpuid bit to cause all rdtsc family +instructions to trap. This trap can be detected by Xen, which can +then transparently "emulate" the results of the rdtsc instruction and +return control to the code following the rdtsc instruction. + +To provide a "safe" TSC, i.e. 
to ensure both TSC monotonicity and a +fixed rate, Xen provides rdtsc emulation whenever necessary or when +explicitly specified by a per-VM configuration option. TSC emulation is +relatively slow -- roughly 15-20 times slower than the rdtsc instruction +when executed natively. However, except when an OS or application uses +the rdtsc instruction at a high frequency (e.g. more than about 10,000 times +per second per processor), this performance degradation is not noticeable +(i.e. <0.3%). And, TSC emulation is nearly always faster than +OS-provided alternatives (e.g. Linux's gettimeofday). For environments +where it is certain that all apps are TSC-resilient (e.g. +"TSC-safeness" is not necessary) and highest performance is a +requirement, TSC emulation may be entirely disabled (tsc_mode==2). + +The default mode (tsc_mode==0) checks TSC-safeness of the underlying +hardware on which the virtual machine is launched. If it is +TSC-safe, rdtsc will execute at hardware speed; if it is not, rdtsc +will be emulated. Once a virtual machine is save/restored or migrated, +however, there are two possibilities: For a paravirtualized (PV) domain, +TSC will always be emulated. For a fully-virtualized (HVM) domain, +TSC remains native IF the source physical machine and target physical machine +have the same TSC frequency; else TSC is emulated. Note that, though +emulated, the "apparent" TSC frequency will be the TSC frequency +of the initial physical machine, even after migration. + +For environments where both TSC-safeness AND highest performance +even across migration is a requirement, application code can be specially +modified to use an algorithm explicitly designed into Xen for this purpose. +This mode (tsc_mode==3) is called PVRDTSCP, because it requires +app paravirtualization (awareness by the app that it may be running +on top of Xen), and utilizes a variation of the rdtsc instruction +called rdtscp that is available on most recent generation processors. 
+(The rdtscp instruction differs from the rdtsc instruction in that it +reads not only the TSC but an additional register set by system software.) +When a pvrdtscp-modified app is running on a processor that is both TSC-safe +and supports the rdtscp instruction, information can be obtained +about migration and TSC frequency/offset adjustment to allow the +vast majority of timestamps to be obtained at top performance; when +running on a TSC-unsafe processor or a processor that doesn't support +the rdtscp instruction, rdtscp is emulated. + +PVRDTSCP (tsc_mode==3) has two limitations. First, it applies to +all apps running in this virtual machine. This means that all +apps must either be TSC-resilient or pvrdtscp-modified. Second, +highest performance is only obtained on TSC-safe machines that +support the rdtscp instruction; when running on older machines, +rdtscp is emulated and thus slower. For more information on PVRDTSCP, +see below. + +Finally, tsc_mode==1 always enables TSC emulation, regardless of +the underlying physical hardware. The "apparent" TSC frequency will +be the TSC frequency of the initial physical machine, even after migration. +This mode is useful to measure any performance degradation that +might be encountered by a tsc_mode==0 domain after migration occurs, +or a tsc_mode==3 domain when it is running on TSC-unsafe hardware. + +Note that while Xen ensures that an emulated TSC is "safe" across migration, +it does not ensure that it continues to tick at the same rate during +the actual migration. As an oversimplified example, if TSC is ticking +once per second in a guest, and the guest is saved when the TSC is 1000, +then restored 30 seconds later, TSC is only guaranteed to be greater +than or equal to 1001, not precisely 1030. This has some OS implications +as will be seen in the next section. 
+ +=head1 TSC INVARIANT BIT and NO_MIGRATE + +Related to TSC emulation, the "TSC Invariant" bit is architecturally defined +in a cpuid bit on the most recent x86 processors. If set, TSC invariance +ensures that the TSC is "safe", that is it will increment at a constant rate +regardless of power events, will be synchronized across all processors, and +was properly initialized to zero on all processors at boot-time +by system hardware/BIOS. As long as system software never writes to TSC, +TSC will be safe and continuously incremented at a fixed rate and thus +can be used as a system "clocksource". + +This bit is used by some OS's, and specifically by Linux starting with +version 2.6.30(?), to select TSC as a system clocksource. Once selected, +TSC remains the Linux system clocksource unless manually overridden. In +a virtualized environment, since it is not possible to synchronize TSC +across all the machines in a pool or data center, a migration may "break" +TSC as a usable clocksource; while time will not go backwards, it may +not track wallclock time well enough to avoid certain time-sensitive +consequences. As a result, Xen can only expose the TSC Invariant bit +to a guest OS if it is certain that the domain will never migrate. +As of Xen 4.0, the "no_migrate=1" VM configuration option may be specified +to disable migration. If no_migrate is selected and the VM is running +on a physical machine with "TSC Invariant", Linux 2.6.30+ will safely +use TSC as the system clocksource. But, attempts to migrate or, once +saved, restore this domain will fail. + +There is another cpuid-related complication: The x86 cpuid instruction is +non-privileged. HVM domains are configured to always trap this instruction +to Xen, where Xen can "filter" the result. In a PV OS, all cpuid instructions +have been replaced by a paravirtualized equivalent of the cpuid instruction +("pvcpuid") and also trap to Xen. 
But apps in a PV guest that use a +cpuid instruction execute it directly, without a trap to Xen. As a result, +an app may directly examine the physical TSC Invariant cpuid bit and make +decisions based on that bit. This is still an unsolved problem, though +a workaround exists as part of the PVRDTSCP tsc_mode for apps that +can be modified. + +=head1 MORE ON PVRDTSCP + +Paravirtualized OS's use the "pvclock" algorithm to manage the passing +of time. This sophisticated algorithm obtains information from a memory +page shared between Xen and the OS and selects information from this +page based on the current virtual CPU (vcpu) in order to properly adapt to +TSC-unsafe systems and changes that occur across migration. Neither +this shared page nor the vcpu information is available to a userland +app so the pvclock algorithm cannot be directly used by an app, at least +without performance degradation roughly equal to the cost of just +emulating an rdtsc. + +As a result, as of 4.0, Xen provides capabilities for a userland app +to obtain key time values similar to the information accessible +to the PV OS pvclock algorithm. The app uses the rdtscp instruction +which is defined in recent processors to obtain both the TSC and an +auxiliary value called TSC_AUX. Xen is responsible for setting TSC_AUX +to the same value on all vcpus running any domain with tsc_mode==3; +further, Xen tools are responsible for monotonically incrementing TSC_AUX +anytime the domain is restored/migrated (thus changing key time values); +and, when the domain is running on a physical machine that either +is not TSC-safe or does not support the rdtscp instruction, Xen +is responsible for emulating the rdtscp instruction and for setting +TSC_AUX to zero on all processors. + +Xen also provides pvclock information via a "pvcpuid" instruction. 
+While this results in a slow trap, the information changes +(and thus must be reobtained via pvcpuid) ONLY when TSC_AUX +has changed, which should be very rare relative to a high +frequency of rdtscp instructions. + +Finally, Xen provides additional time-related information via +other pvcpuid instructions. First, an app is capable of +determining if it is currently running on Xen, next whether +the tsc_mode setting of the domain in which it is running, +and finally whether the underlying hardware is TSC-safe and +supports the rdtscp instruction. + +As a result, a pvrdtscp-modified app has sufficient information +to compute the pvclock "elapsed nanoseconds" which can +be used as a timestamp. And this can be done nearly as +fast as a native rdtsc instruction, much faster than emulation, +and also much faster than nearly all OS-provided time mechanisms. +While pvrtscp is too complex for most apps, certain enterprise +TSC-sensitive high-TSC-frequency apps may find it useful to +obtain a significant performance gain. + +=head1 HARDWARE TSC SCALING + +Intel VMX TSC scaling and AMD SVM TSC ratio allow the guest TSC read +by guest rdtsc/p increasing in a different frequency than the host +TSC frequency. + +If a HVM container in default TSC mode (tsc_mode=0) or PVRDTSCP mode +(tsc_mode=3) is created on a host that provides constant TSC, its +guest TSC frequency will be the same as the host. If it is later +migrated to another host that provides constant TSC and supports Intel +VMX TSC scaling/AMD SVM TSC ratio, its guest TSC frequency will be the +same before and after migration. + +For above HVM container in default TSC mode (tsc_mode=0), if above +hosts support rdtscp, both guest rdtsc and rdtscp instructions will be +executed natively before and after migration. + +For above HVM container in PVRDTSCP mode (tsc_mode=3), if the +destination host does not support rdtscp, the guest rdtscp instruction +will be emulated with the guest TSC frequency. 
+ +=head1 AUTHORS + +Dan Magenheimer <dan.magenheimer@xxxxxxxxxx> diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in index 64dd5c7..46f9caf 100644 --- a/docs/man/xl.cfg.pod.5.in +++ b/docs/man/xl.cfg.pod.5.in @@ -1463,7 +1463,7 @@ determined in the similar way to that of B<default> TSC mode. =back -Please see F<docs/misc/tscmode.txt> for more information on this option. +Please see L<xen-tscmode(7)> for more information on this option. =item B<localtime=BOOLEAN> @@ -2033,7 +2033,7 @@ natively or via hardware backwards compatibility support. =item L<xl-network-configuration(5)> -=item F<docs/misc/tscmode.txt> +=item L<xen-tscmode(7)> =back diff --git a/docs/misc/tscmode.txt b/docs/misc/tscmode.txt deleted file mode 100644 index 01ee060..0000000 --- a/docs/misc/tscmode.txt +++ /dev/null @@ -1,320 +0,0 @@ -TSC_MODE HOW-TO -by: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx> - -OVERVIEW - -As of Xen 4.0, a new config option called tsc_mode may be specified -for each domain. The default for tsc_mode handles the vast majority -of hardware and software environments. This document is targeted -for Xen users and administrators that may need to select a non-default -tsc_mode. - -Proper selection of tsc_mode depends on an understanding not only of -the guest operating system (OS), but also of the application set that will -ever run on this guest OS. This is because tsc_mode applies -equally to both the OS and ALL apps that are running on this -domain, now or in the future. - -Key questions to be answered for the OS and/or each application are: -- Does the OS/app use the rdtsc instruction at all? (We will explain below - how to determine this.) -- At what frequency is the rdtsc instruction executed by either the OS - or any running apps? If the sum exceeds about 10,000 rdtsc instructions - per second per processor, we call this a "high-TSC-frequency" - OS/app/environment. 
(This is relatively rare, and developers of OS's - and apps that are high-TSC-frequency are usually aware of it.) -- If the OS/app does use rdtsc, will it behave incorrectly if "time goes - backwards" or if the frequency of the TSC suddenly changes? If so, - we call this a "TSC-sensitive" app or OS; otherwise it is "TSC-resilient". - -This last is the US$64,000 question as it may be very difficult -(or, for legacy apps, even impossible) to predict all possible -failure cases. As a result, unless proven otherwise, any app -that uses rdtsc must be assumed to be TSC-sensitive and, as we -will see, this is the default starting in Xen 4.0. - -Xen's new tsc_mode parameter determines the circumstances under which -the family of rdtsc instructions are executed "natively" vs emulated. -Roughly speaking, native means rdtsc is fast but TSC-sensitive apps -may, under unpredictable circumstances, run incorrectly; emulated means -there is some performance degradation (unobservable in most cases), -but TSC-sensitive apps will always run correctly. Prior to Xen 4.0, -all rdtsc instructions were native: "fast but potentially incorrect." -Starting at Xen 4.0, the default is that all rdtsc instructions are -"correct but potentially slow". The tsc_mode parameter in 4.0 provides -an intelligent default but allows system administrator's to adjust -how rdtsc instructions are executed differently for different domains. - -The non-default choices for tsc_mode are: -- tsc_mode=1 (always emulate). All rdtsc instructions are emulated; - this is the best choice when TSC-sensitive apps are running and - it is necessary to understand worst-case performance degradation - for a specific hardware environment. -- tsc_mode=2 (never emulate). This is the same as prior to Xen 4.0 - and is the best choice if it is certain that all apps running in - this VM are TSC-resilient and highest performance is required. -- tsc_mode=3 (PVRDTSCP). 
High-TSC-frequency apps may be paravirtualized - (modified) to obtain both correctness and highest performance; any - unmodified apps must be TSC-resilient. - -If tsc_mode is left unspecified (or set to tsc_mode=0), a hybrid -algorithm is utilized to ensure correctness while providing the -best performance possible given: -- the requirement of correctness, -- the underlying hardware, and -- whether or not the VM has been saved/restored/migrated -To understand this in more detail, the rest of this document must -be read. - -DETERMINING RDTSC FREQUENCY - -To determine the frequency of rdtsc instructions that are emulated, -an "xm" command can be used by a privileged user of domain0. The -command: - -# xm debug-key s; xm dmesg | tail - -provides information about TSC usage in each domain where TSC -emulation is currently enabled. - -TSC HISTORY - -To understand tsc_mode completely, some background on TSC is required: - -The x86 "timestamp counter", or TSC, is a 64-bit register on each -processor that increases monotonically. Historically, TSC incremented -every processor cycle, but on recent processors, it increases -at a constant rate even if the processor changes frequency (for example, -to reduce processor power usage). TSC is known by x86 programmers -as the fastest, highest-precision measurement of the passage of time -so it is often used as a foundation for performance monitoring. -And since it is guaranteed to be monotonically increasing and, at -64 bits, is guaranteed to not wraparound within 10 years, it is -sometimes used as a random number or a unique sequence identifier, -such as to stamp transactions so they can be replayed in a specific -order. - -On most older SMP and early multi-core machines, TSC was not synchronized -between processors. Thus if an application were to read the TSC on -one processor, then was moved by the OS to another processor, then read -TSC again, it might appear that "time went backwards". 
This loss of -monotonicity resulted in many obscure application bugs when TSC-sensitive -apps were ported from a uniprocessor to an SMP environment; as a result, -many applications -- especially in the Windows world -- removed their -dependency on TSC and replaced their timestamp needs with OS-specific -functions, losing both performance and precision. On some more recent -generations of multi-core machines, especially multi-socket multi-core -machines, the TSC was synchronized but if one processor were to enter -certain low-power states, its TSC would stop, destroying the synchrony -and again causing obscure bugs. This reinforced decisions to avoid use -of TSC altogether. On the most recent generations of multi-core -machines, however, synchronization is provided across all processors -in all power states, even on multi-socket machines, and provide a -flag that indicates that TSC is synchronized and "invariant". Thus -TSC is once again useful for applications, and even newer operating -systems are using and depending upon TSC for critical timekeeping -tasks when running on these recent machines. - -We will refer to hardware that ensures TSC is both synchronized and -invariant as "TSC-safe" and any hardware on which TSC is not (or -may not remain) synchronized as "TSC-unsafe". - -As a result of TSC's sordid history, two classes of applications use -TSC: old applications designed for single processors, and the most recent -enterprise applications which require high-frequency high-precision -timestamping. - -We will refer to apps that might break if running on a TSC-unsafe -machine as "TSC-sensitive"; apps that don't use TSC, or do use -TSC but use it in a way that monotonicity and frequency invariance -are unimportant as "TSC-resilient". - -The emergence of virtualization once again complicates the usage of -TSC. 
When features such as save/restore or live migration are employed, -a guest OS and all its currently running applications may be invisibly -transported to an entirely different physical machine. While TSC -may be "safe" on one machine, it is essentially impossible to precisely -synchronize TSC across a data center or even a pool of machines. As -a result, when run in a virtualized environment, rare and obscure -"time going backwards" problems might once again occur for those -TSC-sensitive applications. Worse, if a guest OS moves from, for -example, a 3GHz -machine to a 1.5GHz machine, attempts by an OS/app to measure time -intervals with TSC may without notice be incorrect by a factor of two. - -The rdtsc (read timestamp counter) instruction is used to read the -TSC register. The rdtscp instruction is a variant of rdtsc on recent -processors. We refer to these together as the rdtsc family of instructions, -or just "rdtsc". Instructions in the rdtsc family are non-privileged, but -privileged software may set a cpuid bit to cause all rdtsc family -instructions to trap. This trap can be detected by Xen, which can -then transparently "emulate" the results of the rdtsc instruction and -return control to the code following the rdtsc instruction. - -To provide a "safe" TSC, i.e. to ensure both TSC monotonicity and a -fixed rate, Xen provides rdtsc emulation whenever necessary or when -explicitly specified by a per-VM configuration option. TSC emulation is -relatively slow -- roughly 15-20 times slower than the rdtsc instruction -when executed natively. However, except when an OS or application uses -the rdtsc instruction at a high frequency (e.g. more than about 10,000 times -per second per processor), this performance degradation is not noticeable -(i.e. <0.3%). And, TSC emulation is nearly always faster than -OS-provided alternatives (e.g. Linux's gettimeofday). For environments -where it is certain that all apps are TSC-resilient (e.g. 
-"TSC-safeness" is not necessary) and highest performance is a -requirement, TSC emulation may be entirely disabled (tsc_mode==2). - -The default mode (tsc_mode==0) checks TSC-safeness of the underlying -hardware on which the virtual machine is launched. If it is -TSC-safe, rdtsc will execute at hardware speed; if it is not, rdtsc -will be emulated. Once a virtual machine is save/restored or migrated, -however, there are two possibilities: For a paravirtualized (PV) domain, -TSC will always be emulated. For a fully-virtualized (HVM) domain, -TSC remains native IF the source physical machine and target physical machine -have the same TSC frequency; else TSC is emulated. Note that, though -emulated, the "apparent" TSC frequency will be the TSC frequency -of the initial physical machine, even after migration. - -For environments where both TSC-safeness AND highest performance -even across migration is a requirement, application code can be specially -modified to use an algorithm explicitly designed into Xen for this purpose. -This mode (tsc_mode==3) is called PVRDTSCP, because it requires -app paravirtualization (awareness by the app that it may be running -on top of Xen), and utilizes a variation of the rdtsc instruction -called rdtscp that is available on most recent generation processors. -(The rdtscp instruction differs from the rdtsc instruction in that it -reads not only the TSC but an additional register set by system software.) -When a pvrdtscp-modified app is running on a processor that is both TSC-safe -and supports the rdtscp instruction, information can be obtained -about migration and TSC frequency/offset adjustment to allow the -vast majority of timestamps to be obtained at top performance; when -running on a TSC-unsafe processor or a processor that doesn't support -the rdtscp instruction, rdtscp is emulated. - -PVRDTSCP (tsc_mode==3) has two limitations. First, it applies to -all apps running in this virtual machine. 
This means that all -apps must either be TSC-resilient or pvrdtscp-modified. Second, -highest performance is only obtained on TSC-safe machines that -support the rdtscp instruction; when running on older machines, -rdtscp is emulated and thus slower. For more information on PVRDTSCP, -see below. - -Finally, tsc_mode==1 always enables TSC emulation, regardless of -the underlying physical hardware. The "apparent" TSC frequency will -be the TSC frequency of the initial physical machine, even after migration. -This mode is useful to measure any performance degradation that -might be encountered by a tsc_mode==0 domain after migration occurs, -or a tsc_mode==3 domain when it is running on TSC-unsafe hardware. - -Note that while Xen ensures that an emulated TSC is "safe" across migration, -it does not ensure that it continues to tick at the same rate during -the actual migration. As an oversimplified example, if TSC is ticking -once per second in a guest, and the guest is saved when the TSC is 1000, -then restored 30 seconds later, TSC is only guaranteed to be greater -than or equal to 1001, not precisely 1030. This has some OS implications -as will be seen in the next section. - -TSC INVARIANT BIT and NO_MIGRATE - -Related to TSC emulation, the "TSC Invariant" bit is architecturally defined -in a cpuid bit on the most recent x86 processors. If set, TSC invariance -ensures that the TSC is "safe", that is it will increment at a constant rate -regardless of power events, will be synchronized across all processors, and -was properly initialized to zero on all processors at boot-time -by system hardware/BIOS. As long as system software never writes to TSC, -TSC will be safe and continuously incremented at a fixed rate and thus -can be used as a system "clocksource". - -This bit is used by some OS's, and specifically by Linux starting with -version 2.6.30(?), to select TSC as a system clocksource. 
Once selected,
-TSC remains the Linux system clocksource unless manually overridden. In
-a virtualized environment, since it is not possible to synchronize TSC
-across all the machines in a pool or data center, a migration may "break"
-TSC as a usable clocksource; while time will not go backwards, it may
-not track wallclock time well enough to avoid certain time-sensitive
-consequences. As a result, Xen can only expose the TSC Invariant bit
-to a guest OS if it is certain that the domain will never migrate.
-As of Xen 4.0, the "no_migrate=1" VM configuration option may be specified
-to disable migration. If no_migrate is selected and the VM is running
-on a physical machine with "TSC Invariant", Linux 2.6.30+ will safely
-use TSC as the system clocksource. But, attempts to migrate or, once
-saved, restore this domain will fail.
-
-There is another cpuid-related complication: The x86 cpuid instruction is
-non-privileged. HVM domains are configured to always trap this instruction
-to Xen, where Xen can "filter" the result. In a PV OS, all cpuid instructions
-have been replaced by a paravirtualized equivalent of the cpuid instruction
-("pvcpuid") and also trap to Xen. But apps in a PV guest that use a
-cpuid instruction execute it directly, without a trap to Xen. As a result,
-an app may directly examine the physical TSC Invariant cpuid bit and make
-decisions based on that bit. This is still an unsolved problem, though
-a workaround exists as part of the PVRDTSCP tsc_mode for apps that
-can be modified.
-
-MORE ON PVRDTSCP
-
-Paravirtualized OS's use the "pvclock" algorithm to manage the passing
-of time. This sophisticated algorithm obtains information from a memory
-page shared between Xen and the OS and selects information from this
-page based on the current virtual CPU (vcpu) in order to properly adapt to
-TSC-unsafe systems and changes that occur across migration.
Neither
-this shared page nor the vcpu information is available to a userland
-app so the pvclock algorithm cannot be directly used by an app, at least
-without performance degradation roughly equal to the cost of just
-emulating an rdtsc.
-
-As a result, as of 4.0, Xen provides capabilities for a userland app
-to obtain key time values similar to the information accessible
-to the PV OS pvclock algorithm. The app uses the rdtscp instruction
-which is defined in recent processors to obtain both the TSC and an
-auxiliary value called TSC_AUX. Xen is responsible for setting TSC_AUX
-to the same value on all vcpus running any domain with tsc_mode==3;
-further, Xen tools are responsible for monotonically incrementing TSC_AUX
-anytime the domain is restored/migrated (thus changing key time values);
-and, when the domain is running on a physical machine that either
-is not TSC-safe or does not support the rdtscp instruction, Xen
-is responsible for emulating the rdtscp instruction and for setting
-TSC_AUX to zero on all processors.
-
-Xen also provides pvclock information via a "pvcpuid" instruction.
-While this results in a slow trap, the information changes
-(and thus must be reobtained via pvcpuid) ONLY when TSC_AUX
-has changed, which should be very rare relative to a high
-frequency of rdtscp instructions.
-
-Finally, Xen provides additional time-related information via
-other pvcpuid instructions. First, an app is capable of
-determining if it is currently running on Xen, next whether
-the tsc_mode setting of the domain in which it is running,
-and finally whether the underlying hardware is TSC-safe and
-supports the rdtscp instruction.
-
-As a result, a pvrdtscp-modified app has sufficient information
-to compute the pvclock "elapsed nanoseconds" which can
-be used as a timestamp. And this can be done nearly as
-fast as a native rdtsc instruction, much faster than emulation,
-and also much faster than nearly all OS-provided time mechanisms.
-While pvrtscp is too complex for most apps, certain enterprise
-TSC-sensitive high-TSC-frequency apps may find it useful to
-obtain a significant performance gain.
-
-Hardware TSC Scaling
-
-Intel VMX TSC scaling and AMD SVM TSC ratio allow the guest TSC read
-by guest rdtsc/p increasing in a different frequency than the host
-TSC frequency.
-
-If a HVM container in default TSC mode (tsc_mode=0) or PVRDTSCP mode
-(tsc_mode=3) is created on a host that provides constant TSC, its
-guest TSC frequency will be the same as the host. If it is later
-migrated to another host that provides constant TSC and supports Intel
-VMX TSC scaling/AMD SVM TSC ratio, its guest TSC frequency will be the
-same before and after migration.
-
-For above HVM container in default TSC mode (tsc_mode=0), if above
-hosts support rdtscp, both guest rdtsc and rdtscp instructions will be
-executed natively before and after migration.
-
-For above HVM container in PVRDTSCP mode (tsc_mode=3), if the
-destination host does not support rdtscp, the guest rdtscp instruction
-will be emulated with the guest TSC frequency.
--
generated by git-patchbot for /home/xen/git/xen.git#master

_______________________________________________
Xen-changelog mailing list
Xen-changelog@xxxxxxxxxxxxx
https://lists.xenproject.org/xen-changelog