[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] [PATCH for-4.7] docs: Feature Levelling feature document

Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
CC: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
CC: Wei Liu <wei.liu2@xxxxxxxxxx>
CC: Jan Beulich <JBeulich@xxxxxxxx>
CC: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
 docs/features/feature-levelling.pandoc | 211 +++++++++++++++++++++++++++++++++
 1 file changed, 211 insertions(+)
 create mode 100644 docs/features/feature-levelling.pandoc

diff --git a/docs/features/feature-levelling.pandoc 
new file mode 100644
index 0000000..50bf099
--- /dev/null
+++ b/docs/features/feature-levelling.pandoc
@@ -0,0 +1,211 @@
+% Feature Levelling
+% Draft 1
+# Basics
+---------------- ----------------------------------------------------
+         Status: **Supported**
+   Architecture: x86
+      Component: Hypervisor, toolstack, guest
+---------------- ----------------------------------------------------
+# Overview
+On native hardware, a kernel will boot, detect features, typically optimise
+certain codepaths based on the available features, and expect the features to
+remain available until it shuts down.
+The same expectation exists for virtual machines, and it is up to the
+hypervisor/toolstack to fulfil this expectation for the lifetime of the
+virtual machine, including across migrate/suspend/resume.
+# User details
+Many factors affect the featureset which a VM may use:
+* The CPU itself
+* The BIOS/firmware/microcode version and settings
+* The hypervisor version and command line settings
+* Further restrictions the toolstack chooses to apply
+A firmware or software upgrade might reduce the available set of features
+(e.g. Intel disabling TSX in a microcode update for certain Haswell/Broadwell
+processors), as may editing the settings.
+It is unsafe to make any assumption about features remaining consistent across
+a host reboot.  Xen recalculates all information from scratch each boot, and
+provides the information for the toolstack to consume.
+N.B. `xl`, being inherently a single-host toolstack, doesn't make use of these
+levelling improvements.  These features are of interest to higher level
+toolstacks such as `libvirt` or `XAPI`.
+# Technical details
+The `CPUID` instruction is used by software to query for features.  In the
+virtualisation usecase, guest software should query Xen rather than hardware
+directly.  However, `CPUID` is an unprivileged instruction which doesn't
+fault, complicating the task of hiding hardware features from guests.
+Important files:
+* Hypervisor
+    * `xen/arch/x86/cpu/*.c`
+    * `xen/arch/x86/cpuid.c`
+    * `xen/include/asm-x86/cpuid-autogen.h`
+    * `xen/include/public/arch-x86/cpufeatureset.h`
+    * `xen/tools/gen-cpuid.py`
+* `libxc`
+    * `tools/libxc/xc_cpuid_x86.c`
+## Ability to control CPUID
+### HVM
+HVM guests (using `Intel VT-x` or `AMD SVM`) will unconditionally exit to Xen
+on all `CPUID` instructions, allowing Xen full control over all information.
+### PV
+The `CPUID` instruction is unprivileged, so executing it in a PV guest will
+not trap, leaving Xen no direct ability to control the information returned.
+### Xen Forced Emulation Prefix
+Xen-aware PV software can make use of the 'Forced Emulation Prefix'
+> `ud2a; .ascii 'xen'; cpuid`
+which Xen recognises as a deliberate attempt to get the fully-controlled
+`CPUID` information rather than the hardware-reported information.  This only
+works with cooperative software.
+### Masking and Override MSRs
+AMD CPUs from the `K8` onwards support _Feature Override_ MSRs, which allow
+direct control of the values returned for certain `CPUID` leaves.  These MSRs
+allow any result to be returned, including the ability to advertise features
+which are not actually supported.
+Intel CPUs between `Nehalem` and `SandyBridge` have differing numbers of
+_Feature Mask_ MSRs, which are a simple AND-mask applied to all `CPUID`
+instructions requesting specific feature bitmap sets.  The exact MSRs, and
+which feature bitmap sets they affect are hardware specific.  These MSRs allow
+features to be hidden by clearing the appropriate bit in the mask, but does
+not allow unsupported features to be advertised.
+### CPUID Faulting
+Intel CPUs from `IvyBridge` onwards have _CPUID Faulting_, which allows Xen to
+cause `CPUID` instruction executed in PV guests to fault.  This allows Xen
+full control over all information, exactly like HVM guests.
+## Compile time
+As some features depend on other features, it is important that, when
+disabling a certain feature, we disable all features which depend on it.  This
+allows runtime logic to be simplified, by being able to rely on testing only
+the single appropriate feature, rather than the entire feature dependency
+To speed up runtime calculation of feature dependencies, the dependency chain
+is calculated and flattened by `xen/tools/gen-cpuid.py` to create
+`xen/include/asm-x86/cpuid-autogen.h` from
+`xen/include/public/arch-x86/cpufeatureset.h`, allowing the runtime code to
+disable all dependent features of a specific disabled feature in constant
+## Host boot
+As Xen boots, it will enumerate the features it can see.  This is stored as
+the _raw\_featureset_.
+Errata checks and command line arguments are then taken into account to reduce
+the _raw\_featureset_ into the _host\_featureset_, which is the set of
+features Xen uses.  On hardware with masking/override MSRs, the default MSR
+values are picked from the _host\_featureset_.
+The _host\_featureset_ is then used to calculate the _pv\_featureset_ and
+_hvm\_featureset_, which are the maximum featuresets Xen is willing to offer
+to PV and HVM guests respectively.
+In addition, Xen will calculate how much control it has over non-cooperative
+PV `CPUID` instructions, storing this information as _levelling\_caps_.
+## Domain creation
+The toolstack can query each of the calculated featureset via
+`XEN_SYSCTL_get_cpu_featureset`, and query for the levelling caps via
+These data should be used by the toolstack when choosing the eventual
+featureset to offer to the guest.
+Once a featureset has been chosen, it is set (implicitly or explicitly) via
+`XEN_DOMCTL_set_cpuid`.  Xen will clamp the toolstacks choice to the
+appropriate PV or HVM featureset.  On hardware with masking/override MSRs, the
+guest cpuid policy is reflected in the MSRs, which are context switched with
+other vcpu state.
+# Limitations
+A guest which ignores the provided feature information and manually probes for
+features will be able to find some of them.  e.g. There is no way of forcibly
+preventing a guest from using 1GB superpages if the hardware supports it.
+Some information simply cannot be hidden from guests.  There is no way to
+control certain behaviour such as the hardware MXCSR\_MASK or x87 FPU exception
+# Testing
+Feature levelling is a very wide area, and used all over the hypervisor.
+Please ask on xen-devel for help identifying more specific tests which could
+be of use.
+# Known issues / Areas for improvement
+Xen currently has no concept of per-{socket,core,thread} CPUID information.
+As a result, details such as APIC IDs, topology and cache information do not
+match real hardware, and do not match the documented expectations in the Intel
+and AMD system manuals.
+The CPU feature flags are the only information which the toolstack has a
+sensible interface for querying and levelling.  Other information in the CPUID
+policy is important and should be levelled (e.g. maxphysaddr).
+The CPUID policy is currently regenerated from scratch by the receiving side,
+once memory and vcpu content has been restored.  This means that the receiving
+Xen cannot verify the memory/vcpu content against the CPUID policy, and can
+end up running a guest which will subsequently crash.  The CPUID policy should
+be at the head of the migration stream.
+MSRs are another source of features for guests.  There is no general provision
+for controlling the available MSRs.  E.g. 64bit versions of Windows notice
+changes in IA32\_MISC\_ENABLE, and suffer a BSOD 0x109 (Critical Structure
+# References
+[AMD Extended Migration 
+# History
+Date       Revision Version  Notes
+---------- -------- -------- -------------------------------------------
+2016-05-31 1        Xen 4.7  Document written
+---------- -------- -------- -------------------------------------------

Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.