[Xen-devel] [PATCH v2 for-4.7] docs: Feature Levelling feature document
Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
Signed-off-by: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
Release-acked-by: Wei Liu <wei.liu2@xxxxxxxxxx>
---
CC: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
CC: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>

v2: Squash patch from Ian, improving text relating to `xl`
---
 docs/features/feature-levelling.pandoc | 216 +++++++++++++++++++++++++++++++++
 1 file changed, 216 insertions(+)
 create mode 100644 docs/features/feature-levelling.pandoc

diff --git a/docs/features/feature-levelling.pandoc b/docs/features/feature-levelling.pandoc
new file mode 100644
index 0000000..ef77eb8
--- /dev/null
+++ b/docs/features/feature-levelling.pandoc
@@ -0,0 +1,216 @@
+% Feature Levelling
+% Revision 1
+
+\clearpage
+
+# Basics
+
+----------------  ----------------------------------------------------
+         Status:  **Supported**
+
+   Architecture:  x86
+
+      Component:  Hypervisor, toolstack, guest
+----------------  ----------------------------------------------------
+
+
+# Overview
+
+On native hardware, a kernel will boot, detect features, typically optimise
+certain codepaths based on the available features, and expect the features to
+remain available until it shuts down.
+
+The same expectation exists for virtual machines, and it is up to the
+hypervisor/toolstack to fulfill this expectation for the lifetime of the
+virtual machine, including across migrate/suspend/resume.
+
+
+# User details
+
+Many factors affect the featureset which a VM may use:
+
+* The CPU itself
+* The BIOS/firmware/microcode version and settings
+* The hypervisor version and command line settings
+* Further restrictions the toolstack chooses to apply
+
+A firmware or software upgrade might reduce the available set of features
+(e.g. Intel disabling TSX in a microcode update for certain Haswell/Broadwell
+processors), as may editing the settings.
+
+It is unsafe to make any assumption about features remaining consistent across
+a host reboot. Xen recalculates all information from scratch each boot, and
+provides the information for the toolstack to consume.
+
+`xl` currently has no facilities to help the user collect appropriate feature
+information from relevant hosts and compute appropriate feature specifications
+for use in host or domain configurations. (`xl` being a single-host
+toolstack, it would in any case need external support for accessing remote
+hosts e.g. via ssh, in the form of automation software like GNU parallel or
+ansible.)
+
+# Technical details
+
+The `CPUID` instruction is used by software to query for features. In the
+virtualisation usecase, guest software should query Xen rather than hardware
+directly. However, `CPUID` is an unprivileged instruction which doesn't
+fault, complicating the task of hiding hardware features from guests.
+
+Important files:
+
+* Hypervisor
+    * `xen/arch/x86/cpu/*.c`
+    * `xen/arch/x86/cpuid.c`
+    * `xen/include/asm-x86/cpuid-autogen.h`
+    * `xen/include/public/arch-x86/cpufeatureset.h`
+    * `xen/tools/gen-cpuid.py`
+* `libxc`
+    * `tools/libxc/xc_cpuid_x86.c`
+
+## Ability to control CPUID
+
+### HVM
+
+HVM guests (using `Intel VT-x` or `AMD SVM`) will unconditionally exit to Xen
+on all `CPUID` instructions, allowing Xen full control over all information.
+
+### PV
+
+The `CPUID` instruction is unprivileged, so executing it in a PV guest will
+not trap, leaving Xen no direct ability to control the information returned.
+
+### Xen Forced Emulation Prefix
+
+Xen-aware PV software can make use of the 'Forced Emulation Prefix'
+
+> `ud2a; .ascii 'xen'; cpuid`
+
+which Xen recognises as a deliberate attempt to get the fully-controlled
+`CPUID` information rather than the hardware-reported information. This only
+works with cooperative software.
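As a reader's illustration (not part of the patch), the prefixed sequence quoted above can be written out as raw bytes. The sketch below assumes the standard x86 encodings `0f 0b` for `ud2a` and `0f a2` for `cpuid`; the helper names are hypothetical and exist only for this example. It shows why the sequence is unambiguous: the invalid-opcode instruction forces a fault, and the ASCII marker `xen` immediately after it tells the fault handler that the following `cpuid` should be emulated.

```python
# Illustrative sketch only: models the byte layout of the prefixed CPUID.
# Helper names are hypothetical, not taken from the Xen source tree.

UD2A = bytes([0x0F, 0x0B])   # ud2a: always raises an invalid-opcode fault
XEN_MARKER = b"xen"          # .ascii 'xen'
CPUID = bytes([0x0F, 0xA2])  # cpuid

FORCED_EMULATION_PREFIX = UD2A + XEN_MARKER


def forced_cpuid_bytes() -> bytes:
    """Return the byte sequence for `ud2a; .ascii 'xen'; cpuid`."""
    return FORCED_EMULATION_PREFIX + CPUID


def is_forced_cpuid(code: bytes) -> bool:
    """Check whether an instruction stream starts with the prefixed CPUID."""
    return code.startswith(forced_cpuid_bytes())


if __name__ == "__main__":
    print(forced_cpuid_bytes().hex(" "))  # prints "0f 0b 78 65 6e 0f a2"
```

A plain `cpuid` (`0f a2` on its own) does not match, which is why only software that deliberately emits the prefix benefits from this mechanism.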
+
+### Masking and Override MSRs
+
+AMD CPUs from the `K8` onwards support _Feature Override_ MSRs, which allow
+direct control of the values returned for certain `CPUID` leaves. These MSRs
+allow any result to be returned, including the ability to advertise features
+which are not actually supported.
+
+Intel CPUs between `Nehalem` and `SandyBridge` have differing numbers of
+_Feature Mask_ MSRs, which are a simple AND-mask applied to all `CPUID`
+instructions requesting specific feature bitmap sets. The exact MSRs, and
+which feature bitmap sets they affect, are hardware specific. These MSRs allow
+features to be hidden by clearing the appropriate bit in the mask, but do
+not allow unsupported features to be advertised.
+
+### CPUID Faulting
+
+Intel CPUs from `IvyBridge` onwards have _CPUID Faulting_, which allows Xen to
+cause `CPUID` instructions executed in PV guests to fault. This allows Xen
+full control over all information, exactly like HVM guests.
+
+## Compile time
+
+As some features depend on other features, it is important that, when
+disabling a certain feature, we disable all features which depend on it. This
+allows runtime logic to be simplified, by being able to rely on testing only
+the single appropriate feature, rather than the entire feature dependency
+chain.
+
+To speed up runtime calculation of feature dependencies, the dependency chain
+is calculated and flattened by `xen/tools/gen-cpuid.py` to create
+`xen/include/asm-x86/cpuid-autogen.h` from
+`xen/include/public/arch-x86/cpufeatureset.h`, allowing the runtime code to
+disable all dependent features of a specific disabled feature in constant
+time.
+
+## Host boot
+
+As Xen boots, it will enumerate the features it can see. This is stored as
+the _raw\_featureset_.
+
+Errata checks and command line arguments are then taken into account to reduce
+the _raw\_featureset_ into the _host\_featureset_, which is the set of
+features Xen uses.
+On hardware with masking/override MSRs, the default MSR
+values are picked from the _host\_featureset_.
+
+The _host\_featureset_ is then used to calculate the _pv\_featureset_ and
+_hvm\_featureset_, which are the maximum featuresets Xen is willing to offer
+to PV and HVM guests respectively.
+
+In addition, Xen will calculate how much control it has over non-cooperative
+PV `CPUID` instructions, storing this information as _levelling\_caps_.
+
+## Domain creation
+
+The toolstack can query each of the calculated featuresets via
+`XEN_SYSCTL_get_cpu_featureset`, and query for the levelling caps via
+`XEN_SYSCTL_get_cpu_levelling_caps`.
+
+These data should be used by the toolstack when choosing the eventual
+featureset to offer to the guest.
+
+Once a featureset has been chosen, it is set (implicitly or explicitly) via
+`XEN_DOMCTL_set_cpuid`. Xen will clamp the toolstack's choice to the
+appropriate PV or HVM featureset. On hardware with masking/override MSRs, the
+guest cpuid policy is reflected in the MSRs, which are context switched with
+other vcpu state.
+
+# Limitations
+
+A guest which ignores the provided feature information and manually probes for
+features will be able to find some of them, e.g. there is no way of forcibly
+preventing a guest from using 1GB superpages if the hardware supports them.
+
+Some information simply cannot be hidden from guests. There is no way to
+control certain behaviour such as the hardware MXCSR\_MASK or x87 FPU exception
+behaviour.
+
+
+# Testing
+
+Feature levelling is a very wide area, and used all over the hypervisor.
+Please ask on xen-devel for help identifying more specific tests which could
+be of use.
+
+
+# Known issues / Areas for improvement
+
+The feature querying and levelling functions should be exposed in a
+convenient-to-use way by `xl`.
+
+Xen currently has no concept of per-{socket,core,thread} CPUID information.
+As a result, details such as APIC IDs, topology and cache information do not
+match real hardware, and do not match the documented expectations in the Intel
+and AMD system manuals.
+
+The CPU feature flags are the only information which the toolstack has a
+sensible interface for querying and levelling. Other information in the CPUID
+policy is important and should be levelled (e.g. maxphysaddr).
+
+The CPUID policy is currently regenerated from scratch by the receiving side,
+once memory and vcpu content has been restored. This means that the receiving
+Xen cannot verify the memory/vcpu content against the CPUID policy, and can
+end up running a guest which will subsequently crash. The CPUID policy should
+be at the head of the migration stream.
+
+MSRs are another source of features for guests. There is no general provision
+for controlling the available MSRs. E.g. 64bit versions of Windows notice
+changes in IA32\_MISC\_ENABLE, and suffer a BSOD 0x109 (Critical Structure
+Corruption).
+
+
+# References
+
+[Intel Flexmigration](http://www.intel.co.uk/content/dam/www/public/us/en/documents/application-notes/virtualization-technology-flexmigration-application-note.pdf)
+
+[AMD Extended Migration Technology](http://developer.amd.com/wordpress/media/2012/10/43781-3.00-PUB_Live-Virtual-Machine-Migration-on-AMD-processors.pdf)
+
+
+# History
+
+------------------------------------------------------------------------
+Date       Revision Version  Notes
+---------- -------- -------- -------------------------------------------
+2016-05-31 1        Xen 4.7  Document written
+---------- -------- -------- -------------------------------------------
--
2.1.4

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel