[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] [PATCH v2 for-4.7] docs: Feature Levelling feature document



Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
Signed-off-by: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
Release-acked-by: Wei Liu <wei.liu2@xxxxxxxxxx>
---
CC: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
CC: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>

v2: Squash patch from Ian, improving text relating to `xl`
---
 docs/features/feature-levelling.pandoc | 216 +++++++++++++++++++++++++++++++++
 1 file changed, 216 insertions(+)
 create mode 100644 docs/features/feature-levelling.pandoc

diff --git a/docs/features/feature-levelling.pandoc 
b/docs/features/feature-levelling.pandoc
new file mode 100644
index 0000000..ef77eb8
--- /dev/null
+++ b/docs/features/feature-levelling.pandoc
@@ -0,0 +1,216 @@
+% Feature Levelling
+% Revision 1
+
+\clearpage
+
+# Basics
+
+---------------- ----------------------------------------------------
+         Status: **Supported**
+
+   Architecture: x86
+
+      Component: Hypervisor, toolstack, guest
+---------------- ----------------------------------------------------
+
+
+# Overview
+
+On native hardware, a kernel will boot, detect features, typically optimise
+certain codepaths based on the available features, and expect the features to
+remain available until it shuts down.
+
+The same expectation exists for virtual machines, and it is up to the
+hypervisor/toolstack to fulfill this expectation for the lifetime of the
+virtual machine, including across migrate/suspend/resume.
+
+
+# User details
+
+Many factors affect the featureset which a VM may use:
+
+* The CPU itself
+* The BIOS/firmware/microcode version and settings
+* The hypervisor version and command line settings
+* Further restrictions the toolstack chooses to apply
+
+A firmware or software upgrade might reduce the available set of features
+(e.g. Intel disabling TSX in a microcode update for certain Haswell/Broadwell
+processors), as may editing the settings.
+
+It is unsafe to make any assumption about features remaining consistent across
+a host reboot.  Xen recalculates all information from scratch each boot, and
+provides the information for the toolstack to consume.
+
+`xl` currently has no facilities to help the user collect appropriate feature
+information from relevant hosts and compute appropriate feature specifications
+for use in host or domain configurations.  (`xl` being a single-host
+toolstack, it would in any case need external support for accessing remote
+hosts eg via ssh, in the form of automation software like GNU parallel or
+ansible.)
+
+# Technical details
+
+The `CPUID` instruction is used by software to query for features.  In the
+virtualisation usecase, guest software should query Xen rather than hardware
+directly.  However, `CPUID` is an unprivileged instruction which doesn't
+fault, complicating the task of hiding hardware features from guests.
+
+Important files:
+
+* Hypervisor
+    * `xen/arch/x86/cpu/*.c`
+    * `xen/arch/x86/cpuid.c`
+    * `xen/include/asm-x86/cpuid-autogen.h`
+    * `xen/include/public/arch-x86/cpufeatureset.h`
+    * `xen/tools/gen-cpuid.py`
+* `libxc`
+    * `tools/libxc/xc_cpuid_x86.c`
+
+## Ability to control CPUID
+
+### HVM
+
+HVM guests (using `Intel VT-x` or `AMD SVM`) will unconditionally exit to Xen
+on all `CPUID` instructions, allowing Xen full control over all information.
+
+### PV
+
+The `CPUID` instruction is unprivileged, so executing it in a PV guest will
+not trap, leaving Xen no direct ability to control the information returned.
+
+### Xen Forced Emulation Prefix
+
+Xen-aware PV software can make use of the 'Forced Emulation Prefix'
+
+> `ud2a; .ascii 'xen'; cpuid`
+
+which Xen recognises as a deliberate attempt to get the fully-controlled
+`CPUID` information rather than the hardware-reported information.  This only
+works with cooperative software.
+
+### Masking and Override MSRs
+
+AMD CPUs from the `K8` onwards support _Feature Override_ MSRs, which allow
+direct control of the values returned for certain `CPUID` leaves.  These MSRs
+allow any result to be returned, including the ability to advertise features
+which are not actually supported.
+
+Intel CPUs between `Nehalem` and `SandyBridge` have differing numbers of
+_Feature Mask_ MSRs, which are a simple AND-mask applied to all `CPUID`
+instructions requesting specific feature bitmap sets.  The exact MSRs, and
+which feature bitmap sets they affect are hardware specific.  These MSRs allow
+features to be hidden by clearing the appropriate bit in the mask, but does
+not allow unsupported features to be advertised.
+
+### CPUID Faulting
+
+Intel CPUs from `IvyBridge` onwards have _CPUID Faulting_, which allows Xen to
+cause `CPUID` instruction executed in PV guests to fault.  This allows Xen
+full control over all information, exactly like HVM guests.
+
+## Compile time
+
+As some features depend on other features, it is important that, when
+disabling a certain feature, we disable all features which depend on it.  This
+allows runtime logic to be simplified, by being able to rely on testing only
+the single appropriate feature, rather than the entire feature dependency
+chain.
+
+To speed up runtime calculation of feature dependencies, the dependency chain
+is calculated and flattened by `xen/tools/gen-cpuid.py` to create
+`xen/include/asm-x86/cpuid-autogen.h` from
+`xen/include/public/arch-x86/cpufeatureset.h`, allowing the runtime code to
+disable all dependent features of a specific disabled feature in constant
+time.
+
+## Host boot
+
+As Xen boots, it will enumerate the features it can see.  This is stored as
+the _raw\_featureset_.
+
+Errata checks and command line arguments are then taken into account to reduce
+the _raw\_featureset_ into the _host\_featureset_, which is the set of
+features Xen uses.  On hardware with masking/override MSRs, the default MSR
+values are picked from the _host\_featureset_.
+
+The _host\_featureset_ is then used to calculate the _pv\_featureset_ and
+_hvm\_featureset_, which are the maximum featuresets Xen is willing to offer
+to PV and HVM guests respectively.
+
+In addition, Xen will calculate how much control it has over non-cooperative
+PV `CPUID` instructions, storing this information as _levelling\_caps_.
+
+## Domain creation
+
+The toolstack can query each of the calculated featureset via
+`XEN_SYSCTL_get_cpu_featureset`, and query for the levelling caps via
+`XEN_SYSCTL_get_cpu_levelling_caps`.
+
+These data should be used by the toolstack when choosing the eventual
+featureset to offer to the guest.
+
+Once a featureset has been chosen, it is set (implicitly or explicitly) via
+`XEN_DOMCTL_set_cpuid`.  Xen will clamp the toolstacks choice to the
+appropriate PV or HVM featureset.  On hardware with masking/override MSRs, the
+guest cpuid policy is reflected in the MSRs, which are context switched with
+other vcpu state.
+
+# Limitations
+
+A guest which ignores the provided feature information and manually probes for
+features will be able to find some of them.  e.g. There is no way of forcibly
+preventing a guest from using 1GB superpages if the hardware supports it.
+
+Some information simply cannot be hidden from guests.  There is no way to
+control certain behaviour such as the hardware MXCSR\_MASK or x87 FPU exception
+behaviour.
+
+
+# Testing
+
+Feature levelling is a very wide area, and used all over the hypervisor.
+Please ask on xen-devel for help identifying more specific tests which could
+be of use.
+
+
+# Known issues / Areas for improvement
+
+The feature querying and levelling functions should exposed in a
+convenient-to-use way by `xl`.
+
+Xen currently has no concept of per-{socket,core,thread} CPUID information.
+As a result, details such as APIC IDs, topology and cache information do not
+match real hardware, and do not match the documented expectations in the Intel
+and AMD system manuals.
+
+The CPU feature flags are the only information which the toolstack has a
+sensible interface for querying and levelling.  Other information in the CPUID
+policy is important and should be levelled (e.g. maxphysaddr).
+
+The CPUID policy is currently regenerated from scratch by the receiving side,
+once memory and vcpu content has been restored.  This means that the receiving
+Xen cannot verify the memory/vcpu content against the CPUID policy, and can
+end up running a guest which will subsequently crash.  The CPUID policy should
+be at the head of the migration stream.
+
+MSRs are another source of features for guests.  There is no general provision
+for controlling the available MSRs.  E.g. 64bit versions of Windows notice
+changes in IA32\_MISC\_ENABLE, and suffer a BSOD 0x109 (Critical Structure
+Corruption)
+
+
+# References
+
+[Intel 
Flexmigration](http://www.intel.co.uk/content/dam/www/public/us/en/documents/application-notes/virtualization-technology-flexmigration-application-note.pdf)
+
+[AMD Extended Migration 
Technology](http://developer.amd.com/wordpress/media/2012/10/43781-3.00-PUB_Live-Virtual-Machine-Migration-on-AMD-processors.pdf)
+
+
+# History
+
+------------------------------------------------------------------------
+Date       Revision Version  Notes
+---------- -------- -------- -------------------------------------------
+2016-05-31 1        Xen 4.7  Document written
+---------- -------- -------- -------------------------------------------
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.