
[Xen-devel] Alternate p2m design specification



This document describes a new capability for VM Introspection, Security and 
Privacy in Xen. The new capability is called "altp2m" (short for Alternate p2m) 
and provides the ability for Xen to host alternate guest physical memory 
domains for a specific guest domain. This document describes the overall 
design specific to Xen for your review and feedback.

Background
=========

Intel VT-x2 CPUs support Extended Page Tables (EPTs). Extended Page Tables 
allow the VMM to restrict permissions for guest physical pages accessed by 
software operating in the guest (VMX-non-root). The p2m capability in Xen 
abstracts the architecture-specific details of EPTs. Typically, Xen manages a 
single p2m for a specific guest domain.

ALTP2M Introduction
================

The altp2m capability enables management of multiple (alternate) p2ms per HVM 
guest domain, thus allowing for separate physical memory domains per guest. It 
allows a para-virtualized guest software agent, within or across domains, to 
enforce memory introspection policies efficiently. Altp2m also allows 
para-virtualized guest agent components to be isolated within an HVM (in terms 
of guest physical memory) for secure VM introspection, as well as various other 
security and privacy usages that require efficient memory isolation.

Two related Intel CPU features are used as performance enhancements within the 
altp2m module when operating with an in-domain agent. The altp2m module 
opportunistically uses these assists when they are enumerated on the CPU. 
Operations that require frequent switching between p2m domains can incur a 
high overhead if done via legacy approaches such as a hypercall. VM Functions 
(VMFUNC) is a new VT-x instruction on Intel's 4th gen Core (Haswell) and Atom 
(Silvermont) CPUs. In general, VMFUNC is intended to reduce the overhead of 
services provided by the CPU to an HVM guest (once configured by the VMM). One 
such leaf (0) is defined to provide a low-latency p2m switching (EPTP Switching 
in Intel terminology) capability. VMFUNC leaf 0 is enabled as part of the 
altp2m functionality to allow para-virtualized agents in an HVM to apply custom 
p2m domain switching policies without incurring overheads due to VM Exits.

#VE (Virtualization Exception) is a feature introduced on Intel's 5th gen Core 
(Broadwell) and Atom (Goldmont) CPUs. #VE is a CPU assist defined to allow the 
VMM to convert EPT violations for specific guest physical page accesses to a 
guest-IDT-delivered exception (new vector 20), and thus reduce the latency for 
managing VM introspection policies for guest memory read, write and/or execute 
attempts. These are induced events configured by a para-virtualized security 
agent monitoring guest memory accesses based on its isolation/monitoring 
policies. On legacy (pre-#VE) CPUs, EPT violations require a VM Exit, and 
frequent induced EPT violations can add high hypervisor overhead. #VE reduces 
the impact of this overhead, whilst reducing the amount of guest-specific 
policy context to be inserted into the VMM.

Both VMFUNC and #VE are designed such that a VMM can emulate them on legacy 
CPUs. The altp2m module includes full emulation of VMFUNC leaf 0 and #VE, so 
in-domain agents can be written to assume both capabilities are available on 
all hardware.

VMFUNC Introduction
=================

VMFUNC leaf 0 for EPTP-Switching is a hardware-assisted efficient way to switch 
EPTs configured by the VMM. Software in a Xen guest domain may invoke a VM 
function with the VMFUNC instruction; the value of EAX selects the specific VM 
function being invoked.

The VMM enables VM functions generally by setting the "enable VM functions" 
VM-execution control. A specific VM function is enabled by setting the 
corresponding VM-function control. When software wants to enable EPTP switching 
(VM function 0) it must set the "activate secondary controls" VM-execution 
control (bit 31 of the primary processor-based VM-execution controls), the 
"enable VM functions" VM-execution control (bit 13 of the secondary 
processor-based VM-execution controls) and the "EPTP switching" VM-function 
control (bit 0 of the VM-function controls).
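
As a rough illustration of the three bits named above, the following Python 
sketch builds the control values a VMM would need. The constant and function 
names are invented here for illustration and are not Xen or SDM identifiers; 
only the bit positions come from the text.

```python
# Bit positions taken from the paragraph above; all names are illustrative.
ACTIVATE_SECONDARY_CONTROLS = 1 << 31  # primary processor-based controls
ENABLE_VM_FUNCTIONS = 1 << 13          # secondary processor-based controls
EPTP_SWITCHING = 1 << 0                # VM-function controls


def enable_eptp_switching(primary, secondary, vmfunc_controls):
    """Return the three control values with EPTP switching turned on."""
    return (
        primary | ACTIVATE_SECONDARY_CONTROLS,
        secondary | ENABLE_VM_FUNCTIONS,
        vmfunc_controls | EPTP_SWITCHING,
    )


def eptp_switching_enabled(primary, secondary, vmfunc_controls):
    """True only if all three bits are set."""
    return bool(
        primary & ACTIVATE_SECONDARY_CONTROLS
        and secondary & ENABLE_VM_FUNCTIONS
        and vmfunc_controls & EPTP_SWITCHING
    )
```

Clearing any one of the three bits leaves EPTP switching unavailable to the 
guest, which is why the check requires all of them.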

The VMFUNC instruction causes an invalid-opcode exception (#UD) if the "enable 
VM functions" VM-execution control is 0 or the value of EAX is greater than 63 
(only VM functions 0-63 can be enabled). Otherwise, the instruction causes a VM 
exit if the bit at position EAX is 0 in the VM-function controls (the selected 
VM function is not enabled). If such a VM exit occurs, the basic exit reason 
used is 59 (3BH), indicating "VMFUNC", and the length of the VMFUNC instruction 
is saved into the VM-exit instruction-length field. If the instruction causes 
neither an invalid-opcode exception nor a VM exit due to a disabled VM 
function, it performs the functionality of the VM function specified by the 
value in EAX.
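
The decision sequence above can be sketched as a small Python model. This is 
only a restatement of the paragraph, not hardware or Xen code; the function 
name and the string results are made up for the example.

```python
def vmfunc_outcome(enable_vm_functions, vmfunc_controls, eax):
    """Classify a guest VMFUNC as '#UD', 'VM exit' or 'execute'."""
    # "enable VM functions" clear, or leaf number beyond 63: invalid opcode.
    if not enable_vm_functions or eax > 63:
        return "#UD"
    # Leaf not enabled in the VM-function controls: VM exit, reason 59 (3BH).
    if not (vmfunc_controls >> eax) & 1:
        return "VM exit"
    # Otherwise the selected VM function is performed.
    return "execute"
```

For example, with only leaf 0 enabled, a guest executing VMFUNC with EAX=1 
takes a VM exit, while EAX=64 raises #UD regardless of the controls.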

VMFUNC leaf 0/EPTP switching allows guest software to load a new value for the 
EPT pointer (EPTP), thereby establishing a different EPT paging-structure 
hierarchy. Guest software is limited to selecting from a list of potential EPTP 
values configured in advance by the VMM. Specifically, the value of ECX is used 
to select an entry from an EPTP list, a 4-KByte structure referenced by the 
EPTP-list address (a new control field in the VMCS). VMFUNC causes a VM exit 
for error conditions such as ECX ≥ 512. If the selected entry is a valid 
EPTP value (i.e. the EPTP would not cause VM entry to fail), it is stored in 
the EPTP field of the current VMCS and is used for subsequent accesses using 
guest-physical addresses.
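
A minimal Python model of the leaf 0 selection logic follows. It assumes the 
4-KByte EPTP list holds 512 eight-byte entries, and it takes the validity 
check as a caller-supplied function because the real rule (whether the EPTP 
would cause VM entry to fail) is defined by the SDM and not reproduced here. 
All names are hypothetical.

```python
EPTP_LIST_ENTRIES = 512  # one 4-KByte page of 8-byte entries


class VmExit(Exception):
    """Raised where the hardware would cause a VM exit."""


def eptp_switch(eptp_list, ecx, is_valid_eptp):
    """Return the EPTP selected by ECX, or raise VmExit on an error."""
    if ecx >= EPTP_LIST_ENTRIES:
        raise VmExit("ECX out of range")
    candidate = eptp_list[ecx]
    # is_valid_eptp stands in for the SDM's "would not fail VM entry" check.
    if not is_valid_eptp(candidate):
        raise VmExit("invalid EPTP entry")
    return candidate
```

The returned value corresponds to what would be stored in the EPTP field of 
the current VMCS; the guest never supplies an EPTP directly, only an index.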

The complete spec of VMFUNC can be found in chapter 25.5.5 of the Intel SDM at:
http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html

 

#VE Introduction
=============

A virtualization exception is a new processor exception. It uses vector 20 and 
is abbreviated #VE. A virtualization exception can occur only in VMX non-root 
operation. The 1-setting of the "EPT-violation #VE" VM-execution control causes 
some EPT violations to generate virtualization exceptions instead of VM exits. 
The VMM manages how the processor determines whether an EPT violation causes a 
virtualization exception or a VM exit. When the processor encounters a 
virtualization exception, it saves information about the exception to the 
virtualization-exception information area (hosted in a 4-KByte page referenced 
by a new field in the VMCS). After saving virtualization-exception information, the 
processor delivers a virtualization exception as it would any other exception.

The values of certain EPT paging-structure entries determine which EPT 
violations are convertible. Specifically, bit 63 of certain EPT 
paging-structure entries is defined to suppress #VE. Effectively, an EPT 
violation is convertible to #VE if and only if bit 63 of the EPT entry that 
caused the EPT violation is 0. Note that EPT misconfiguration behavior does not 
change: EPT misconfigurations always cause VM exits.
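
The convertibility rule can be written as a short Python predicate. This is a 
simplified sketch: it models only the control bit and the suppress-#VE bit 
described above and ignores any further conditions the SDM lists. The names 
are illustrative.

```python
SUPPRESS_VE = 1 << 63  # bit 63 of the EPT paging-structure entry


def ept_fault_outcome(ept_entry, ve_control_enabled, misconfiguration=False):
    """Return '#VE' or 'VM exit' for a faulting guest-physical access."""
    # EPT misconfigurations are never converted.
    if misconfiguration:
        return "VM exit"
    # Convertible only when the control is set and suppress-#VE is clear.
    if ve_control_enabled and not (ept_entry & SUPPRESS_VE):
        return "#VE"
    return "VM exit"
```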

The complete spec of #VE can be found in chapter 25.5.6 of the Intel SDM at:
http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
 
With VMFUNC and #VE, the Xen hypervisor does not have to be involved in 
handling guest VM-introspection policies, which reduces hypervisor overhead 
and complexity (TCB), and would work well with VM migration. For a guest domain 
using VMFUNC and #VE, more CPU cycles can be allocated to the guest, so 
benchmarks in a guest domain using VM introspection are expected to perform 
better with VMFUNC and #VE enabled than on a CPU without VMFUNC/#VE.

Design
======

- Altp2m feature enabled via opt-in parameter

A new Xen boot parameter, 'altp2m', is introduced to control altp2m on a global 
basis; this parameter defaults to 0 (disabled).

- Altp2m enable/disable for particular domain

Additionally, a new domain parameter, 'altp2mhvm', is introduced to control 
altp2m for an individual HVM domain; this parameter also defaults to 0 
(disabled).

Both parameters must be set to 1 (enabled) before altp2m functionality is 
available in a given domain.

At any point in time, altp2m is enabled for all vcpus of a domain or disabled 
for all vcpus of that domain. Alternate EPT tables created for the alternate 
p2m are shared by all vcpus assigned to a domain. Altp2m mode may be 
dynamically enabled/disabled for a domain. 

- Hypercalls for altp2m

Altp2m mode introduces a new set of hypercalls for altp2m management from 
software agents operating in Xen HVM guests.

The hypercalls are as follows:
  * Enable or disable altp2m mode for a domain
  * Create a new alternate p2m
  * Edit permissions for a specific GPA within an alternate p2m
  * Destroy an existing alternate p2m

- Core altp2m functionality

A new altp2m type is added to the p2m types (in addition to the previous 
hostp2m and nestedp2m types). An HVM domain can be started in hostp2m mode and 
switched over into altp2m mode via a hypercall. Once an HVM domain is in altp2m 
mode, a set of altp2m objects (currently fixed at 10) is managed by Xen. 
Altp2m updates are performed lazily: in effect, the altp2m reflects the same 
EPT attributes for accessed mappings as the hostp2m unless the permissions for 
a GPA are modified by the guest agent (for a specific altp2m). Currently, only 
page permissions and mappings for memory type ram_rw can be modified via the 
altp2m hypercall. By default, all GPA mappings are set to suppress #VE 
(resulting in legacy behavior for Xen); #VE is un-suppressed for a GPA when the 
in-domain guest agent invokes an altp2m hypercall to modify the permission of 
that GPA. A subsequent guest access to the GPA that violates the 
agent-specified EPT permissions will cause a #VE (instead of an EPT violation 
VM exit) that is expected to be handled by the guest software. One of the valid 
responses to a #VE event in the guest is to switch altp2m's to activate a 
different set of GPA permissions and mappings. Using VMFUNC, this switch can be 
achieved efficiently for the single vcpu on which the permissions violation 
occurs. There is also a hypercall to switch altp2m's for every vcpu in a 
domain, as is typically required during agent initialisation.
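
The lazy behaviour described above can be pictured with a toy Python model. It 
is not how Xen stores p2m entries; it only shows an altp2m falling back to the 
hostp2m until the agent overrides a GPA, and the suppress-#VE default being 
lifted for overridden GPAs. Class and method names are hypothetical, and 
permissions are plain strings such as "rwx".

```python
class HostP2m:
    def __init__(self, perms):
        self.perms = dict(perms)  # gfn -> permission string, e.g. "rwx"


class AltP2m:
    """Mirrors the host p2m until the agent overrides a GPA."""

    def __init__(self, host):
        self.host = host
        self.overrides = {}  # gfn -> permissions set by the agent

    def set_perms(self, gfn, perms):
        # Changing a permission also un-suppresses #VE for that GPA.
        self.overrides[gfn] = perms

    def lookup(self, gfn):
        """Return (perms, suppress_ve) for a guest frame number."""
        if gfn in self.overrides:
            return self.overrides[gfn], False
        return self.host.perms[gfn], True

    def access(self, gfn, kind):
        """kind is 'r', 'w' or 'x'; returns 'ok', '#VE' or 'VM exit'."""
        perms, suppress_ve = self.lookup(gfn)
        if kind in perms:
            return "ok"
        return "VM exit" if suppress_ve else "#VE"
```

In this model a write to a page the agent has made read/execute-only yields 
"#VE", while a violation on an untouched page still yields "VM exit", matching 
the legacy behavior.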

The list of altp2m's is protected by a separate list lock, which must be held 
during any operations which could change the state of an altp2m from valid to 
invalid or vice-versa, or when performing any modification to an altp2m which 
is not the current p2m for the current vcpu. Many operations that must acquire 
the altp2m list lock occur in code paths where the hostp2m lock has already 
been acquired. To avoid locking order violations, the p2m lock has been split 
into two types: altp2ms have a lock type which is lower in the order than other 
p2m's; and the altp2m list lock is placed between the two.
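
One way to picture the resulting order is the small Python checker below. It 
assumes, as a reading of the paragraph above, that the acquisition sequence is 
hostp2m lock, then altp2m list lock, then an individual altp2m lock. The rank 
values and class names are invented for the example, and for simplicity the 
model does not allow re-acquiring a lock that is already held.

```python
# Assumed acquisition order: hostp2m, then the list lock, then an altp2m.
LOCK_RANK = {"hostp2m": 0, "altp2m_list": 1, "altp2m": 2}


class LockOrderError(Exception):
    """Raised when a lock is taken out of the assumed order."""


class LockTracker:
    def __init__(self):
        self.held = []  # names of held locks, in acquisition order

    def acquire(self, name):
        # Each new lock must rank strictly after the most recent one held.
        if self.held and LOCK_RANK[name] <= LOCK_RANK[self.held[-1]]:
            raise LockOrderError(f"{name} taken after {self.held[-1]}")
        self.held.append(name)

    def release(self, name):
        # Locks are released in reverse order of acquisition.
        if not self.held or self.held[-1] != name:
            raise LockOrderError(f"{name} released out of order")
        self.held.pop()
```

With this ordering, a code path that already holds the hostp2m lock can still 
take the altp2m list lock, which is the case the split is meant to permit.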

- VMExit handler for VMFUNC

When altp2m is enabled on a CPU with VMFUNC enumerated, an erroneous VMFUNC may 
cause a VM exit with exit reason "VMFUNC". A new exit handler is added for this 
exit reason, which injects #UD into the guest.

- Support for intra-domain and inter-domain VM introspection (and XSM)

The altp2m functionality can be used by an agent operating in an HVM guest or, 
alternatively, by an agent operating in a separate privileged domain. For 
cross-domain operation, an XSM hook is defined so that the administrator can 
define a policy for inter-domain VM introspection.

The way in which permissions violations are reported to an in-domain agent and 
the expected agent response have been described above. Restrictions imposed by 
an out-of-domain agent do not have suppress-#VE removed, so they always result 
in a VM exit. The violation is reported through the existing VM Event 
mechanism, modified to indicate that the event is an altp2m event and include 
the current altp2m index; the response can force a change to a different altp2m 
for the relevant vcpu before VM entry. If an in-domain agent places an altp2m 
restriction and a violation of that restriction occurs on a vcpu that cannot 
receive #VE, that will cause a VM exit that will be treated as if the 
restriction had been imposed by an out-of-domain agent.
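
The reporting rules in this section reduce to a small decision, sketched here 
in Python. The function name, argument names and the dictionary layout are 
assumptions made for the example; they do not describe the actual VM Event 
structure.

```python
def report_violation(placed_by, vcpu_can_receive_ve, altp2m_index):
    """Decide how an altp2m permission violation is reported.

    placed_by is 'in-domain' or 'out-of-domain', naming the agent that
    imposed the restriction.
    """
    if placed_by == "in-domain" and vcpu_can_receive_ve:
        # Delivered through the guest IDT; no VM exit is needed.
        return {"delivery": "#VE"}
    # Out-of-domain restriction, or an in-domain one on a vcpu that cannot
    # receive #VE: VM exit, reported through the VM Event mechanism.
    return {
        "delivery": "vm_event",
        "altp2m_event": True,
        "altp2m_index": altp2m_index,
    }
```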

---------------- END --------------------------
