# HG changeset patch # User Robb Romans <3r@xxxxxxxxxx> # Node ID 1e255eacf158c51d5d4efdd7b17c4d1ce2a6be62 # Parent f619a10fdb762bfc9f061622e6aea1bd6c5e5fb3 Separate hypercalls information into separate file. Depends on 6817-fix-makefile.diff Signed-Off-By: Robb Romans <3r@xxxxxxxxxx> diff -r f619a10fdb76 -r 1e255eacf158 docs/src/interface.tex --- a/docs/src/interface.tex Wed Sep 14 18:08:36 2005 +++ b/docs/src/interface.tex Wed Sep 14 18:22:54 2005 @@ -622,549 +622,10 @@ - \appendix -%\newcommand{\hypercall}[1]{\vspace{5mm}{\large\sf #1}} - - - - - -\newcommand{\hypercall}[1]{\vspace{2mm}{\sf #1}} - - - - - - -\chapter{Xen Hypercalls} -\label{a:hypercalls} - -Hypercalls represent the procedural interface to Xen; this appendix -categorizes and describes the current set of hypercalls. - -\section{Invoking Hypercalls} - -Hypercalls are invoked in a manner analogous to system calls in a -conventional operating system; a software interrupt is issued which -vectors to an entry point within Xen. On x86\_32 machines the -instruction required is {\tt int \$82}; the (real) IDT is setup so -that this may only be issued from within ring 1. The particular -hypercall to be invoked is contained in {\tt EAX} --- a list -mapping these values to symbolic hypercall names can be found -in {\tt xen/include/public/xen.h}. - -On some occasions a set of hypercalls will be required to carry -out a higher-level function; a good example is when a guest -operating wishes to context switch to a new process which -requires updating various privileged CPU state. As an optimization -for these cases, there is a generic mechanism to issue a set of -hypercalls as a batch: - -\begin{quote} -\hypercall{multicall(void *call\_list, int nr\_calls)} - -Execute a series of hypervisor calls; {\tt nr\_calls} is the length of -the array of {\tt multicall\_entry\_t} structures pointed to be {\tt -call\_list}. Each entry contains the hypercall operation code followed -by up to 7 word-sized arguments. -\end{quote} - -Note that multicalls are provided purely as an optimization; there is -no requirement to use them when first porting a guest operating -system. - - -\section{Virtual CPU Setup} - -At start of day, a guest operating system needs to setup the virtual -CPU it is executing on. This includes installing vectors for the -virtual IDT so that the guest OS can handle interrupts, page faults, -etc. However the very first thing a guest OS must setup is a pair -of hypervisor callbacks: these are the entry points which Xen will -use when it wishes to notify the guest OS of an occurrence. - -\begin{quote} -\hypercall{set\_callbacks(unsigned long event\_selector, unsigned long - event\_address, unsigned long failsafe\_selector, unsigned long - failsafe\_address) } - -Register the normal (``event'') and failsafe callbacks for -event processing. In each case the code segment selector and -address within that segment are provided. The selectors must -have RPL 1; in XenLinux we simply use the kernel's CS for both -{\tt event\_selector} and {\tt failsafe\_selector}. - -The value {\tt event\_address} specifies the address of the guest OSes -event handling and dispatch routine; the {\tt failsafe\_address} -specifies a separate entry point which is used only if a fault occurs -when Xen attempts to use the normal callback. 
-\end{quote} - - -After installing the hypervisor callbacks, the guest OS can -install a `virtual IDT' by using the following hypercall: - -\begin{quote} -\hypercall{set\_trap\_table(trap\_info\_t *table)} - -Install one or more entries into the per-domain -trap handler table (essentially a software version of the IDT). -Each entry in the array pointed to by {\tt table} includes the -exception vector number with the corresponding segment selector -and entry point. Most guest OSes can use the same handlers on -Xen as when running on the real hardware; an exception is the -page fault handler (exception vector 14) where a modified -stack-frame layout is used. - - -\end{quote} - - - -\section{Scheduling and Timer} - -Domains are preemptively scheduled by Xen according to the -parameters installed by domain 0 (see Section~\ref{s:dom0ops}). -In addition, however, a domain may choose to explicitly -control certain behavior with the following hypercall: - -\begin{quote} -\hypercall{sched\_op(unsigned long op)} - -Request scheduling operation from hypervisor. The options are: {\it -yield}, {\it block}, and {\it shutdown}. {\it yield} keeps the -calling domain runnable but may cause a reschedule if other domains -are runnable. {\it block} removes the calling domain from the run -queue and cause is to sleeps until an event is delivered to it. {\it -shutdown} is used to end the domain's execution; the caller can -additionally specify whether the domain should reboot, halt or -suspend. -\end{quote} - -To aid the implementation of a process scheduler within a guest OS, -Xen provides a virtual programmable timer: - -\begin{quote} -\hypercall{set\_timer\_op(uint64\_t timeout)} - -Request a timer event to be sent at the specified system time (time -in nanoseconds since system boot). The hypercall actually passes the -64-bit timeout value as a pair of 32-bit values. - -\end{quote} - -Note that calling {\tt set\_timer\_op()} prior to {\tt sched\_op} -allows block-with-timeout semantics. - - -\section{Page Table Management} - -Since guest operating systems have read-only access to their page -tables, Xen must be involved when making any changes. The following -multi-purpose hypercall can be used to modify page-table entries, -update the machine-to-physical mapping table, flush the TLB, install -a new page-table base pointer, and more. - -\begin{quote} -\hypercall{mmu\_update(mmu\_update\_t *req, int count, int *success\_count)} - -Update the page table for the domain; a set of {\tt count} updates are -submitted for processing in a batch, with {\tt success\_count} being -updated to report the number of successful updates. - -Each element of {\tt req[]} contains a pointer (address) and value; -the least significant 2-bits of the pointer are used to distinguish -the type of update requested as follows: -\begin{description} - -\item[\it MMU\_NORMAL\_PT\_UPDATE:] update a page directory entry or -page table entry to the associated value; Xen will check that the -update is safe, as described in Chapter~\ref{c:memory}. - -\item[\it MMU\_MACHPHYS\_UPDATE:] update an entry in the - machine-to-physical table. The calling domain must own the machine - page in question (or be privileged). - -\item[\it MMU\_EXTENDED\_COMMAND:] perform additional MMU operations. 
-The set of additional MMU operations is considerable, and includes -updating {\tt cr3} (or just re-installing it for a TLB flush), -flushing the cache, installing a new LDT, or pinning \& unpinning -page-table pages (to ensure their reference count doesn't drop to zero -which would require a revalidation of all entries). - -Further extended commands are used to deal with granting and -acquiring page ownership; see Section~\ref{s:idc}. - - -\end{description} - -More details on the precise format of all commands can be -found in {\tt xen/include/public/xen.h}. - - -\end{quote} - -Explicitly updating batches of page table entries is extremely -efficient, but can require a number of alterations to the guest -OS. Using the writable page table mode (Chapter~\ref{c:memory}) is -recommended for new OS ports. - -Regardless of which page table update mode is being used, however, -there are some occasions (notably handling a demand page fault) where -a guest OS will wish to modify exactly one PTE rather than a -batch. This is catered for by the following: - -\begin{quote} -\hypercall{update\_va\_mapping(unsigned long page\_nr, unsigned long -val, \\ unsigned long flags)} - -Update the currently installed PTE for the page {\tt page\_nr} to -{\tt val}. As with {\tt mmu\_update()}, Xen checks the modification -is safe before applying it. The {\tt flags} determine which kind -of TLB flush, if any, should follow the update. - -\end{quote} - -Finally, sufficiently privileged domains may occasionally wish to manipulate -the pages of others: -\begin{quote} - -\hypercall{update\_va\_mapping\_otherdomain(unsigned long page\_nr, -unsigned long val, unsigned long flags, uint16\_t domid)} - -Identical to {\tt update\_va\_mapping()} save that the pages being -mapped must belong to the domain {\tt domid}. - -\end{quote} - -This privileged operation is currently used by backend virtual device -drivers to safely map pages containing I/O data. - - - -\section{Segmentation Support} - -Xen allows guest OSes to install a custom GDT if they require it; -this is context switched transparently whenever a domain is -[de]scheduled. The following hypercall is effectively a -`safe' version of {\tt lgdt}: - -\begin{quote} -\hypercall{set\_gdt(unsigned long *frame\_list, int entries)} - -Install a global descriptor table for a domain; {\tt frame\_list} is -an array of up to 16 machine page frames within which the GDT resides, -with {\tt entries} being the actual number of descriptor-entry -slots. All page frames must be mapped read-only within the guest's -address space, and the table must be large enough to contain Xen's -reserved entries (see {\tt xen/include/public/arch-x86\_32.h}). - -\end{quote} - -Many guest OSes will also wish to install LDTs; this is achieved by -using {\tt mmu\_update()} with an extended command, passing the -linear address of the LDT base along with the number of entries. No -special safety checks are required; Xen needs to perform this task -simply since {\tt lldt} requires CPL 0. - - -Xen also allows guest operating systems to update just an -individual segment descriptor in the GDT or LDT: - -\begin{quote} -\hypercall{update\_descriptor(unsigned long ma, unsigned long word1, -unsigned long word2)} - -Update the GDT/LDT entry at machine address {\tt ma}; the new -8-byte descriptor is stored in {\tt word1} and {\tt word2}. -Xen performs a number of checks to ensure the descriptor is -valid. 
- -\end{quote} - -Guest OSes can use the above in place of context switching entire -LDTs (or the GDT) when the number of changing descriptors is small. - -\section{Context Switching} - -When a guest OS wishes to context switch between two processes, -it can use the page table and segmentation hypercalls described -above to perform the the bulk of the privileged work. In addition, -however, it will need to invoke Xen to switch the kernel (ring 1) -stack pointer: - -\begin{quote} -\hypercall{stack\_switch(unsigned long ss, unsigned long esp)} - -Request kernel stack switch from hypervisor; {\tt ss} is the new -stack segment, which {\tt esp} is the new stack pointer. - -\end{quote} - -A final useful hypercall for context switching allows ``lazy'' -save and restore of floating point state: - -\begin{quote} -\hypercall{fpu\_taskswitch(void)} - -This call instructs Xen to set the {\tt TS} bit in the {\tt cr0} -control register; this means that the next attempt to use floating -point will cause a trap which the guest OS can trap. Typically it will -then save/restore the FP state, and clear the {\tt TS} bit. -\end{quote} - -This is provided as an optimization only; guest OSes can also choose -to save and restore FP state on all context switches for simplicity. - - -\section{Physical Memory Management} - -As mentioned previously, each domain has a maximum and current -memory allocation. The maximum allocation, set at domain creation -time, cannot be modified. However a domain can choose to reduce -and subsequently grow its current allocation by using the -following call: - -\begin{quote} -\hypercall{dom\_mem\_op(unsigned int op, unsigned long *extent\_list, - unsigned long nr\_extents, unsigned int extent\_order)} - -Increase or decrease current memory allocation (as determined by -the value of {\tt op}). Each invocation provides a list of -extents each of which is $2^s$ pages in size, -where $s$ is the value of {\tt extent\_order}. - -\end{quote} - -In addition to simply reducing or increasing the current memory -allocation via a `balloon driver', this call is also useful for -obtaining contiguous regions of machine memory when required (e.g. -for certain PCI devices, or if using superpages). - - -\section{Inter-Domain Communication} -\label{s:idc} - -Xen provides a simple asynchronous notification mechanism via -\emph{event channels}. Each domain has a set of end-points (or -\emph{ports}) which may be bound to an event source (e.g. a physical -IRQ, a virtual IRQ, or an port in another domain). When a pair of -end-points in two different domains are bound together, then a `send' -operation on one will cause an event to be received by the destination -domain. - -The control and use of event channels involves the following hypercall: - -\begin{quote} -\hypercall{event\_channel\_op(evtchn\_op\_t *op)} - -Inter-domain event-channel management; {\tt op} is a discriminated -union which allows the following 7 operations: - -\begin{description} - -\item[\it alloc\_unbound:] allocate a free (unbound) local - port and prepare for connection from a specified domain. -\item[\it bind\_virq:] bind a local port to a virtual -IRQ; any particular VIRQ can be bound to at most one port per domain. -\item[\it bind\_pirq:] bind a local port to a physical IRQ; -once more, a given pIRQ can be bound to at most one port per -domain. Furthermore the calling domain must be sufficiently -privileged. 
-\item[\it bind\_interdomain:] construct an interdomain event -channel; in general, the target domain must have previously allocated -an unbound port for this channel, although this can be bypassed by -privileged domains during domain setup. -\item[\it close:] close an interdomain event channel. -\item[\it send:] send an event to the remote end of a -interdomain event channel. -\item[\it status:] determine the current status of a local port. -\end{description} - -For more details see -{\tt xen/include/public/event\_channel.h}. - -\end{quote} - -Event channels are the fundamental communication primitive between -Xen domains and seamlessly support SMP. However they provide little -bandwidth for communication {\sl per se}, and hence are typically -married with a piece of shared memory to produce effective and -high-performance inter-domain communication. - -Safe sharing of memory pages between guest OSes is carried out by -granting access on a per page basis to individual domains. This is -achieved by using the {\tt grant\_table\_op()} hypercall. - -\begin{quote} -\hypercall{grant\_table\_op(unsigned int cmd, void *uop, unsigned int count)} - -Grant or remove access to a particular page to a particular domain. - -\end{quote} - -This is not currently widely in use by guest operating systems, but -we intend to integrate support more fully in the near future. - -\section{PCI Configuration} - -Domains with physical device access (i.e.\ driver domains) receive -limited access to certain PCI devices (bus address space and -interrupts). However many guest operating systems attempt to -determine the PCI configuration by directly access the PCI BIOS, -which cannot be allowed for safety. - -Instead, Xen provides the following hypercall: - -\begin{quote} -\hypercall{physdev\_op(void *physdev\_op)} - -Perform a PCI configuration option; depending on the value -of {\tt physdev\_op} this can be a PCI config read, a PCI config -write, or a small number of other queries. - -\end{quote} - - -For examples of using {\tt physdev\_op()}, see the -Xen-specific PCI code in the linux sparse tree. - -\section{Administrative Operations} -\label{s:dom0ops} - -A large number of control operations are available to a sufficiently -privileged domain (typically domain 0). These allow the creation and -management of new domains, for example. A complete list is given -below: for more details on any or all of these, please see -{\tt xen/include/public/dom0\_ops.h} - - -\begin{quote} -\hypercall{dom0\_op(dom0\_op\_t *op)} - -Administrative domain operations for domain management. The options are: - -\begin{description} -\item [\it DOM0\_CREATEDOMAIN:] create a new domain - -\item [\it DOM0\_PAUSEDOMAIN:] remove a domain from the scheduler run -queue. - -\item [\it DOM0\_UNPAUSEDOMAIN:] mark a paused domain as schedulable - once again. 
- -\item [\it DOM0\_DESTROYDOMAIN:] deallocate all resources associated -with a domain - -\item [\it DOM0\_GETMEMLIST:] get list of pages used by the domain - -\item [\it DOM0\_SCHEDCTL:] - -\item [\it DOM0\_ADJUSTDOM:] adjust scheduling priorities for domain - -\item [\it DOM0\_BUILDDOMAIN:] do final guest OS setup for domain - -\item [\it DOM0\_GETDOMAINFO:] get statistics about the domain - -\item [\it DOM0\_GETPAGEFRAMEINFO:] - -\item [\it DOM0\_GETPAGEFRAMEINFO2:] - -\item [\it DOM0\_IOPL:] set I/O privilege level - -\item [\it DOM0\_MSR:] read or write model specific registers - -\item [\it DOM0\_DEBUG:] interactively invoke the debugger - -\item [\it DOM0\_SETTIME:] set system time - -\item [\it DOM0\_READCONSOLE:] read console content from hypervisor buffer ring - -\item [\it DOM0\_PINCPUDOMAIN:] pin domain to a particular CPU - -\item [\it DOM0\_GETTBUFS:] get information about the size and location of - the trace buffers (only on trace-buffer enabled builds) - -\item [\it DOM0\_PHYSINFO:] get information about the host machine - -\item [\it DOM0\_PCIDEV\_ACCESS:] modify PCI device access permissions - -\item [\it DOM0\_SCHED\_ID:] get the ID of the current Xen scheduler - -\item [\it DOM0\_SHADOW\_CONTROL:] switch between shadow page-table modes - -\item [\it DOM0\_SETDOMAININITIALMEM:] set initial memory allocation of a domain - -\item [\it DOM0\_SETDOMAINMAXMEM:] set maximum memory allocation of a domain - -\item [\it DOM0\_SETDOMAINVMASSIST:] set domain VM assist options -\end{description} -\end{quote} - -Most of the above are best understood by looking at the code -implementing them (in {\tt xen/common/dom0\_ops.c}) and in -the user-space tools that use them (mostly in {\tt tools/libxc}). - -\section{Debugging Hypercalls} - -A few additional hypercalls are mainly useful for debugging: - -\begin{quote} -\hypercall{console\_io(int cmd, int count, char *str)} - -Use Xen to interact with the console; operations are: - -{\it CONSOLEIO\_write}: Output count characters from buffer str. - -{\it CONSOLEIO\_read}: Input at most count characters into buffer str. -\end{quote} - -A pair of hypercalls allows access to the underlying debug registers: -\begin{quote} -\hypercall{set\_debugreg(int reg, unsigned long value)} - -Set debug register {\tt reg} to {\tt value} - -\hypercall{get\_debugreg(int reg)} - -Return the contents of the debug register {\tt reg} -\end{quote} - -And finally: -\begin{quote} -\hypercall{xen\_version(int cmd)} - -Request Xen version number. -\end{quote} - -This is useful to ensure that user-space tools are in sync -with the underlying hypervisor. - -\section{Deprecated Hypercalls} - -Xen is under constant development and refinement; as such there -are plans to improve the way in which various pieces of functionality -are exposed to guest OSes. - -\begin{quote} -\hypercall{vm\_assist(unsigned int cmd, unsigned int type)} - -Toggle various memory management modes (in particular wrritable page -tables and superpage support). - -\end{quote} - -This is likely to be replaced with mode values in the shared -information page since this is more resilient for resumption -after migration or checkpoint. 
-
-
-
-
-
-
+\include{src/interface/hypercalls}
+%% hypercalls moved to hypercalls.tex %%
diff -r f619a10fdb76 -r 1e255eacf158 docs/src/interface/hypercalls.tex
--- /dev/null	Wed Sep 14 18:08:36 2005
+++ b/docs/src/interface/hypercalls.tex	Wed Sep 14 18:22:54 2005
@@ -0,0 +1,524 @@
+
+\newcommand{\hypercall}[1]{\vspace{2mm}{\sf #1}}
+
+\chapter{Xen Hypercalls}
+\label{a:hypercalls}
+
+Hypercalls represent the procedural interface to Xen; this appendix
+categorizes and describes the current set of hypercalls.
+
+\section{Invoking Hypercalls}
+
+Hypercalls are invoked in a manner analogous to system calls in a
+conventional operating system; a software interrupt is issued which
+vectors to an entry point within Xen. On x86\_32 machines the
+instruction required is {\tt int \$0x82}; the (real) IDT is set up so
+that this may only be issued from within ring 1. The particular
+hypercall to be invoked is contained in {\tt EAX} --- a list
+mapping these values to symbolic hypercall names can be found
+in {\tt xen/include/public/xen.h}.
+
+On some occasions a set of hypercalls will be required to carry
+out a higher-level function; a good example is when a guest
+operating system wishes to context switch to a new process, which
+requires updating various pieces of privileged CPU state. As an
+optimization for these cases, there is a generic mechanism to issue
+a set of hypercalls as a batch:
+
+\begin{quote}
+\hypercall{multicall(void *call\_list, int nr\_calls)}
+
+Execute a series of hypervisor calls; {\tt nr\_calls} is the length of
+the array of {\tt multicall\_entry\_t} structures pointed to by {\tt
+call\_list}. Each entry contains the hypercall operation code followed
+by up to 7 word-sized arguments.
+\end{quote}
+
+Note that multicalls are provided purely as an optimization; there is
+no requirement to use them when first porting a guest operating
+system.
+
+
+\section{Virtual CPU Setup}
+
+At start of day, a guest operating system needs to set up the virtual
+CPU it is executing on. This includes installing vectors for the
+virtual IDT so that the guest OS can handle interrupts, page faults,
+etc. However, the very first thing a guest OS must set up is a pair
+of hypervisor callbacks: these are the entry points which Xen will
+use when it wishes to notify the guest OS of an occurrence.
+
+\begin{quote}
+\hypercall{set\_callbacks(unsigned long event\_selector, unsigned long
+  event\_address, unsigned long failsafe\_selector, unsigned long
+  failsafe\_address) }
+
+Register the normal (``event'') and failsafe callbacks for
+event processing. In each case the code segment selector and
+address within that segment are provided. The selectors must
+have RPL 1; in XenLinux we simply use the kernel's CS for both
+{\tt event\_selector} and {\tt failsafe\_selector}.
+
+The value {\tt event\_address} specifies the address of the guest OS's
+event handling and dispatch routine; the {\tt failsafe\_address}
+specifies a separate entry point which is used only if a fault occurs
+when Xen attempts to use the normal callback.
+\end{quote}
+
+
+After installing the hypervisor callbacks, the guest OS can
+install a `virtual IDT' by using the following hypercall:
+
+\begin{quote}
+\hypercall{set\_trap\_table(trap\_info\_t *table)}
+
+Install one or more entries into the per-domain
+trap handler table (essentially a software version of the IDT).
+Each entry in the array pointed to by {\tt table} includes the
+exception vector number with the corresponding segment selector
+and entry point. Most guest OSes can use the same handlers on
+Xen as when running on the real hardware; an exception is the
+page fault handler (exception vector 14), where a modified
+stack-frame layout is used.
+
+\end{quote}
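+
+As an illustration of the two calls above, the sketch below shows how a
+guest might register its exception handlers at boot. It is a minimal
+sketch only: the field layout of {\tt trap\_info\_t}, the
+{\tt HYPERVISOR\_set\_trap\_table()} wrapper name and the selector value
+are assumptions made for this example rather than part of the interface
+definition; the authoritative declarations live in
+{\tt xen/include/public/xen.h}.
+
+\begin{verbatim}
+/* Illustrative sketch only; structure layout, wrapper name and
+ * selector value are assumed. */
+typedef struct trap_info {
+    unsigned char  vector;    /* exception vector number          */
+    unsigned char  flags;     /* e.g. privilege of the handler    */
+    unsigned short cs;        /* code segment selector            */
+    unsigned long  address;   /* handler entry point              */
+} trap_info_t;
+
+extern int HYPERVISOR_set_trap_table(trap_info_t *table); /* assumed */
+
+extern void divide_error(void);   /* the guest's own handlers */
+extern void page_fault(void);
+
+#define GUEST_CS 0x0819   /* example ring-1 selector; value illustrative */
+
+static trap_info_t trap_table[3];  /* zero-initialized */
+
+void install_virtual_idt(void)
+{
+    trap_table[0].vector  = 0;                       /* divide error    */
+    trap_table[0].cs      = GUEST_CS;
+    trap_table[0].address = (unsigned long)divide_error;
+
+    trap_table[1].vector  = 14;                      /* page fault --   */
+    trap_table[1].cs      = GUEST_CS;                /* modified frame  */
+    trap_table[1].address = (unsigned long)page_fault;
+
+    /* trap_table[2] stays zeroed as an end-of-table marker (assumed). */
+    HYPERVISOR_set_trap_table(trap_table);
+}
+\end{verbatim}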
+
+
+\section{Scheduling and Timer}
+
+Domains are preemptively scheduled by Xen according to the
+parameters installed by domain 0 (see Section~\ref{s:dom0ops}).
+In addition, however, a domain may choose to explicitly
+control certain behavior with the following hypercall:
+
+\begin{quote}
+\hypercall{sched\_op(unsigned long op)}
+
+Request a scheduling operation from the hypervisor. The options are: {\it
+yield}, {\it block}, and {\it shutdown}. {\it yield} keeps the
+calling domain runnable but may cause a reschedule if other domains
+are runnable. {\it block} removes the calling domain from the run
+queue and causes it to sleep until an event is delivered to it. {\it
+shutdown} is used to end the domain's execution; the caller can
+additionally specify whether the domain should reboot, halt, or
+suspend.
+\end{quote}
+
+To aid the implementation of a process scheduler within a guest OS,
+Xen provides a virtual programmable timer:
+
+\begin{quote}
+\hypercall{set\_timer\_op(uint64\_t timeout)}
+
+Request a timer event to be sent at the specified system time (time
+in nanoseconds since system boot). The hypercall actually passes the
+64-bit timeout value as a pair of 32-bit values.
+
+\end{quote}
+
+Note that calling {\tt set\_timer\_op()} prior to {\tt sched\_op}
+allows block-with-timeout semantics.
+
+
+\section{Page Table Management}
+
+Since guest operating systems have read-only access to their page
+tables, Xen must be involved when making any changes. The following
+multi-purpose hypercall can be used to modify page-table entries,
+update the machine-to-physical mapping table, flush the TLB, install
+a new page-table base pointer, and more.
+
+\begin{quote}
+\hypercall{mmu\_update(mmu\_update\_t *req, int count, int *success\_count)}
+
+Update the page table for the domain; a set of {\tt count} updates is
+submitted for processing in a batch, with {\tt success\_count} being
+updated to report the number of successful updates.
+
+Each element of {\tt req[]} contains a pointer (address) and value;
+the least significant 2 bits of the pointer are used to distinguish
+the type of update requested as follows:
+\begin{description}
+
+\item[\it MMU\_NORMAL\_PT\_UPDATE:] update a page directory entry or
+page table entry to the associated value; Xen will check that the
+update is safe, as described in Chapter~\ref{c:memory}.
+
+\item[\it MMU\_MACHPHYS\_UPDATE:] update an entry in the
+ machine-to-physical table. The calling domain must own the machine
+ page in question (or be privileged).
+
+\item[\it MMU\_EXTENDED\_COMMAND:] perform additional MMU operations.
+The set of additional MMU operations is considerable, and includes
+updating {\tt cr3} (or just re-installing it for a TLB flush),
+flushing the cache, installing a new LDT, or pinning \& unpinning
+page-table pages (to ensure their reference count doesn't drop to zero,
+which would require a revalidation of all entries).
+
+Further extended commands are used to deal with granting and
+acquiring page ownership; see Section~\ref{s:idc}.
+
+
+\end{description}
+
+More details on the precise format of all commands can be
+found in {\tt xen/include/public/xen.h}.
+
+
+\end{quote}
+
+Explicitly updating batches of page table entries is extremely
+efficient, but can require a number of alterations to the guest
+OS. Using the writable page table mode (Chapter~\ref{c:memory}) is
+recommended for new OS ports.
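+
+To make the batching concrete, the following sketch queues two PTE
+writes and submits them in a single hypercall. It assumes a two-field
+{\tt mmu\_update\_t} ({\tt ptr}, {\tt val}) with the request type encoded
+in the low bits of {\tt ptr}, and a XenLinux-style
+{\tt HYPERVISOR\_mmu\_update()} wrapper; treat both as illustrative and
+consult {\tt xen/include/public/xen.h} for the real definitions.
+
+\begin{verbatim}
+/* Illustrative sketch; structure layout and wrapper name assumed. */
+typedef struct mmu_update {
+    unsigned long ptr;  /* machine address of PTE; low 2 bits = type */
+    unsigned long val;  /* new contents for that entry               */
+} mmu_update_t;
+
+#define MMU_NORMAL_PT_UPDATE 0   /* value illustrative only */
+
+extern int HYPERVISOR_mmu_update(mmu_update_t *req, int count,
+                                 int *success_count);       /* assumed */
+
+/* Batch two PTE writes into one hypercall. */
+int update_two_ptes(unsigned long pte_ma0, unsigned long new0,
+                    unsigned long pte_ma1, unsigned long new1)
+{
+    mmu_update_t req[2];
+    int done = 0;
+
+    req[0].ptr = pte_ma0 | MMU_NORMAL_PT_UPDATE;
+    req[0].val = new0;
+    req[1].ptr = pte_ma1 | MMU_NORMAL_PT_UPDATE;
+    req[1].val = new1;
+
+    if (HYPERVISOR_mmu_update(req, 2, &done) != 0 || done != 2)
+        return -1;   /* Xen rejected at least one update */
+    return 0;
+}
+\end{verbatim}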
+
+Regardless of which page table update mode is being used, however,
+there are some occasions (notably handling a demand page fault) where
+a guest OS will wish to modify exactly one PTE rather than a
+batch. This is catered for by the following:
+
+\begin{quote}
+\hypercall{update\_va\_mapping(unsigned long page\_nr, unsigned long
+val, \\ unsigned long flags)}
+
+Update the currently installed PTE for the page {\tt page\_nr} to
+{\tt val}. As with {\tt mmu\_update()}, Xen checks that the modification
+is safe before applying it. The {\tt flags} determine which kind
+of TLB flush, if any, should follow the update.
+
+\end{quote}
+
+Finally, sufficiently privileged domains may occasionally wish to manipulate
+the pages of others:
+\begin{quote}
+
+\hypercall{update\_va\_mapping\_otherdomain(unsigned long page\_nr,
+unsigned long val, unsigned long flags, uint16\_t domid)}
+
+Identical to {\tt update\_va\_mapping()} save that the pages being
+mapped must belong to the domain {\tt domid}.
+
+\end{quote}
+
+This privileged operation is currently used by backend virtual device
+drivers to safely map pages containing I/O data.
+
+
+
+\section{Segmentation Support}
+
+Xen allows guest OSes to install a custom GDT if they require it;
+this is context switched transparently whenever a domain is
+[de]scheduled. The following hypercall is effectively a
+`safe' version of {\tt lgdt}:
+
+\begin{quote}
+\hypercall{set\_gdt(unsigned long *frame\_list, int entries)}
+
+Install a global descriptor table for a domain; {\tt frame\_list} is
+an array of up to 16 machine page frames within which the GDT resides,
+with {\tt entries} being the actual number of descriptor-entry
+slots. All page frames must be mapped read-only within the guest's
+address space, and the table must be large enough to contain Xen's
+reserved entries (see {\tt xen/include/public/arch-x86\_32.h}).
+
+\end{quote}
+
+Many guest OSes will also wish to install LDTs; this is achieved by
+using {\tt mmu\_update()} with an extended command, passing the
+linear address of the LDT base along with the number of entries. No
+special safety checks are required; Xen performs this task
+simply because {\tt lldt} requires CPL 0.
+
+
+Xen also allows guest operating systems to update just an
+individual segment descriptor in the GDT or LDT:
+
+\begin{quote}
+\hypercall{update\_descriptor(unsigned long ma, unsigned long word1,
+unsigned long word2)}
+
+Update the GDT/LDT entry at machine address {\tt ma}; the new
+8-byte descriptor is stored in {\tt word1} and {\tt word2}.
+Xen performs a number of checks to ensure the descriptor is
+valid.
+
+\end{quote}
+
+Guest OSes can use the above in place of context switching entire
+LDTs (or the GDT) when the number of changing descriptors is small.
+
+\section{Context Switching}
+
+When a guest OS wishes to context switch between two processes,
+it can use the page table and segmentation hypercalls described
+above to perform the bulk of the privileged work. In addition,
+however, it will need to invoke Xen to switch the kernel (ring 1)
+stack pointer:
+
+\begin{quote}
+\hypercall{stack\_switch(unsigned long ss, unsigned long esp)}
+
+Request a kernel stack switch from the hypervisor; {\tt ss} is the new
+stack segment, while {\tt esp} is the new stack pointer.
+
+\end{quote}
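+
+The stack switch is a natural candidate for the {\tt multicall()}
+batching described earlier. The sketch below combines it with the lazy
+FPU hypercall introduced next; the {\tt multicall\_entry\_t} layout, the
+{\tt \_\_HYPERVISOR\_*} operation codes and the wrapper name are
+assumptions made for illustration only.
+
+\begin{verbatim}
+/* Illustrative sketch; entry layout, op codes and wrapper assumed. */
+typedef struct multicall_entry {
+    unsigned long op;        /* hypercall operation code     */
+    unsigned long args[7];   /* up to 7 word-sized arguments */
+} multicall_entry_t;
+
+/* Hypothetical operation codes -- see xen/include/public/xen.h. */
+#define __HYPERVISOR_stack_switch    3
+#define __HYPERVISOR_fpu_taskswitch  5
+
+extern int HYPERVISOR_multicall(void *call_list, int nr_calls); /* assumed */
+
+void context_switch_tail(unsigned long new_ss, unsigned long new_esp)
+{
+    multicall_entry_t calls[2] = { { 0 } };   /* zero all entries */
+
+    calls[0].op      = __HYPERVISOR_stack_switch;
+    calls[0].args[0] = new_ss;
+    calls[0].args[1] = new_esp;
+
+    calls[1].op      = __HYPERVISOR_fpu_taskswitch;  /* lazy FPU handling */
+
+    /* One software interrupt instead of two. */
+    HYPERVISOR_multicall(calls, 2);
+}
+\end{verbatim}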
+
+A final useful hypercall for context switching allows ``lazy''
+save and restore of floating point state:
+
+\begin{quote}
+\hypercall{fpu\_taskswitch(void)}
+
+This call instructs Xen to set the {\tt TS} bit in the {\tt cr0}
+control register; this means that the next attempt to use floating
+point will cause a trap which the guest OS can catch. Typically it will
+then save/restore the FP state, and clear the {\tt TS} bit.
+\end{quote}
+
+This is provided as an optimization only; guest OSes can also choose
+to save and restore FP state on all context switches for simplicity.
+
+
+\section{Physical Memory Management}
+
+As mentioned previously, each domain has a maximum and current
+memory allocation. The maximum allocation, set at domain creation
+time, cannot be modified. However, a domain can choose to reduce
+and subsequently grow its current allocation by using the
+following call:
+
+\begin{quote}
+\hypercall{dom\_mem\_op(unsigned int op, unsigned long *extent\_list,
+  unsigned long nr\_extents, unsigned int extent\_order)}
+
+Increase or decrease the current memory allocation (as determined by
+the value of {\tt op}). Each invocation provides a list of
+extents, each of which is $2^s$ pages in size,
+where $s$ is the value of {\tt extent\_order}.
+
+\end{quote}
+
+In addition to simply reducing or increasing the current memory
+allocation via a `balloon driver', this call is also useful for
+obtaining contiguous regions of machine memory when required (e.g.
+for certain PCI devices, or if using superpages).
+
+
+\section{Inter-Domain Communication}
+\label{s:idc}
+
+Xen provides a simple asynchronous notification mechanism via
+\emph{event channels}. Each domain has a set of end-points (or
+\emph{ports}) which may be bound to an event source (e.g. a physical
+IRQ, a virtual IRQ, or a port in another domain). When a pair of
+end-points in two different domains is bound together, a `send'
+operation on one will cause an event to be received by the destination
+domain.
+
+The control and use of event channels involves the following hypercall:
+
+\begin{quote}
+\hypercall{event\_channel\_op(evtchn\_op\_t *op)}
+
+Inter-domain event-channel management; {\tt op} is a discriminated
+union which allows the following 7 operations:
+
+\begin{description}
+
+\item[\it alloc\_unbound:] allocate a free (unbound) local
+ port and prepare for connection from a specified domain.
+\item[\it bind\_virq:] bind a local port to a virtual
+IRQ; any particular VIRQ can be bound to at most one port per domain.
+\item[\it bind\_pirq:] bind a local port to a physical IRQ;
+once more, a given pIRQ can be bound to at most one port per
+domain. Furthermore, the calling domain must be sufficiently
+privileged.
+\item[\it bind\_interdomain:] construct an interdomain event
+channel; in general, the target domain must have previously allocated
+an unbound port for this channel, although this can be bypassed by
+privileged domains during domain setup.
+\item[\it close:] close an interdomain event channel.
+\item[\it send:] send an event to the remote end of an
+interdomain event channel.
+\item[\it status:] determine the current status of a local port.
+\end{description}
+
+For more details see
+{\tt xen/include/public/event\_channel.h}.
+
+\end{quote}
+
+Event channels are the fundamental communication primitive between
+Xen domains and seamlessly support SMP. However, they provide little
+bandwidth for communication {\sl per se}, and hence are typically
+married with a piece of shared memory to produce effective and
+high-performance inter-domain communication.
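+
+As a rough illustration, sending a notification over an
+already-established channel might look like the sketch below. The
+discriminated-union layout, the command constant and the wrapper name
+are assumptions of this sketch; see
+{\tt xen/include/public/event\_channel.h} for the actual interface.
+
+\begin{verbatim}
+/* Illustrative sketch; union layout, names and values assumed. */
+typedef struct evtchn_send {
+    unsigned int local_port;       /* port bound to the remote domain */
+} evtchn_send_t;
+
+typedef struct evtchn_op {
+    unsigned int cmd;              /* which of the 7 operations */
+    union {
+        evtchn_send_t send;
+        /* ... structures for the other six operations ... */
+    } u;
+} evtchn_op_t;
+
+#define EVTCHNOP_send 4            /* value illustrative only */
+
+extern int HYPERVISOR_event_channel_op(evtchn_op_t *op);  /* assumed */
+
+/* Notify the domain at the far end of an already-bound channel. */
+int notify_remote(unsigned int port)
+{
+    evtchn_op_t op;
+
+    op.cmd               = EVTCHNOP_send;
+    op.u.send.local_port = port;
+    return HYPERVISOR_event_channel_op(&op);
+}
+\end{verbatim}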
+
+Safe sharing of memory pages between guest OSes is carried out by
+granting access on a per-page basis to individual domains. This is
+achieved by using the {\tt grant\_table\_op()} hypercall.
+
+\begin{quote}
+\hypercall{grant\_table\_op(unsigned int cmd, void *uop, unsigned int count)}
+
+Grant or remove access to a particular page for a particular domain.
+
+\end{quote}
+
+This is not yet widely used by guest operating systems, but
+we intend to integrate support more fully in the near future.
+
+\section{PCI Configuration}
+
+Domains with physical device access (i.e.\ driver domains) receive
+limited access to certain PCI devices (bus address space and
+interrupts). However, many guest operating systems attempt to
+determine the PCI configuration by directly accessing the PCI BIOS,
+which cannot be allowed for safety.
+
+Instead, Xen provides the following hypercall:
+
+\begin{quote}
+\hypercall{physdev\_op(void *physdev\_op)}
+
+Perform a PCI configuration operation; depending on the value
+of {\tt physdev\_op} this can be a PCI config read, a PCI config
+write, or a small number of other queries.
+
+\end{quote}
+
+
+For examples of using {\tt physdev\_op()}, see the
+Xen-specific PCI code in the Linux sparse tree.
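+
+A driver domain reading a PCI configuration register might wrap the
+call roughly as sketched below. The command constant, structure layout
+and {\tt HYPERVISOR\_physdev\_op()} wrapper are assumptions made for
+illustration; the Xen-specific PCI code in the Linux sparse tree shows
+the real usage.
+
+\begin{verbatim}
+/* Illustrative sketch only; constant and layout are assumed. */
+#define PHYSDEVOP_PCI_CFGREG_READ 0      /* value illustrative only */
+
+typedef struct physdev_op {
+    unsigned int cmd;                    /* which operation          */
+    union {
+        struct {
+            unsigned int bus, dev, func; /* PCI address              */
+            unsigned int reg, len;       /* register offset and size */
+            unsigned int value;          /* result (filled by Xen)   */
+        } pci_cfgreg_read;
+    } u;
+} physdev_op_t;
+
+extern int HYPERVISOR_physdev_op(physdev_op_t *op);   /* assumed */
+
+int pci_cfg_read(unsigned int bus, unsigned int dev, unsigned int func,
+                 unsigned int reg, unsigned int len, unsigned int *value)
+{
+    physdev_op_t op;
+    int rc;
+
+    op.cmd                    = PHYSDEVOP_PCI_CFGREG_READ;
+    op.u.pci_cfgreg_read.bus  = bus;
+    op.u.pci_cfgreg_read.dev  = dev;
+    op.u.pci_cfgreg_read.func = func;
+    op.u.pci_cfgreg_read.reg  = reg;
+    op.u.pci_cfgreg_read.len  = len;
+
+    rc = HYPERVISOR_physdev_op(&op);
+    if (rc == 0)
+        *value = op.u.pci_cfgreg_read.value;
+    return rc;
+}
+\end{verbatim}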
+
+\section{Administrative Operations}
+\label{s:dom0ops}
+
+A large number of control operations are available to a sufficiently
+privileged domain (typically domain 0). These allow the creation and
+management of new domains, for example. A complete list is given
+below; for more details on any or all of these, please see
+{\tt xen/include/public/dom0\_ops.h}.
+
+
+\begin{quote}
+\hypercall{dom0\_op(dom0\_op\_t *op)}
+
+Administrative domain operations for domain management. The options are:
+
+\begin{description}
+\item [\it DOM0\_CREATEDOMAIN:] create a new domain
+
+\item [\it DOM0\_PAUSEDOMAIN:] remove a domain from the scheduler run
+queue
+
+\item [\it DOM0\_UNPAUSEDOMAIN:] mark a paused domain as schedulable
+ once again
+
+\item [\it DOM0\_DESTROYDOMAIN:] deallocate all resources associated
+with a domain
+
+\item [\it DOM0\_GETMEMLIST:] get the list of pages used by the domain
+
+\item [\it DOM0\_SCHEDCTL:]
+
+\item [\it DOM0\_ADJUSTDOM:] adjust scheduling priorities for a domain
+
+\item [\it DOM0\_BUILDDOMAIN:] do final guest OS setup for a domain
+
+\item [\it DOM0\_GETDOMAINFO:] get statistics about a domain
+
+\item [\it DOM0\_GETPAGEFRAMEINFO:]
+
+\item [\it DOM0\_GETPAGEFRAMEINFO2:]
+
+\item [\it DOM0\_IOPL:] set I/O privilege level
+
+\item [\it DOM0\_MSR:] read or write model-specific registers
+
+\item [\it DOM0\_DEBUG:] interactively invoke the debugger
+
+\item [\it DOM0\_SETTIME:] set the system time
+
+\item [\it DOM0\_READCONSOLE:] read console content from the hypervisor buffer ring
+
+\item [\it DOM0\_PINCPUDOMAIN:] pin a domain to a particular CPU
+
+\item [\it DOM0\_GETTBUFS:] get information about the size and location of
+ the trace buffers (only on trace-buffer enabled builds)
+
+\item [\it DOM0\_PHYSINFO:] get information about the host machine
+
+\item [\it DOM0\_PCIDEV\_ACCESS:] modify PCI device access permissions
+
+\item [\it DOM0\_SCHED\_ID:] get the ID of the current Xen scheduler
+
+\item [\it DOM0\_SHADOW\_CONTROL:] switch between shadow page-table modes
+
+\item [\it DOM0\_SETDOMAININITIALMEM:] set the initial memory allocation of a domain
+
+\item [\it DOM0\_SETDOMAINMAXMEM:] set the maximum memory allocation of a domain
+
+\item [\it DOM0\_SETDOMAINVMASSIST:] set domain VM assist options
+\end{description}
+\end{quote}
+
+Most of the above are best understood by looking at the code
+implementing them (in {\tt xen/common/dom0\_ops.c}) and at
+the user-space tools that use them (mostly in {\tt tools/libxc}).
+
+\section{Debugging Hypercalls}
+
+A few additional hypercalls are mainly useful for debugging:
+
+\begin{quote}
+\hypercall{console\_io(int cmd, int count, char *str)}
+
+Use Xen to interact with the console; the operations are:
+
+{\it CONSOLEIO\_write}: output {\tt count} characters from buffer {\tt str}.
+
+{\it CONSOLEIO\_read}: input at most {\tt count} characters into buffer {\tt str}.
+\end{quote}
+
+A pair of hypercalls allows access to the underlying debug registers:
+\begin{quote}
+\hypercall{set\_debugreg(int reg, unsigned long value)}
+
+Set debug register {\tt reg} to {\tt value}.
+
+\hypercall{get\_debugreg(int reg)}
+
+Return the contents of the debug register {\tt reg}.
+\end{quote}
+
+And finally:
+\begin{quote}
+\hypercall{xen\_version(int cmd)}
+
+Request the Xen version number.
+\end{quote}
+
+This is useful to ensure that user-space tools are in sync
+with the underlying hypervisor.
+
+\section{Deprecated Hypercalls}
+
+Xen is under constant development and refinement; as such there
+are plans to improve the way in which various pieces of functionality
+are exposed to guest OSes.
+
+\begin{quote}
+\hypercall{vm\_assist(unsigned int cmd, unsigned int type)}
+
+Toggle various memory-management modes (in particular writable page
+tables and superpage support).
+
+\end{quote}
+
+This is likely to be replaced with mode values in the shared
+information page, since that approach is more resilient for resumption
+after migration or checkpointing.
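+
+For example, a guest that wants the writable page table assist could
+issue something like the following at boot. The constant names follow
+the conventions of {\tt xen/include/public/xen.h} but, along with the
+wrapper name, should be treated as assumptions of this sketch.
+
+\begin{verbatim}
+/* Illustrative sketch; constants and wrapper name are assumed --
+ * check xen/include/public/xen.h for the authoritative values. */
+#define VMASST_CMD_enable               0
+#define VMASST_TYPE_writable_pagetables 2
+
+extern int HYPERVISOR_vm_assist(unsigned int cmd,
+                                unsigned int type);   /* assumed */
+
+/* Ask Xen to emulate direct writes to this domain's page tables. */
+void enable_writable_pagetables(void)
+{
+    HYPERVISOR_vm_assist(VMASST_CMD_enable,
+                         VMASST_TYPE_writable_pagetables);
+}
+\end{verbatim}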