# HG changeset patch
# User Robb Romans <3r@xxxxxxxxxx>
# Node ID 9983602e0ca451159f007149470d6614d65e3537
# Parent f03c7bc9fb70f125110af518a4166bc270383097
Separate file for interface/architecture.

Signed-off-by: Robb Romans <3r@xxxxxxxxxx>

diff -r f03c7bc9fb70 -r 9983602e0ca4 docs/src/interface.tex
--- a/docs/src/interface.tex Thu Sep 15 19:22:55 2005
+++ b/docs/src/interface.tex Thu Sep 15 20:25:12 2005
@@ -87,149 +87,8 @@
 mechanism and policy within the system.
 
 
-
-\chapter{Virtual Architecture}
-
-On a Xen-based system, the hypervisor itself runs in {\it ring 0}. It
-has full access to the physical memory available in the system and is
-responsible for allocating portions of it to the domains. Guest
-operating systems run in and use {\it rings 1}, {\it 2} and {\it 3} as
-they see fit. Segmentation is used to prevent the guest OS from
-accessing the portion of the address space that is reserved for
-Xen. We expect most guest operating systems will use ring 1 for their
-own operation and place applications in ring 3.
-
-In this chapter we consider the basic virtual architecture provided
-by Xen: the basic CPU state, exception and interrupt handling, and
-time. Other aspects such as memory and device access are discussed
-in later chapters.
-
-\section{CPU state}
-
-All privileged state must be handled by Xen. The guest OS has no
-direct access to CR3 and is not permitted to update privileged bits in
-EFLAGS. Guest OSes use \emph{hypercalls} to invoke operations in Xen;
-these are analogous to system calls but occur from ring 1 to ring 0.
-
-A list of all hypercalls is given in Appendix~\ref{a:hypercalls}.
-
-
-
-\section{Exceptions}
-
-A virtual IDT is provided --- a domain can submit a table of trap
-handlers to Xen via the {\tt set\_trap\_table()} hypercall. Most trap
-handlers are identical to native x86 handlers, although the page-fault
-handler is somewhat different.
-
-
-\section{Interrupts and events}
-
-Interrupts are virtualized by mapping them to \emph{events}, which are
-delivered asynchronously to the target domain using a callback
-supplied via the {\tt set\_callbacks()} hypercall. A guest OS can map
-these events onto its standard interrupt dispatch mechanisms. Xen is
-responsible for determining the target domain that will handle each
-physical interrupt source. For more details on the binding of event
-sources to events, see Chapter~\ref{c:devices}.
-
-
-
-\section{Time}
-
-Guest operating systems need to be aware of the passage of both real
-(or wallclock) time and their own `virtual time' (the time for
-which they have been executing). Furthermore, Xen has a notion of
-time which is used for scheduling. The following notions of
-time are provided:
-
-\begin{description}
-\item[Cycle counter time.]
-
-This provides a fine-grained time reference. The cycle counter time is
-used to accurately extrapolate the other time references. On SMP machines
-it is currently assumed that the cycle counter time is synchronized between
-CPUs. The current x86-based implementation achieves this within inter-CPU
-communication latencies.
-
-\item[System time.]
-
-This is a 64-bit counter which holds the number of nanoseconds that
-have elapsed since system boot.
-
-
-\item[Wall clock time.]
-
-This is the time of day in a Unix-style {\tt struct timeval} (seconds
-and microseconds since 1 January 1970, adjusted by leap seconds). An
-NTP client hosted by {\it domain 0} can keep this value accurate.
-
-
-\item[Domain virtual time.]
-
-This progresses at the same pace as system time, but only while a
-domain is executing --- it stops while a domain is de-scheduled.
-Therefore the share of the CPU that a domain receives is indicated by
-the rate at which its virtual time increases.
-
-\end{description}
-
-
-Xen exports timestamps for system time and wall-clock time to guest
-operating systems through a shared page of memory. Xen also provides
-the cycle counter time at the instant the timestamps were calculated,
-and the CPU frequency in Hertz. This allows the guest to extrapolate
-system and wall-clock times accurately based on the current cycle
-counter time.
-
-Since all time stamps need to be updated and read \emph{atomically}
-two version numbers are also stored in the shared info page. The
-first is incremented prior to an update, while the second is only
-incremented afterwards. Thus a guest can be sure that it read a consistent
-state by checking the two version numbers are equal.
-
-Xen includes a periodic ticker which sends a timer event to the
-currently executing domain every 10ms. The Xen scheduler also sends a
-timer event whenever a domain is scheduled; this allows the guest OS
-to adjust for the time that has passed while it has been inactive. In
-addition, Xen allows each domain to request that they receive a timer
-event sent at a specified system time by using the {\tt
-set\_timer\_op()} hypercall. Guest OSes may use this timer to
-implement timeout values when they block.
-
-
-
-%% % akw: demoting this to a section -- not sure if there is any point
-%% % though, maybe just remove it.
-
-\section{Xen CPU Scheduling}
-
-Xen offers a uniform API for CPU schedulers. It is possible to choose
-from a number of schedulers at boot and it should be easy to add more.
-The BVT, Atropos and Round Robin schedulers are part of the normal
-Xen distribution. BVT provides proportional fair shares of the CPU to
-the running domains. Atropos can be used to reserve absolute shares
-of the CPU for each domain. Round-robin is provided as an example of
-Xen's internal scheduler API.
-
-\paragraph*{Note: SMP host support}
-Xen has always supported SMP host systems. Domains are statically assigned to
-CPUs, either at creation time or when manually pinning to a particular CPU.
-The current schedulers then run locally on each CPU to decide which of the
-assigned domains should be run there. The user-level control software
-can be used to perform coarse-grain load-balancing between CPUs.
-
-
-%% More information on the characteristics and use of these schedulers is
-%% available in {\tt Sched-HOWTO.txt}.
-
-
-\section{Privileged operations}
-
-Xen exports an extended interface to privileged domains (viz.\ {\it
-  Domain 0}). This allows such domains to build and boot other domains
-on the server, and provides control interfaces for managing
-scheduling, memory, networking, and block devices.
+%% chapter Virtual Architecture moved to architecture.tex
+\include{src/interface/architecture}
 
 
 \chapter{Memory}
diff -r f03c7bc9fb70 -r 9983602e0ca4 docs/src/interface/architecture.tex
--- /dev/null Thu Sep 15 19:22:55 2005
+++ b/docs/src/interface/architecture.tex Thu Sep 15 20:25:12 2005
@@ -0,0 +1,140 @@
+\chapter{Virtual Architecture}
+
+On a Xen-based system, the hypervisor itself runs in {\it ring 0}. It
+has full access to the physical memory available in the system and is
+responsible for allocating portions of it to the domains. Guest
+operating systems run in and use {\it rings 1}, {\it 2} and {\it 3} as
+they see fit. Segmentation is used to prevent the guest OS from
+accessing the portion of the address space that is reserved for Xen.
+We expect most guest operating systems will use ring 1 for their own
+operation and place applications in ring 3.
+
+In this chapter we consider the basic virtual architecture provided by
+Xen: the basic CPU state, exception and interrupt handling, and time.
+Other aspects such as memory and device access are discussed in later
+chapters.
+
+
+\section{CPU state}
+
+All privileged state must be handled by Xen. The guest OS has no
+direct access to CR3 and is not permitted to update privileged bits in
+EFLAGS. Guest OSes use \emph{hypercalls} to invoke operations in Xen;
+these are analogous to system calls but occur from ring 1 to ring 0.
+
+A list of all hypercalls is given in Appendix~\ref{a:hypercalls}.
+
+
+\section{Exceptions}
+
+A virtual IDT is provided --- a domain can submit a table of trap
+handlers to Xen via the {\tt set\_trap\_table()} hypercall. Most trap
+handlers are identical to native x86 handlers, although the page-fault
+handler is somewhat different.
+
+
+\section{Interrupts and events}
+
+Interrupts are virtualized by mapping them to \emph{events}, which are
+delivered asynchronously to the target domain using a callback
+supplied via the {\tt set\_callbacks()} hypercall. A guest OS can map
+these events onto its standard interrupt dispatch mechanisms. Xen is
+responsible for determining the target domain that will handle each
+physical interrupt source. For more details on the binding of event
+sources to events, see Chapter~\ref{c:devices}.
+
+
+\section{Time}
+
+Guest operating systems need to be aware of the passage of both real
+(or wallclock) time and their own `virtual time' (the time for which
+they have been executing). Furthermore, Xen has a notion of time which
+is used for scheduling. The following notions of time are provided:
+
+\begin{description}
+\item[Cycle counter time.]
+
+  This provides a fine-grained time reference. The cycle counter time
+  is used to accurately extrapolate the other time references. On SMP
+  machines it is currently assumed that the cycle counter time is
+  synchronized between CPUs. The current x86-based implementation
+  achieves this within inter-CPU communication latencies.
+
+\item[System time.]
+
+  This is a 64-bit counter which holds the number of nanoseconds that
+  have elapsed since system boot.
+
+\item[Wall clock time.]
+
+  This is the time of day in a Unix-style {\tt struct timeval}
+  (seconds and microseconds since 1 January 1970, adjusted by leap
+  seconds). An NTP client hosted by {\it domain 0} can keep this
+  value accurate.
+
+\item[Domain virtual time.]
+
+  This progresses at the same pace as system time, but only while a
+  domain is executing --- it stops while a domain is de-scheduled.
+  Therefore the share of the CPU that a domain receives is indicated
+  by the rate at which its virtual time increases.
+
+\end{description}
+
+
+Xen exports timestamps for system time and wall-clock time to guest
+operating systems through a shared page of memory. Xen also provides
+the cycle counter time at the instant the timestamps were calculated,
+and the CPU frequency in Hertz. This allows the guest to extrapolate
+system and wall-clock times accurately based on the current cycle
+counter time.
+
+Since all time stamps need to be updated and read \emph{atomically}
+two version numbers are also stored in the shared info page. The first
+is incremented prior to an update, while the second is only
+incremented afterwards. Thus a guest can be sure that it read a
+consistent state by checking the two version numbers are equal.
+
+Xen includes a periodic ticker which sends a timer event to the
+currently executing domain every 10ms. The Xen scheduler also sends a
+timer event whenever a domain is scheduled; this allows the guest OS
+to adjust for the time that has passed while it has been inactive. In
+addition, Xen allows each domain to request that they receive a timer
+event sent at a specified system time by using the {\tt
+  set\_timer\_op()} hypercall. Guest OSes may use this timer to
+implement timeout values when they block.
+
+
+
+%% % akw: demoting this to a section -- not sure if there is any point
+%% % though, maybe just remove it.
+
+\section{Xen CPU Scheduling}
+
+Xen offers a uniform API for CPU schedulers. It is possible to choose
+from a number of schedulers at boot and it should be easy to add more.
+The BVT, Atropos and Round Robin schedulers are part of the normal Xen
+distribution. BVT provides proportional fair shares of the CPU to the
+running domains. Atropos can be used to reserve absolute shares of
+the CPU for each domain. Round-robin is provided as an example of
+Xen's internal scheduler API.
+
+\paragraph*{Note: SMP host support}
+Xen has always supported SMP host systems. Domains are statically
+assigned to CPUs, either at creation time or when manually pinning to
+a particular CPU. The current schedulers then run locally on each CPU
+to decide which of the assigned domains should be run there. The
+user-level control software can be used to perform coarse-grain
+load-balancing between CPUs.
+
+
+%% More information on the characteristics and use of these schedulers
+%% is available in {\tt Sched-HOWTO.txt}.
+
+
+\section{Privileged operations}
+
+Xen exports an extended interface to privileged domains (viz.\ {\it
+  Domain 0}). This allows such domains to build and boot other domains
+on the server, and provides control interfaces for managing
+scheduling, memory, networking, and block devices.
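
Reviewer note, not part of the patch: the Time section moved above describes two mechanisms that may be easier to follow with a small model. The first is the pair of version numbers that lets a guest detect a torn read of the shared page. The second is the extrapolation of system time from the cycle counter and the CPU frequency. The sketch below is only an illustration in Python under stated assumptions. The class and field names (SharedTimeInfo, version_before, version_after, tsc_stamp and so on) are hypothetical and are not Xen's actual shared info layout, and a real guest would also need memory barriers between the reads.

```python
from dataclasses import dataclass

NS_PER_SEC = 1_000_000_000


@dataclass
class SharedTimeInfo:
    """Hypothetical stand-in for the time fields of the shared info page."""
    version_before: int = 0  # incremented prior to an update
    version_after: int = 0   # incremented only after the update completes
    tsc_stamp: int = 0       # cycle counter value when the stamps were taken
    system_time_ns: int = 0  # nanoseconds since boot at tsc_stamp
    cpu_freq_hz: int = 1     # CPU frequency in Hertz


def publish(info, tsc_stamp, system_time_ns, cpu_freq_hz):
    """Writer side: bracket the field updates with the two version bumps."""
    info.version_before += 1
    info.tsc_stamp = tsc_stamp
    info.system_time_ns = system_time_ns
    info.cpu_freq_hz = cpu_freq_hz
    info.version_after += 1


def read_snapshot(info, max_tries=1000):
    """Reader side: retry until the two version numbers are equal.

    The 'after' counter is read first and the 'before' counter last, so
    equal values mean no update started or was in flight during the read.
    """
    for _ in range(max_tries):
        after = info.version_after
        snap = (info.tsc_stamp, info.system_time_ns, info.cpu_freq_hz)
        before = info.version_before
        if before == after:
            return snap
    raise RuntimeError("no consistent snapshot")


def extrapolate_system_time_ns(snapshot, tsc_now):
    """Extrapolate system time from the current cycle counter value."""
    tsc_stamp, system_time_ns, cpu_freq_hz = snapshot
    elapsed_cycles = tsc_now - tsc_stamp
    return system_time_ns + elapsed_cycles * NS_PER_SEC // cpu_freq_hz
```

Under these assumptions, a guest that reads a snapshot stamped at cycle 1000 on a 2 GHz CPU and then observes the cycle counter 2,000,000 cycles later would add one millisecond to the stamped system time. Integer arithmetic is used so the result stays an exact nanosecond count.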