[Xen-devel] Re: [RFC, PATCH 1/24] i386 Vmi documentation

To: Zachary Amsden <zach@xxxxxxxxxx>
Subject: [Xen-devel] Re: [RFC, PATCH 1/24] i386 Vmi documentation
From: Chris Wright <chrisw@xxxxxxxxxxxx>
Date: Mon, 13 Mar 2006 14:49:02 -0800
Cc: Andrew Morton <akpm@xxxxxxxx>, Joshua LeVasseur <jtl@xxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, Pratap Subrahmanyam <pratap@xxxxxxxxxx>, Wim Coekaerts <wim.coekaerts@xxxxxxxxxx>, Jack Lo <jlo@xxxxxxxxxx>, Dan Hecht <dhecht@xxxxxxxxxx>, Linux Kernel Mailing List <linux-kernel@xxxxxxxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxxxx>, Christopher Li <chrisl@xxxxxxxxxx>, Chris Wright <chrisw@xxxxxxxxxxxx>, Virtualization Mailing List <virtualization@xxxxxxxxxxxxxx>, Linus Torvalds <torvalds@xxxxxxxx>, Anne Holler <anne@xxxxxxxxxx>, Jyothy Reddy <jreddy@xxxxxxxxxx>, Kip Macy <kmacy@xxxxxxxxxxx>, Ky Srinivasan <ksrinivasan@xxxxxxxxxx>, Leendert van Doorn <leendert@xxxxxxxxxxxxxx>, Dan Arai <arai@xxxxxxxxxx>
Delivery-date: Tue, 14 Mar 2006 10:04:31 +0000
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <200603131759.k2DHxeep005627@xxxxxxxxxxxxxxxxxxx>
References: <200603131759.k2DHxeep005627@xxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.4.2.1i
* Zachary Amsden (zach@xxxxxxxxxx) wrote:

Thanks for the very complete Documentation!  Some comments interspersed
below.

> +  High Performance.
> +
> +     Providing a low level API that closely resembles hardware does not
> +     provide any support for compound operations; indeed, typical
> +     compound operations on hardware can be updating of many page table
> +     entries, flushing system TLBs, or providing floating point safety.
> +     Since these operations may require several privileged or sensitive
> +     operations, it becomes important to defer some of these operations
> +     until explicit flushes are issued, or to provide higher level
> +     operations around some of these functions.  In order to keep with
> +     the goal of portability, this has been done only when deemed
> +     necessary for performance reasons, and we have tried to package
> +     these compound operations into methods that are typically used in
> +     guest operating systems.  In the future, we envision that additional
> +     higher level abstractions will be added as an adjunct to the
> +     low-level API.  These higher level abstractions will target large
> +     bulk operations such as creation, and destruction of address spaces,
> +     context switches, thread creation and control.

This is an area where in the past VMI hasn't been well-suited to support
Xen.  It's the higher level abstractions which make the performance
story of paravirt compelling.  I haven't made it through the whole
patchset yet, but the bits you mention above as work to be done are
certainly important to good performance.

> +  Maintainability.
> +
> +     In the course of development with a virtualized environment, it is
> +     not uncommon for support of new features or higher performance to
> +     require radical changes to the operation of the system.  If these
> +     changes are visible to the guest OS in a paravirtualized system,
> +     this will require updates to the guest kernel, which presents a
> +     maintenance problem.  In the Linux world, the rapid pace of
> +     development on the kernel means new kernel versions are produced
> +     every few months.  This rapid pace is not always appropriate for end
> +     users, so it is not uncommon to have dozens of different versions of
> +     the Linux kernel in use that must be actively supported.

We do not want an interface which slows down the pace.  We work with
source and drop cruft as quickly as possible (referring to internal
changes, not user-visible ABI changes here).  Making changes that
require a new guest for some significant performance gain is perfectly
reasonable.  What we want to avoid is making changes that require a
new guest to simply boot.  This is akin to rev'ing hardware w/out any
backwards compatibility.  This goal doesn't require VMI and ROMs, but
I agree it requires clear interface definitions.

> +   Privilege Model.
> +     Currently, the system only provides for two guest security domains,
> +     kernel (which runs at the equivalent of virtual CPL-0), and user
> +     (which runs at the equivalent of virtual CPL-3, with no hardware
> +     access).  Typically, this is not a problem, but if a guest OS relies
> +     on using multiple hardware rings for privilege isolation, this
> +     interface would need to be expanded to support that.

I don't think this is an issue, but good to have noted down.

> +     The guest OS is also responsible for notifying the hypervisor about
> +     which pages in its physical memory are going to be used to hold page
> +     tables or page directories.  Both PAE and non-PAE paging modes are
> +     supported.

Presumably simultaneously, so a single ROM supports PAE and non-PAE guests?
So VMI has PAE-specific bits of the interface?

> +     An experimental patch is available to enable boot-time sizing of
> +     the hypervisor hole.

It'll be nice to have it eventually be dynamic.

> +   Interrupt and I/O Subsystem.
> +
> +     For security reasons, the guest operating system is not given
> +     control over the hardware interrupt flag.  We provide a virtual
> +     interrupt flag that is under guest control.  The virtual operating
> +     system always runs with hardware interrupts enabled, but hardware
> +     interrupts are transparent to the guest.  The API provides calls for
> +     all instructions which modify the interrupt flag.
> +
> +     The paravirtualization environment provides a legacy programmable
> +     interrupt controller (PIC) to the virtual machine.  Future releases
> +     will provide a virtual interrupt controller (VIC) that provides
> +     more advanced features.

VIC is then just a formalized event mechanism between guest and VMM?

> +     The general mechanism for providing customized features and
> +     capabilities is to provide notification of these feature through
> +     the CPUID call, and allowing configuration of CPU features
> +     through RDMSR / WRMSR instructions.  This allows a hypervisor vendor
> +     ID to be published, and the kernel may enable or disable specific
> +     features based on this id.  This has the advantage of following
> +     closely the boot time logic of many operating systems that enables
> +     certain performance enhancements or bugfixes based on processor
> +     revision, using exactly the same mechanism.

I like this idea; there have been a couple of times when it seemed the
simplest way to handle some Xen features.  But it's absolutely ripe for
basically unmanaged interface changes.
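
To make the mechanism concrete, here is a minimal sketch of how a guest
could assemble a vendor ID from the three registers of a CPUID vendor leaf
and key feature enablement off it.  The register order (EBX, EDX, ECX) is
that of native CPUID leaf 0; the function names and the "ExampleHVisr"
vendor string are made up for illustration and are not part of the VMI.

```c
#include <stdint.h>
#include <string.h>

/* Unpack one 32-bit register into four ASCII bytes, low byte first. */
static void unpack_reg(uint32_t reg, char *out)
{
	for (int i = 0; i < 4; i++)
		out[i] = (char)((reg >> (8 * i)) & 0xff);
}

/*
 * Build the 12-character vendor string the way native CPUID leaf 0
 * reports it: EBX, then EDX, then ECX.  'buf' must hold 13 bytes.
 */
void cpuid_vendor_string(uint32_t ebx, uint32_t edx, uint32_t ecx,
			 char buf[13])
{
	unpack_reg(ebx, buf);
	unpack_reg(edx, buf + 4);
	unpack_reg(ecx, buf + 8);
	buf[12] = '\0';
}

/*
 * Hypothetical policy hook: return nonzero if the kernel should turn
 * on paravirt features for this vendor.  The vendor string is an
 * assumption invented for this sketch.
 */
int vendor_enables_paravirt(const char *vendor)
{
	return strcmp(vendor, "ExampleHVisr") == 0;
}
```

Every feature keyed off the string this way is an interface in its own
right, which is where the unmanaged changes can creep in.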

> +     One shortcut we have found most helpful is to simply disable NMI delivery
> +     to the paravirtualized kernel.  There is no reason NMIs can't be
> +     supported, but typical uses for them are not as productive in a
> +     virtualized environment.  Watchdog NMIs are of limited use if the OS is
> +     already correct and running on stable hardware; profiling NMIs are
> +     similarly of less use, since this task is accomplished with more accuracy
> +     in the VMM itself; and NMIs for machine check errors should be handled
> +     outside of the VM.  The addition of NMI support does create additional
> +     complexity for the trap handling code in the VM, and although the task is
> +     surmountable, the value proposition is debatable.  Here, again, feedback
> +     is desired.

Xen allows propagating NMIs to the privileged dom0.  This may make
sense for some errors that aren't fatal, but I'm not sure how much it's
used.

> +     Alarms:
> +
> +     Alarms can be set (armed) against the real time counter or the
> +     available time counter. Alarms can be programmed to expire once
> +     (one-shot) or on a regular period (periodic).  They are armed by
> +     indicating an absolute counter value expiry, and in the case of a
> +     periodic alarm, a non-zero relative period counter value.  [TBD:
> +     The method of wiring the alarms to an interrupt vector is dependent
> +     upon the virtual interrupt controller portion of the interface.
> +     Currently, the alarms may be wired as if they are attached to IRQ0
> +     or the vector in the local APIC LVTT.  This way, the alarms can be
> +     used as drop in replacements for the PIT or local APIC timer.]

Hmm, makes me wonder what you do in the case of giving physical
access to hardware.  Xen makes a distinction between irq types of
physical and virtual, and the timer is virtual.  I guess VIC is an area
that warrants more discussion.
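
For reference while discussing it, the one-shot vs. periodic arming rules
quoted above can be modeled roughly as below.  The two flag values are
copied from the quoted prototypes; the struct, the function names, and the
choice to report every missed periodic expiry (rather than coalescing
them) are assumptions of this sketch, since the text does not say.

```c
#include <stdint.h>

/* Flag values copied from the quoted prototypes. */
#define VMI_ALARM_IS_ONESHOT   0x00000000
#define VMI_ALARM_IS_PERIODIC  0x00000100

/* Hypothetical model of one armed alarm; not part of the VMI. */
struct alarm_model {
	uint32_t flags;
	uint64_t expiry;	/* absolute counter value */
	uint64_t period;	/* relative, non-zero if periodic */
	int armed;
};

void alarm_arm(struct alarm_model *a, uint32_t flags,
	       uint64_t expiry, uint64_t period)
{
	a->flags = flags;
	a->expiry = expiry;
	a->period = period;
	a->armed = 1;
}

/*
 * Advance the model to counter value 'now' and return how many times
 * the alarm expired.  A one-shot alarm disarms after firing; a
 * periodic alarm re-arms at the first period boundary after 'now'.
 */
uint64_t alarm_poll(struct alarm_model *a, uint64_t now)
{
	uint64_t fired;

	if (!a->armed || now < a->expiry)
		return 0;

	if (!(a->flags & VMI_ALARM_IS_PERIODIC) || a->period == 0) {
		a->armed = 0;
		return 1;
	}

	fired = (now - a->expiry) / a->period + 1;
	a->expiry += fired * a->period;
	return fired;
}
```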

> +      typedef struct HyperRomHeader {
> +         uint16_t        romSignature; 
> +         int8_t          romLength;
> +         unsigned char   romEntry[4];
> +         uint8_t         romPad0;
> +         uint32_t        hyperSignature;
> +         uint8_t         APIVersionMinor;
> +         uint8_t         APIVersionMajor;
> +         uint8_t         reserved0;
> +         uint8_t         reserved1;
> +         uint32_t        reserved2;
> +         uint32_t        reserved3;
> +         uint16_t        pciHeaderOffset;
> +         uint16_t        pnpHeaderOffset;
> +         uint32_t        romPad3;
> +         char            reserved[32];
> +         char            elfHeader[64];
> +      } HyperRomHeader;

As a general rule, all these typedef'd structs and StudlyCaps don't
complement Linux CodingStyle.
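
As a rough illustration of what I mean, the same header could be spelled
without the typedef and with lower-case, underscore-separated names.  The
renamed fields are my own and not from the patch; the kernel would
normally use u8/u16/u32, and stdint types are used here only so the
fragment stands alone.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Same field order and sizes as the quoted HyperRomHeader, spelled in
 * kernel style.  The field names are a hypothetical renaming.
 */
struct hyper_rom_header {
	uint16_t	rom_signature;
	int8_t		rom_length;
	unsigned char	rom_entry[4];
	uint8_t		rom_pad0;
	uint32_t	hyper_signature;
	uint8_t		api_version_minor;
	uint8_t		api_version_major;
	uint8_t		reserved0;
	uint8_t		reserved1;
	uint32_t	reserved2;
	uint32_t	reserved3;
	uint16_t	pci_header_offset;
	uint16_t	pnp_header_offset;
	uint32_t	rom_pad3;
	char		reserved[32];
	char		elf_header[64];
};
```

With natural alignment every field already lands on its own boundary, so
the renaming changes nothing about the layout.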

> +    VMI_Init
> +   
> +       VMICALL void VMI_Init(void);
> +
> +       Initializes the hypervisor environment.  Returns zero on success,
> +       or -1 if the hypervisor could not be initialized.  Note that this
> +       is a recoverable error if the guest provides the requisite native
> +       code to support transparent paravirtualization.

This provides an interesting support issue, i.e. just what platform are
you running on?

> +       Inputs:      None
> +       Outputs:     EAX = result
> +       Clobbers:    Standard
> +       Segments:    Standard
> +
> +
> +   PROCESSOR STATE CALLS
> +
> +    This set of calls controls the online status of the processor.  It
> +    includes interrupt control, reboot, halt, and shutdown functionality.
> +    Future expansions may include deep sleep and hotplug CPU capabilities.
> +
> +    VMI_DisableInterrupts
> +
> +       VMICALL void VMI_DisableInterrupts(void);
> +
> +       Disable maskable interrupts on the processor.
> +
> +       Inputs:      None
> +       Outputs:     None
> +       Clobbers:    Flags only
> +       Segments:    As this is both performance critical and likely to
> +          be called from low level interrupt code, this call does not
> +          require flat DS/ES segments, but uses the stack segment for
> +          data access.  Therefore only CS/SS must be well defined.
> +
> +    VMI_EnableInterrupts
> +
> +       VMICALL void VMI_EnableInterrupts(void);
> +
> +       Enable maskable interrupts on the processor.  Note that the
> +       current implementation always will deliver any pending interrupts
> +       on a call which enables interrupts, for compatibility with kernel
> +       code which expects this behavior.  Whether this should be required
> +       is open for debate.
> +
> +       Inputs:      None
> +       Outputs:     None
> +       Clobbers:    Flags only
> +       Segments:    CS/SS only
> +
> +    VMI_GetInterruptMask
> +
> +       VMICALL VMI_UINT VMI_GetInterruptMask(void);
> +
> +       Returns the current interrupt state mask of the processor.  The
> +       mask is defined to be 0x200 (matching processor flag IF) to indicate
> +       interrupts are enabled.
> +
> +       Inputs:      None
> +       Outputs:     EAX = mask
> +       Clobbers:    Flags only
> +       Segments:    CS/SS only
> +
> +    VMI_SetInterruptMask
> +   
> +       VMICALL void VMI_SetInterruptMask(VMI_UINT mask);
> +
> +       Set the current interrupt state mask of the processor.  Also
> +       delivers any pending interrupts if the mask is set to allow
> +       them.
> +
> +       Inputs:      EAX = mask
> +       Outputs:     None
> +       Clobbers:    Flags only
> +       Segments:    CS/SS only
> +
> +    VMI_DeliverInterrupts (For future debate)
> +
> +       Enable and deliver any pending interrupts.  This would remove
> +       the implicit delivery semantic from the SetInterruptMask and
> +       EnableInterrupts calls.

How do you keep forwards and backwards compat here?  A guest that's coded
for the simple implicit version would never get interrupts delivered on
a newer ROM?
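
A toy model of the concern, under the assumption that a newer ROM simply
drops the implicit delivery from the enable path.  The 0x200 mask value is
from the quoted text; the struct and function names are hypothetical.

```c
#include <stdint.h>

#define VIRT_IF 0x200	/* matches the quoted mask value for flag IF */

/* Hypothetical per-VCPU state; none of these names come from the VMI. */
struct vcpu_model {
	uint32_t mask;		/* virtual interrupt flag */
	unsigned pending;	/* interrupts queued while masked */
	unsigned delivered;	/* interrupts handed to the guest */
	int implicit_delivery;	/* 1 = behavior of the current ROM */
};

static void deliver_pending(struct vcpu_model *v)
{
	v->delivered += v->pending;
	v->pending = 0;
}

/* Model of VMI_EnableInterrupts under either ROM semantic. */
void model_enable_interrupts(struct vcpu_model *v)
{
	v->mask = VIRT_IF;
	if (v->implicit_delivery)
		deliver_pending(v);
}

/* Model of the proposed VMI_DeliverInterrupts. */
void model_deliver_interrupts(struct vcpu_model *v)
{
	v->mask = VIRT_IF;
	deliver_pending(v);
}
```

An old guest only ever calls the enable path, so with implicit_delivery
cleared its pending count never drains.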

> +   CPU CONTROL CALLS
> +
> +    These calls encapsulate the set of privileged instructions used to
> +    manipulate the CPU control state.  These instructions are all properly
> +    virtualizable using trap and emulate, but for performance reasons, a
> +    direct call may be more efficient.  With hardware virtualization
> +    capabilities, many of these calls can be left as IDENT translations, that
> +    is, inline implementations of the native instructions, which are not
> +    rewritten by the hypervisor.  Some of these calls are performance critical
> +    during context switch paths, and some are not, but they are all included
> +    for completeness, with the exceptions of the obsoleted LMSW and SMSW
> +    instructions.

Including calls just for completeness can be the beginning of API bloat.

> +    VMI_WRMSR
> +
> +       VMICALL void VMI_WRMSR(VMI_UINT64 val, VMI_UINT32 reg);
> +
> +       Write to a model specific register.  This functions identically to the
> +       hardware WRMSR instruction.  Note that a hypervisor may not implement
> +       the full set of MSRs supported by native hardware, since many of them
> +       are not useful in the context of a virtual machine.
> +
> +       Inputs:      ECX = model specific register index 
> +                    EAX = low word of register
> +                    EDX = high word of register
> +       Outputs:     None
> +       Clobbers:    Standard, Memory
> +       Segments:    Standard
> +
> +    VMI_RDMSR
> +
> +       VMICALL VMI_UINT64 VMI_RDMSR(VMI_UINT64 dummy, VMI_UINT32 reg);
> +
> +       Read from a model specific register.  This functions identically to the
> +       hardware RDMSR instruction.  Note that a hypervisor may not implement
> +       the full set of MSRs supported by native hardware, since many of them
> +       are not useful in the context of a virtual machine.
> +
> +       Inputs:      ECX = machine specific register index 
> +       Outputs:     EAX = low word of register
> +                    EDX = high word of register
> +       Clobbers:    Standard
> +       Segments:    Standard
> +
> +    VMI_SetCR0
> +
> +       VMICALL void VMI_SetCR0(VMI_UINT val);
> +
> +       Write to control register zero.  This can cause TLB flush and FPU
> +       handling side effects.  The set of features available to the kernel
> +       depend on the completeness of the hypervisor.  An explicit list of
> +       supported functionality or required settings may need to be negotiated
> +       by the hypervisor and kernel during bootstrapping.  This is likely to
> +       be implementation or vendor specific, and the precise restrictions are
> +       not yet worked out.  Our implementation in general supports turning on
> +       additional functionality - enabling protected mode, paging, page write
> +       protections; however, once those features have been enabled, they may
> +       not be disabled on the virtual hardware.
> +
> +       Inputs:      EAX = input to control register
> +       Outputs:     None
> +       Clobbers:    Standard
> +       Segments:    Standard

clts, setcr0, readcr0 are interrelated for typical use.  Is it expected
that the hypervisor uses a consistent register (either native or shadowed)
here, or is it meant to be undefined?
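
What I'd expect, sketched with a single shadowed copy that all three
calls go through.  CR0.TS being bit 3 is architectural; the struct and the
shadow_* names are assumptions of mine, not the VMI implementation.

```c
#include <stdint.h>

#define CR0_TS 0x00000008u	/* task-switched bit, bit 3 of CR0 */

/* Hypothetical shadow copy kept by the hypervisor or VMI layer. */
struct cr0_shadow {
	uint32_t value;
};

void shadow_set_cr0(struct cr0_shadow *s, uint32_t val)
{
	s->value = val;
}

uint32_t shadow_get_cr0(const struct cr0_shadow *s)
{
	return s->value;
}

/* CLTS clears only TS, on the same copy the getter reads. */
void shadow_clts(struct cr0_shadow *s)
{
	s->value &= ~CR0_TS;
}
```

If the getter read the native register while clts only touched a shadow
(or the other way around), the usual read, test TS, clts sequence would
see stale state.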

> +    VMI_INVD
> +
> +       This instruction is deprecated.  It is invalid to execute in a virtual
> +       machine.  It is documented here only because it is still declared in
> +       the interface, and dropping it required a version change.

Rev the version, no need to discuss deprecated interface ;-) Good example
of how this has the ability to carry bloat forward though.

> +   MMU CALLS

Many of these will look the same on x86-64, but the API is not
64-bit clean, so it has to be duplicated.
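
A small sketch of why: with the address parameter fixed at 32 bits, as
in the quoted VMI_InvalPage and VMI_SetLinearMapping prototypes, an
x86-64 kernel address loses its upper half.  The helper names are
hypothetical, and the word-sized variant is just one possible
alternative, not a proposal from the patch.

```c
#include <stdint.h>

typedef uint32_t VMI_UINT32;	/* as in the quoted type list */

/* Shape of the quoted prototypes: the address is fixed at 32 bits. */
VMI_UINT32 va_as_seen_by_vmi32(uint64_t va)
{
	return (VMI_UINT32)va;	/* upper 32 bits are silently dropped */
}

/* A word-sized type would carry the address intact on either ABI. */
uintptr_t va_as_seen_by_word_api(uintptr_t va)
{
	return va;
}
```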

> +    The MMU plays a large role in paravirtualization due to the large
> +    performance opportunities realized by gaining insight into the guest
> +    machine's use of page tables.  These calls are designed to accommodate the
> +    existing MMU functionality in the guest OS while providing the hypervisor
> +    with hints that can be used to optimize performance to a large degree.
> +
> +    VMI_SetLinearMapping
> +       VMICALL void VMI_SetLinearMapping(int slot, VMI_UINT32 va,
> +                                         VMI_UINT32 pages, VMI_UINT32 ppn);
> +
> +       /* The number of VMI address translation slot */
> +       #define VMI_LINEAR_MAP_SLOTS    4
> +
> +       Register a virtual to physical translation of virtual address range to
> +       physical pages.  This may be used to register single pages or to
> +       register large ranges.  There is an upper limit on the number of active
> +       mappings, which should be sufficient to allow the hypervisor and VMI
> +       layer to perform page translation without requiring dynamic storage.
> +       Translations are only required to be registered for addresses used to
> +       access page table entries through the VMI page table access functions.
> +       The guest is free to use the provided linear map slots in a manner that
> +       it finds most convenient.  Kernels which linearly map a large chunk of
> +       physical memory and use page tables in this linear region will only
> +       need to register one such region after initialization of the VMI.
> +       Hypervisors which do not require linear to physical conversion hints
> +       are free to leave these calls as NOPs, which is the default when
> +       inlined into the native kernel.
> +
> +       Inputs:      EAX   = linear map slot
> +                    EDX   = virtual address start of mapping
> +                    ECX   = number of pages in mapping
> +                    ST(0) = physical frame number to which pages are mapped
> +       Outputs:     None
> +       Clobbers:    Standard
> +       Segments:    Standard
> +
> +    VMI_FlushTLB
> +
> +       VMICALL void VMI_FlushTLB(int how);
> +   
> +       Flush all non-global mappings in the TLB, optionally flushing global
> +       mappings as well.  The VMI_FLUSH_TLB flag should always be specified,
> +       optionally or'ed with the VMI_FLUSH_GLOBAL flag.
> +
> +       Inputs:      EAX = flush type
> +                       #define VMI_FLUSH_TLB            0x01
> +                       #define VMI_FLUSH_GLOBAL         0x02
> +       Outputs:     None
> +       Clobbers:    Standard, memory (implied)
> +       Segments:    Standard
> +
> +    VMI_InvalPage
> +
> +       VMICALL void VMI_InvalPage(VMI_UINT32 va);
> +
> +       Invalidate the TLB mapping for a single page or large page at the
> +       given virtual address.
> +
> +       Inputs:      EAX = virtual address
> +       Outputs:     None
> +       Clobbers:    Standard, memory (implied)
> +       Segments:    Standard
> +
> +   The remaining documentation here needs updating when the PTE accessors are
> +   simplified.
> +
> +    70) VMI_SetPte
> +
> +        void VMI_SetPte(VMI_PTE pte, VMI_PTE *ptep);
> +
> +        Assigns a new value to a page table / directory entry. It is a
> +        requirement that ptep points to a page that has already been
> +        registered with the hypervisor as a page of the appropriate type
> +        using the VMI_RegisterPageUsage function.
> +            
> +    71) VMI_SwapPte           
> +
> +        VMI_PTE VMI_SwapPte(VMI_PTE pte, VMI_PTE *ptep);
> +
> +        Write 'pte' into the page table entry pointed by 'ptep', and returns
> +        the old value in 'ptep'.  This function acts atomically on the PTE
> +        to provide up to date A/D bit information in the returned value.
> +
> +    72) VMI_TestAndSetPteBit
> +
> +        VMI_BOOL VMI_TestAndSetPteBit(VMI_INT bit, VMI_PTE *ptep);
> +
> +        Atomically set a bit in a page table entry.  Returns zero if the bit
> +        was not set, and non-zero if the bit was set.
> +
> +    73) VMI_TestAndClearPteBit 
> +
> +        VMI_BOOL VMI_TestAndSetClearBit(VMI_INT bit, VMI_PTE *ptep);
> +
> +        Atomically clear a bit in a page table entry.  Returns zero if the bit
> +        was not set, and non-zero if the bit was set.
> +
> +    74) VMI_SetPteLong
> +    75) VMI_SwapPteLong           
> +    76) VMI_TestAndSetPteBitLong
> +    77) VMI_TestAndClearPteBitLong
> +
> +        void VMI_SetPteLong(VMI_PAE_PTE pte, VMI_PAE_PTE *ptep);
> +        VMI_PAE_PTE VMI_SwapPteLong(VMI_UINT64 pte, VMI_PAE_PTE *ptep);
> +        VMI_BOOL VMI_TestAndSetPteBitLong(VMI_INT bit, VMI_PAE_PTE *ptep);
> +        VMI_BOOL VMI_TestAndSetClearBitLong(VMI_INT bit, VMI_PAE_PTE *ptep);
> +        
> +        These functions act identically to the 32-bit PTE update functions,
> +        but provide support for PAE mode.  The calls are guaranteed to never
> +        create a temporarily invalid but present page mapping that could be
> +        accidentally prefetched by another processor, and all returned bits
> +        are guaranteed to be atomically up to date.

Heh, answers that question I had above ;-)
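
For the record, the return-value and atomicity semantics described for
the 4-byte accessors can be modeled with C11 atomics as below.  This is
only a sketch of the semantics on a plain memory word, not how a
hypervisor would implement the calls; the model_* names are mine, and the
PAE variants would need the same treatment on a 64-bit word.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Returns non-zero if the bit was already set, like the quoted call. */
int model_test_and_set_pte_bit(int bit, _Atomic uint32_t *ptep)
{
	uint32_t mask = UINT32_C(1) << bit;

	return (atomic_fetch_or(ptep, mask) & mask) != 0;
}

/* Returns non-zero if the bit was set before it was cleared. */
int model_test_and_clear_pte_bit(int bit, _Atomic uint32_t *ptep)
{
	uint32_t mask = UINT32_C(1) << bit;

	return (atomic_fetch_and(ptep, ~mask) & mask) != 0;
}

/* Swap in a new PTE and return the old one, A/D bits included. */
uint32_t model_swap_pte(uint32_t pte, _Atomic uint32_t *ptep)
{
	return atomic_exchange(ptep, pte);
}
```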

> +    85) VMI_SetDeferredMode

Is this the batching, multi-call analog?

> +        void VMI_SetDeferredMode(VMI_UINT32 deferBits); 
> +
> +        Set the lazy state update mode to the specified set of bits.  This
> +        allows the processor, hypervisor, or VMI layer to lazily update
> +        certain CPU and MMU state.  When setting this to a more permissive
> +        setting, no flush is implied, but when clearing bits in the current
> +        defer mask, all pending state will be flushed.
> +
> +        The 'deferBits' is a mask specifying how to flush.
> +
> +            #define VMI_DEFER_NONE          0x00
> +
> +        Disallow all asynchronous state updates.  This is the default
> +        state.
> +
> +            #define VMI_DEFER_MMU           0x01
> +
> +        Flush all pending page table updates.  Note that page faults,
> +        invalidations and TLB flushes will implicitly flush all pending
> +        updates. 
> +
> +            #define VMI_DEFER_CPU           0x02
> +
> +        Allow CPU state updates to control registers to be deferred, with
> +        the exception of updates that change FPU state.  This is useful
> +        for combining a reload of the page table base in CR3 with other
> +        updates, such as the current kernel stack.
> +
> +            #define VMI_DEFER_DT            0x04
> +
> +        Allow descriptor table updates to be delayed.  This allows the
> +        VMI_UpdateGDT / IDT / LDT calls to be asynchronously queued.
> +
> +    86) VMI_FlushDeferredCalls
> +
> +        void VMI_FlushDeferredCalls(void);
> +
> +        Flush all asynchronous state updates which may be queued as
> +        a result of setting deferred update mode.
> +
> +
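
Reading the above as a batching interface, the flush rules can be modeled
roughly like this.  The VMI_DEFER_* values are copied from the quoted
text; the bookkeeping struct and the model_* names are hypothetical, and
a real layer would queue the updates themselves rather than count them.

```c
#include <stdint.h>

/* Values copied from the quoted text. */
#define VMI_DEFER_NONE 0x00
#define VMI_DEFER_MMU  0x01
#define VMI_DEFER_CPU  0x02
#define VMI_DEFER_DT   0x04

/* Hypothetical bookkeeping; counts stand in for queued updates. */
struct defer_model {
	uint32_t defer_bits;
	unsigned pending;
	unsigned applied;
};

/* Model of VMI_FlushDeferredCalls: apply everything queued. */
void model_flush_deferred(struct defer_model *d)
{
	d->applied += d->pending;
	d->pending = 0;
}

/* Clearing any bit flushes; only adding bits implies no flush. */
void model_set_deferred_mode(struct defer_model *d, uint32_t bits)
{
	if (d->defer_bits & ~bits)
		model_flush_deferred(d);
	d->defer_bits = bits;
}

/* 'kind' is one VMI_DEFER_* bit naming the class of the update. */
void model_update(struct defer_model *d, uint32_t kind)
{
	if (d->defer_bits & kind)
		d->pending++;
	else
		d->applied++;
}
```
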
> +Appendix B - VMI C prototypes
> +
> +   Most of the VMI calls are properly callable C functions.  Note that for the
> +   absolute best performance, assembly calls are preferable in some cases, as
> +   they do not imply all of the side effects of a C function call, such as
> +   register clobber and memory access.  Nevertheless, these wrappers serve as
> +   a useful interface definition for higher level languages.
> +
> +   In some cases, a dummy variable is passed as an unused input to force
> +   proper alignment of the remaining register values.
> +
> +   The call convention for these is defined to be standard GCC convention with
> +   register passing.  The regparm call interface is documented at:
> +
> +   http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html
> +
> +   Types used by these calls:
> +
> +   VMI_UINT64   64 bit unsigned integer
> +   VMI_UINT32   32 bit unsigned integer
> +   VMI_UINT16   16 bit unsigned integer
> +   VMI_UINT8    8 bit unsigned integer
> +   VMI_INT      32 bit integer
> +   VMI_UINT     32 bit unsigned integer
> +   VMI_DTR      6 byte compressed descriptor table limit/base
> +   VMI_PTE      4 byte page table entry (or page directory)
> +   VMI_LONG_PTE 8 byte page table entry (or PDE or PDPE)
> +   VMI_SELECTOR 16 bit segment selector
> +   VMI_BOOL     32 bit unsigned integer
> +   VMI_CYCLES   64 bit unsigned integer
> +   VMI_NANOSECS 64 bit unsigned integer

All caps typedefs are not very popular w.r.t. CodingStyle.

> +   #ifndef VMI_PROTOTYPES_H
> +   #define VMI_PROTOTYPES_H
> +
> +   /* Insert local type definitions here */
> +   typedef struct VMI_DTR {
> +      uint16 limit;
> +      uint32 offset __attribute__ ((packed));
> +   } VMI_DTR;
> +
> +   typedef struct APState {
> +      VMI_UINT32 cr0;
> +      VMI_UINT32 cr2;
> +      VMI_UINT32 cr3;
> +      VMI_UINT32 cr4;
> +
> +      VMI_UINT64 efer;
> +
> +      VMI_UINT32 eip;
> +      VMI_UINT32 eflags;
> +      VMI_UINT32 eax;
> +      VMI_UINT32 ebx;
> +      VMI_UINT32 ecx;
> +      VMI_UINT32 edx;
> +      VMI_UINT32 esp;
> +      VMI_UINT32 ebp;
> +      VMI_UINT32 esi;
> +      VMI_UINT32 edi;
> +      VMI_UINT16 cs;
> +      VMI_UINT16 ss;
> +
> +      VMI_UINT16 ds;
> +      VMI_UINT16 es;
> +      VMI_UINT16 fs;
> +      VMI_UINT16 gs;
> +      VMI_UINT16 ldtr;
> +
> +      VMI_UINT16 gdtrLimit;
> +      VMI_UINT32 gdtrBase;
> +      VMI_UINT32 idtrBase;
> +      VMI_UINT16 idtrLimit;
> +   } APState;
> +
> +   #define VMICALL __attribute__((regparm(3)))

I understand it's for ABI documentation, but in Linux it's FASTCALL.

> +   /* CORE INTERFACE CALLS */
> +   VMICALL void VMI_Init(void);
> +
> +   /* PROCESSOR STATE CALLS */
> +   VMICALL void     VMI_DisableInterrupts(void);
> +   VMICALL void     VMI_EnableInterrupts(void);
> +
> +   VMICALL VMI_UINT VMI_GetInterruptMask(void);
> +   VMICALL void     VMI_SetInterruptMask(VMI_UINT mask);
> +
> +   VMICALL void     VMI_Pause(void);
> +   VMICALL void     VMI_Halt(void);
> +   VMICALL void     VMI_Shutdown(void);
> +   VMICALL void     VMI_Reboot(VMI_INT how);
> +
> +   #define VMI_REBOOT_SOFT 0x0
> +   #define VMI_REBOOT_HARD 0x1
> +
> +   void VMI_SetInitialAPState(APState *apState, VMI_UINT32 apicID);
> +
> +   /* DESCRIPTOR RELATED CALLS */
> +   VMICALL void         VMI_SetGDT(VMI_DTR *gdtr);
> +   VMICALL void         VMI_SetIDT(VMI_DTR *idtr);
> +   VMICALL void         VMI_SetLDT(VMI_SELECTOR ldtSel);
> +   VMICALL void         VMI_SetTR(VMI_SELECTOR ldtSel);
> +
> +   VMICALL void         VMI_GetGDT(VMI_DTR *gdtr);
> +   VMICALL void         VMI_GetIDT(VMI_DTR *idtr);
> +   VMICALL VMI_SELECTOR VMI_GetLDT(void);
> +   VMICALL VMI_SELECTOR VMI_GetTR(void);
> +
> +   VMICALL void         VMI_WriteGDTEntry(void *gdt,
> +                                          VMI_UINT entry,
> +                                          VMI_UINT32 descLo,
> +                                          VMI_UINT32 descHi);
> +   VMICALL void         VMI_WriteLDTEntry(void *gdt,
> +                                          VMI_UINT entry,
> +                                          VMI_UINT32 descLo,
> +                                          VMI_UINT32 descHi);
> +   VMICALL void         VMI_WriteIDTEntry(void *gdt,
> +                                          VMI_UINT entry,
> +                                          VMI_UINT32 descLo,
> +                                          VMI_UINT32 descHi);
> +
> +   /* CPU CONTROL CALLS */
> +   VMICALL void       VMI_WRMSR(VMI_UINT64 val, VMI_UINT32 reg);
> +   VMICALL void       VMI_WRMSR_SPLIT(VMI_UINT32 valLo, VMI_UINT32 valHi,
> +                                      VMI_UINT32 reg);
> +
> +   /* Not truly a proper C function; use dummy to align reg in ECX */
> +   VMICALL VMI_UINT64 VMI_RDMSR(VMI_UINT64 dummy, VMI_UINT32 reg);
> +
> +   VMICALL void VMI_SetCR0(VMI_UINT val);
> +   VMICALL void VMI_SetCR2(VMI_UINT val);
> +   VMICALL void VMI_SetCR3(VMI_UINT val);
> +   VMICALL void VMI_SetCR4(VMI_UINT val);
> +
> +   VMICALL VMI_UINT32 VMI_GetCR0(void);
> +   VMICALL VMI_UINT32 VMI_GetCR2(void);
> +   VMICALL VMI_UINT32 VMI_GetCR3(void);
> +   VMICALL VMI_UINT32 VMI_GetCR4(void);
> +
> +   VMICALL void       VMI_CLTS(void);
> +
> +   VMICALL void       VMI_SetDR(VMI_UINT32 num, VMI_UINT32 val);
> +   VMICALL VMI_UINT32 VMI_GetDR(VMI_UINT32 num);
> +
> +   /* PROCESSOR INFORMATION CALLS */
> +
> +   VMICALL VMI_UINT64 VMI_RDTSC(void);
> +   VMICALL VMI_UINT64 VMI_RDPMC(VMI_UINT64 dummy, VMI_UINT32 counter);
> +
> +   /* STACK / PRIVILEGE TRANSITION CALLS */
> +   VMICALL void VMI_UpdateKernelStack(void *tss, VMI_UINT32 esp0);
> +
> +   /* I/O CALLS */
> +   /* Native port in EDX - use dummy */
> +   VMICALL VMI_UINT8  VMI_INB(VMI_UINT dummy, VMI_UINT port);
> +   VMICALL VMI_UINT16 VMI_INW(VMI_UINT dummy, VMI_UINT port);
> +   VMICALL VMI_UINT32 VMI_INL(VMI_UINT dummy, VMI_UINT port);
> +
> +   VMICALL void VMI_OUTB(VMI_UINT value, VMI_UINT port);
> +   VMICALL void VMI_OUTW(VMI_UINT value, VMI_UINT port);
> +   VMICALL void VMI_OUTL(VMI_UINT value, VMI_UINT port);
> +
> +   VMICALL void VMI_IODelay(void);
> +   VMICALL void VMI_WBINVD(void);
> +   VMICALL void VMI_SetIOPLMask(VMI_UINT32 mask);
> +
> +   /* APIC CALLS */
> +   VMICALL void       VMI_APICWrite(void *reg, VMI_UINT32 value);
> +   VMICALL VMI_UINT32 VMI_APICRead(void *reg);
> +
> +   /* TIMER CALLS */
> +   VMICALL VMI_NANOSECS VMI_GetWallclockTime(void);
> +   VMICALL VMI_BOOL     VMI_WallclockUpdated(void);
> +
> +   /* Predefined rate of the wallclock. */
> +   #define VMI_WALLCLOCK_HZ       1000000000
> +
> +   VMICALL VMI_CYCLES VMI_GetCycleFrequency(void);
> +   VMICALL VMI_CYCLES VMI_GetCycleCounter(VMI_UINT32 whichCounter);
> +
> +   /* Defined cycle counters */
> +   #define VMI_CYCLES_REAL        0
> +   #define VMI_CYCLES_AVAILABLE   1
> +   #define VMI_CYCLES_STOLEN      2
> +
> +   VMICALL void     VMI_SetAlarm(VMI_UINT32 flags, VMI_CYCLES expiry,
> +                                 VMI_CYCLES period);
> +   VMICALL VMI_BOOL VMI_CancelAlarm(VMI_UINT32 flags);
> +
> +   /* The alarm interface 'flags' bits. [TBD: exact format of 'flags'] */
> +   #define VMI_ALARM_COUNTER_MASK 0x000000ff
> +
> +   #define VMI_ALARM_WIRED_IRQ0   0x00000000
> +   #define VMI_ALARM_WIRED_LVTT   0x00010000
> +
> +   #define VMI_ALARM_IS_ONESHOT   0x00000000
> +   #define VMI_ALARM_IS_PERIODIC  0x00000100
> +
> +   /* MMU CALLS */
> +   VMICALL void VMI_SetLinearMapping(int slot, VMI_UINT32 va,
> +                                     VMI_UINT32 pages, VMI_UINT32 ppn);
> +
> +   /* The number of VMI address translation slot */
> +   #define VMI_LINEAR_MAP_SLOTS    4
> +
> +   VMICALL void VMI_InvalPage(VMI_UINT32 va);
> +   VMICALL void VMI_FlushTLB(int how);
> +   
> +   /* Flags used by VMI_FlushTLB call */
> +   #define VMI_FLUSH_TLB            0x01
> +   #define VMI_FLUSH_GLOBAL         0x02
> +
> +   #endif
