
Re: [Xen-devel] [RFC 3/4] HVM x86 deprivileged mode: Code for switching into/out of deprivileged mode





On 06/08/15 21:55, Andrew Cooper wrote:
On 06/08/15 17:45, Ben Catterall wrote:
The process to switch into and out of deprivileged mode can be likened to
setjmp/longjmp.

To enter deprivileged mode, we take a copy of the stack from the guest's
registers up to the current stack pointer. This allows us to restore the stack
when we have finished the deprivileged mode operation, meaning we can continue
execution from that point. This is similar to what happens across a context switch.

To exit deprivileged mode, we copy the stack back, replacing the current stack.
We can then continue execution from where we left off, which will unwind the
stack and free up resources. This method means that we do not need to
change any other code paths and its invocation will be transparent to callers.
This should allow the feature to be more easily deployed to different parts
of Xen.
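
To make the setjmp/longjmp comparison concrete, a minimal userspace analogy
(illustrative only, not the Xen code itself) would be:

    #include <setjmp.h>

    static jmp_buf resume_point;

    static void run_deprivileged_op(void (*op)(void))
    {
        if ( setjmp(resume_point) == 0 )    /* "save the stack" */
        {
            op();                           /* the deprivileged operation */
            longjmp(resume_point, 1);       /* "copy the stack back" */
        }
        /* Execution resumes here and unwinds normally, as if op() had
         * simply returned. */
    }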

Note that this copy of the stack is per-vcpu, but it will contain per-pcpu data.
Extra work is needed to properly migrate vcpus between pcpus.

Under what circumstances do you see there being persistent state in the
depriv area between calls, given that the calls are synchronous from VM
actions?

I don't know if we can make these synchronous, as we need a way to interrupt the vcpu if it's spinning for a long time; otherwise an attacker could just spin in depriv mode and cause a DoS. With that in mind, the scheduler may decide to migrate the vcpu whilst it's in depriv mode, which would mean the per-pcpu data held in the stack copy is then migrated to another pcpu incorrectly.



The switch to and from deprivileged mode is performed using sysret and syscall
respectively.

I suspect we need to borrow the SS attribute workaround from Linux to
make this function reliably on AMD systems.

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=61f01dd941ba9e06d2bf05994450ecc3d61b6b8b

Ah! ok, I'll look into this. Thanks!
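
For reference, the shape of the Linux workaround, transplanted into Xen-style C,
is roughly the following sketch; the has_sysret_ss_attrs_bug() predicate is
hypothetical and stands in for the real AMD erratum detection:

    /* Sketch only: ensure SS holds a selector with valid cached descriptor
     * attributes before the next sysret, as the Linux commit above does. */
    static void depriv_fixup_ss_attrs(void)
    {
        unsigned int ss_sel;

        if ( !has_sysret_ss_attrs_bug() )    /* hypothetical predicate */
            return;

        asm volatile ( "mov %%ss, %0" : "=r" (ss_sel) );
        if ( ss_sel != __HYPERVISOR_DS )
            asm volatile ( "mov %0, %%ss" :: "r" (__HYPERVISOR_DS) );
    }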

The return paths in entry.S have been edited so that, when we receive an
interrupt whilst in deprivileged mode, we return into that mode correctly.

A hook on the syscall handler in entry.S has also been added which handles
returning from user mode and will support deprivileged mode system calls when
these are needed.

Signed-off-by: Ben Catterall <Ben.Catterall@xxxxxxxxxx>
---
  xen/arch/x86/domain.c               |  12 +++
  xen/arch/x86/hvm/Makefile           |   1 +
  xen/arch/x86/hvm/deprivileged.c     | 103 ++++++++++++++++++
  xen/arch/x86/hvm/deprivileged_asm.S | 205 ++++++++++++++++++++++++++++++++++++
  xen/arch/x86/hvm/vmx/vmx.c          |   7 ++
  xen/arch/x86/x86_64/asm-offsets.c   |   5 +
  xen/arch/x86/x86_64/entry.S         |  35 ++++++
  xen/include/asm-x86/hvm/vmx/vmx.h   |   2 +
  xen/include/xen/hvm/deprivileged.h  |  38 +++++++
  xen/include/xen/sched.h             |  18 +++-
  10 files changed, 425 insertions(+), 1 deletion(-)
  create mode 100644 xen/arch/x86/hvm/deprivileged_asm.S

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 045f6ff..a0e5e70 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -62,6 +62,7 @@
  #include <xen/iommu.h>
  #include <compat/vcpu.h>
  #include <asm/psr.h>
+#include <xen/hvm/deprivileged.h>

  DEFINE_PER_CPU(struct vcpu *, curr_vcpu);
  DEFINE_PER_CPU(unsigned long, cr4);
@@ -446,6 +447,12 @@ int vcpu_initialise(struct vcpu *v)
      if ( has_hvm_container_domain(d) )
      {
          rc = hvm_vcpu_initialise(v);
+
+        /* Initialise HVM deprivileged mode */
+        printk("HVM initialising deprivileged mode ...");

All printk()s should have a XENLOG_$severity prefix.

will do.
+        hvm_deprivileged_prepare_vcpu(v);
+        printk("Done.\n");
+
          goto done;
      }

@@ -523,7 +530,12 @@ void vcpu_destroy(struct vcpu *v)
      vcpu_destroy_fpu(v);

      if ( has_hvm_container_vcpu(v) )
+    {
+        /* Destroy the deprivileged mode on this vcpu */
+        hvm_deprivileged_destroy_vcpu(v);
+
          hvm_vcpu_destroy(v);
+    }
      else
          xfree(v->arch.pv_vcpu.trap_ctxt);
  }
diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
index bd83ba3..6819886 100644
--- a/xen/arch/x86/hvm/Makefile
+++ b/xen/arch/x86/hvm/Makefile
@@ -17,6 +17,7 @@ obj-y += quirks.o
  obj-y += rtc.o
  obj-y += save.o
  obj-y += deprivileged.o
+obj-y += deprivileged_asm.o
  obj-y += stdvga.o
  obj-y += vioapic.o
  obj-y += viridian.o
diff --git a/xen/arch/x86/hvm/deprivileged.c b/xen/arch/x86/hvm/deprivileged.c
index 071d900..979fc69 100644
--- a/xen/arch/x86/hvm/deprivileged.c
+++ b/xen/arch/x86/hvm/deprivileged.c
@@ -439,3 +439,106 @@ int hvm_deprivileged_copy_l1(struct domain *d,
      }
      return 0;
  }
+
+/* Used to prepare each vcpu's data for user mode. Call for each HVM vcpu.
+ */
+int hvm_deprivileged_prepare_vcpu(struct vcpu *vcpu)
+{
+    struct page_info *pg;
+
+    /* TODO: clarify if this MEMF is correct */
+    /* Allocate 2^STACK_ORDER contiguous pages */
+    pg = alloc_domheap_pages(NULL, STACK_ORDER, MEMF_no_owner);
+    if( pg == NULL )
+    {
+        panic("HVM: Out of memory on per-vcpu deprivileged mode init.\n");
+        return -ENOMEM;
+    }
+
+    vcpu->stack = page_to_virt(pg);

Xen has two heaps, the xenheap and the domheap.

You may only construct pointers like this into the xenheap.  The domheap
is not guaranteed to have safe virtual mappings to.  (This code only
works because your test box isn't bigger than 5TB.  Also there is a bug
with xenheap allocations at the same point, but I need to fix that bug).

All access to domheap pages must strictly be within a
map_domain_page()/unmap() region, which constructs safe temporary mappings.

ok, I'll add these.
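
Presumably something along these lines (a sketch of the single-page discipline
only; a 2^STACK_ORDER stack would need a mapping per page or a vmap()-style
contiguous mapping, and the stack_pg field is a hypothetical replacement for
the current ->stack pointer):

    static int depriv_init_stack_page(struct vcpu *vcpu)
    {
        struct page_info *pg = alloc_domheap_page(NULL, MEMF_no_owner);
        void *va;

        if ( pg == NULL )
            return -ENOMEM;

        va = __map_domain_page(pg);   /* temporary mapping, this cpu only */
        clear_page(va);               /* ... access the page ... */
        unmap_domain_page(va);        /* drop the mapping when done */

        vcpu->stack_pg = pg;          /* hypothetical field */
        return 0;
    }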
+    vcpu->rsp = 0;
+    vcpu->user_mode = 0;
+
+    return 0;
+}
+
+/* Called on destroying each vcpu */
+void hvm_deprivileged_destroy_vcpu(struct vcpu *vcpu)
+{
+    free_domheap_pages(virt_to_page(vcpu->stack), STACK_ORDER);
+}
+
+/* Called to perform a user mode operation.
+ * Execution context is saved and then we move into user mode.
+ * This method is then jumped into to restore execution context after
+ * exiting user mode.
+ */
+void hvm_deprivileged_user_mode(void)
+{
+    struct vcpu *vcpu = get_current();
+    unsigned long int efer = read_efer();
+    register unsigned long sp asm("rsp");
+
+    ASSERT( vcpu->user_mode == 0 );
+    ASSERT( vcpu->stack != 0 );
+    ASSERT( vcpu->rsp == 0 );
+
+    /* Flip the SCE bit to allow sysret/call */
+    write_efer(efer | EFER_SCE);
+
+    /* Save the msr lstar and star. Xen does lazy loading of these
+     * so we need to put the host values in and then restore the
+     * guest ones once we're done.
+     */
+    rdmsrl(MSR_LSTAR, vcpu->msr_lstar);
+    rdmsrl(MSR_STAR, vcpu->msr_star);
+    wrmsrl(MSR_LSTAR,get_host_msr_state()->msrs[VMX_INDEX_MSR_LSTAR]);
+    wrmsrl(MSR_STAR, get_host_msr_state()->msrs[VMX_INDEX_MSR_STAR]);

A partial context switch like this should be implemented as two new
hvm_ops such as hvm_op.depriv_ctxt_switch_{to,from}()

This allows you to keep the common code clean of vendor specific code.
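
i.e. something like the following sketch (hook names are only suggestions),
with vmx.c supplying the MSR juggling that currently sits in common code:

    /* vmx.c -- sketch of the vendor hooks, reusing the patch's own code: */
    static void vmx_depriv_ctxt_switch_to(struct vcpu *v)
    {
        /* Save the guest's lazily-loaded MSRs and install the host values. */
        rdmsrl(MSR_LSTAR, v->msr_lstar);
        rdmsrl(MSR_STAR, v->msr_star);
        wrmsrl(MSR_LSTAR, get_host_msr_state()->msrs[VMX_INDEX_MSR_LSTAR]);
        wrmsrl(MSR_STAR, get_host_msr_state()->msrs[VMX_INDEX_MSR_STAR]);
    }

    static void vmx_depriv_ctxt_switch_from(struct vcpu *v)
    {
        /* Put the guest values back. */
        wrmsrl(MSR_LSTAR, v->msr_lstar);
        wrmsrl(MSR_STAR, v->msr_star);
    }

    /* Common code then stays vendor-neutral:
     *     hvm_funcs.depriv_ctxt_switch_to(v);
     *     hvm_deprivileged_user_mode_asm();
     *     hvm_funcs.depriv_ctxt_switch_from(v);
     */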

+
+    /* The assembly routine to handle moving into/out of deprivileged mode */
+    hvm_deprivileged_user_mode_asm();
+
+    /* If our copy failed */
+    if( unlikely(vcpu->rsp == 0) )
+    {
+        gdprintk(XENLOG_ERR, "HVM: Stack too large in %s\n", __FUNCTION__);

__func__ please.  It conforms to C99 whereas __FUNCTION__ is a gnuism.

got it.
+        domain_crash_synchronous();
+    }
+
+    /* Debug info */
+    vcpu->old_msr_lstar = get_host_msr_state()->msrs[VMX_INDEX_MSR_LSTAR];
+    vcpu->old_msr_star = get_host_msr_state()->msrs[VMX_INDEX_MSR_STAR];
+    vcpu->old_rsp = sp;
+    vcpu->old_processor = smp_processor_id();
+
+    /* Restore the efer and saved msr registers */
+    write_efer(efer);
+    wrmsrl(MSR_LSTAR, vcpu->msr_lstar);
+    wrmsrl(MSR_STAR, vcpu->msr_star);
+    vcpu->user_mode = 0;
+    vcpu->rsp = 0;
+}
+
+/* Called when the user mode operation has completed
+ * Perform C-level processing on the return path.
+ */
+void hvm_deprivileged_finish_user_mode(void)
+{
+    /* If we are not returning from user mode: bail */
+    ASSERT(get_current()->user_mode == 1);
+
+    hvm_deprivileged_finish_user_mode_asm();
+}
+
+void hvm_deprivileged_check_trap(const char* func_name)
+{
+    if( current->user_mode == 1 )
+    {
+        printk("HVM Deprivileged Mode: Trap whilst in user mode, %s\n",
+               func_name);
+        domain_crash_synchronous();
+    }
+}
+
+
+
diff --git a/xen/arch/x86/hvm/deprivileged_asm.S b/xen/arch/x86/hvm/deprivileged_asm.S
new file mode 100644
index 0000000..00a9e9c
--- /dev/null
+++ b/xen/arch/x86/hvm/deprivileged_asm.S
@@ -0,0 +1,205 @@
+/*
+ * HVM security enhancements assembly code
+ */
+#include <xen/config.h>
+#include <xen/errno.h>
+#include <xen/softirq.h>
+#include <asm/asm_defns.h>
+#include <asm/apicdef.h>
+#include <asm/page.h>
+#include <public/xen.h>
+#include <irq_vectors.h>
+#include <xen/hvm/deprivileged.h>
+
+/* Handles entry into the deprivileged mode and returning from this
+ * mode. This requires copying the current Xen privileged stack across
+ * to a per-vcpu buffer as we need to be able to handle interrupts and
+ * exceptions whilst in this mode. Xen is non-preemptible so our
+ * privileged mode stack would be clobbered if we did not save it.
+ *
+ * If we are entering deprivileged mode, then we use a sysret to get there.
+ * If we are returning from deprivileged mode, then we need to unwind the stack
+ * so we copy it back over the current stack so that we can return from the
+ * call path where we came in from.
+ *
+ * We're doing a sort of setjmp/longjmp, copying to a stack to
+ * preserve it and allow returning code to continue executing from
+ * within this method.
+ */
+ENTRY(hvm_deprivileged_user_mode_asm)
+        /* Save our registers */
+        push   %rax
+        push   %rbx
+        push   %rcx
+        push   %rdx
+        push   %rsi
+        push   %rdi
+        push   %rbp
+        push   %r8
+        push   %r9
+        push   %r10
+        push   %r11
+        push   %r12
+        push   %r13
+        push   %r14
+        push   %r15
+        pushfq
+
+        /* Perform a near call to push rip onto the stack */
+        call   1f
+
+        /* Magic: Add to the stored rip the size of the code between
+         * label 1 and label 2. This allows us to restart execution at label 2.
+         */
+1:      addq   $2f-1b, (%rsp)
+
+        GET_CURRENT(%r8)
+        xor    %rsi, %rsi
+
+        /* The following is equivalent to
+         * (get_cpu_info() + sizeof(struct cpu_info))
+         * This gets us to the top of the stack.
+         */
+        GET_STACK_BASE(%rcx)
+        addq   $STACK_SIZE, %rcx
+
+        movq   VCPU_stack(%r8), %rdi
+
+        /* We need to copy the current stack across to our buffer.
+         * Calculate the number of bytes to copy:
+         * (top of stack - current stack pointer)
+         * NOTE: We must not push any more data onto our stack after this point
+         * as it won't be saved.
+         */
+        sub    %rsp, %rcx
+
+        /* If the stack is too big, we don't do the copy: handled by caller. */
+        cmpq   $STACK_SIZE, %rcx
+        ja     3f
+
+        mov    %rsp, %rsi
+/* USER MODE ENTRY POINT */
+2:
+        /* More magic: If we came here from preparing to go into user mode,

There is a very fine line between magic and gross hack ;)

I haven't quite decided which this is yet, but it certainly is neat, if
rather opaque.

+         * then we copy our current stack to the buffer (the lines above
+         * have setup rsi, rdi and rcx to do this).
+         *
+         * If we came here from user mode, then we movsb to copy from
+         * our buffer into our current stack so that we can continue
+         * execution from the current code point, and return back to the guest
+         * via the path we came in. rsi, rdi and rcx have been setup by the
+         * de-privileged return path for this.
+         */
+        rep    movsb
+        mov    %rsp, %rsi
+
+        GET_CURRENT(%r8)
+        movq   VCPU_user_mode(%r8), %rdx
+        movq   VCPU_rsp(%r8), %rax
+
+        /* If !user_mode  */
+        cmpq   $0, %rdx
+        jne    3f
+        cli
+
+        movabs $HVM_DEPRIVILEGED_TEXT_ADDR, %rcx /* RIP in user mode */
+
+        movq   $0x10200, %r11          /* RFLAGS user mode enable interrupts */

Please use $(X86_FLAGS_IF | X86_FLAGS_MBS) to be more clear which flags
are being set.

will do.
Also, by enabling interrupts, you need some hook to short-circuit the
scheduling softirq.  As it currently stands, a timer interrupt
interrupting depriv mode is liable to swap all your state out from under
you.

We need interrupts to be enabled so that we can prevent a DoS from depriv by allowing the scheduler to decide to deschedule it. That's also why we needed some of the return path changes.
+        movq   $1, VCPU_user_mode(%r8) /* Now in user mode */
+        movq   %rsi, VCPU_rsp(%r8)     /* The rsp to restore to */
+
+        /* Stack ptr is set by user mode to avoid race conditions.

What race condition are you referring to?

+         * See Intel manual 2 on the sysret instruction.

As a general rule, read both the Intel and the AMD manual for bits like
this.  sysret is one of the areas where implementations differ.

+         */
+        movq   $HVM_STACK_PTR, %rbx
+        sysretq                         /* Enter deprivileged mode */
+
+3:      GET_CURRENT(%r8)
+        movq   %rsi, VCPU_rsp(%r8)
+        pop    %rax    /* Pop off rip: used in a jump so still on stack */
+
+        /* Restore registers */
+        popfq
+        pop    %r15
+        pop    %r14
+        pop    %r13
+        pop    %r12
+        pop    %r11
+        pop    %r10
+        pop    %r9
+        pop    %r8
+        pop    %rbp
+        pop    %rdi
+        pop    %rsi
+        pop    %rdx
+        pop    %rcx
+        pop    %rbx
+        pop    %rax
+        ret
+
+/* Finished in user mode so return */
+ENTRY(hvm_deprivileged_finish_user_mode_asm)
+        /* The source is the copied stack in our buffer.
+         * The destination is our current stack.
+         *
+         * We need to:
+         * - Move the stack pointer to where it was before we entered
+         *   deprivileged mode.
+         * - Setup rsi, rdi and rcx as needed to perform the copy
+         * - Jump to the address held at the top of the stack which
+         *   is the user mode return address
+         */
+        cli
+        GET_CURRENT(%rbx)
+        movq   VCPU_stack(%rbx), %rsi
+        movq   VCPU_rsp(%rbx), %rdi
+
+        /* The return address that the near call pushed onto the
+         * buffer is pointed to by stack, so use that for rip.
+         */
+        movq   %rdi, %rsp
+
+        /* The following is equivalent to
+         * (get_cpu_info() + sizeof(struct cpu_info) - vcpu->rsp)
+         * This works out how many bytes we need to copy:
+         * (top of stack - bottom of stack)
+         */
+        GET_STACK_BASE(%rcx)
+        addq   $STACK_SIZE, %rcx
+        subq   %rdi, %rcx
+
+        /* Go to user mode return code */
+        jmp    *(%rsi)
+
+/* Entry point from the assembly syscall handlers */
+ENTRY(hvm_deprivileged_handle_user_mode)
+
+        /* Handle a user mode hypercall here */
+
+
+        /* We are finished in user mode */
+        call hvm_deprivileged_finish_user_mode
+
+        ret
+
+.section .hvm_deprivileged_enhancement.text,"ax"
+/* HVM deprivileged code */
+ENTRY(hvm_deprivileged_ring3)
+        /* sysret has loaded eip from rcx and rflags from r11.
+         * CS and SS have been loaded from the MSR for ring 3.
+         * We now need to switch to the user mode stack.
+         */
+        /* Setup usermode stack */
+        movabs $HVM_STACK_PTR, %rsp
+
+        /* Perform user mode processing */
+
+        mov    $0xf, %rcx
+1:      dec    %rcx
+        cmp    $0, %rcx
+        jne    1b
+
+        /* Return to ring 0 */
+        syscall
+
+.previous
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index c32d863..595b0f2 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -59,6 +59,8 @@
  #include <asm/event.h>
  #include <asm/monitor.h>
  #include <public/arch-x86/cpuid.h>
+#include <xen/hvm/deprivileged.h>
+

  static bool_t __initdata opt_force_ept;
  boolean_param("force-ept", opt_force_ept);
@@ -194,6 +196,10 @@ void vmx_save_host_msrs(void)
          set_bit(VMX_INDEX_MSR_ ## address, &host_msr_state->flags);     \
      } while ( 0 )

+struct vmx_msr_state *get_host_msr_state(void) {
+    return &this_cpu(host_msr_state);
+}
+
  static enum handler_return
  long_mode_do_msr_read(unsigned int msr, uint64_t *msr_content)
  {
@@ -272,6 +278,7 @@ long_mode_do_msr_write(unsigned int msr, uint64_t msr_content)
      case MSR_LSTAR:
          if ( !is_canonical_address(msr_content) )
              goto uncanonical_address;
+

Please avoid spurious changes like this.

apologies.
          WRITE_MSR(LSTAR);
          break;

diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c
index 447c650..fd5de44 100644
--- a/xen/arch/x86/x86_64/asm-offsets.c
+++ b/xen/arch/x86/x86_64/asm-offsets.c
@@ -115,6 +115,11 @@ void __dummy__(void)
      OFFSET(VCPU_nsvm_hap_enabled, struct vcpu, arch.hvm_vcpu.nvcpu.u.nsvm.ns_hap_enabled);
      BLANK();

+    OFFSET(VCPU_stack, struct vcpu, stack);
+    OFFSET(VCPU_rsp, struct vcpu, rsp);
+    OFFSET(VCPU_user_mode, struct vcpu, user_mode);
+    BLANK();
+
      OFFSET(DOMAIN_is_32bit_pv, struct domain, arch.is_32bit_pv);
      BLANK();

diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 74677a2..fa9155c 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -102,6 +102,15 @@ restore_all_xen:
          RESTORE_ALL adj=8
          iretq

+/* Returning from user mode */
+handle_hvm_user_mode:
+
+        call hvm_deprivileged_handle_user_mode
+
+        /* Go back into user mode */
+        cli
+        jmp  restore_all_guest
+
  /*
   * When entering SYSCALL from kernel mode:
   *  %rax                            = hypercall vector
@@ -131,6 +140,14 @@ ENTRY(lstar_enter)
          testb $TF_kernel_mode,VCPU_thread_flags(%rbx)
          jz    switch_to_kernel

+        /* Were we in Xen's ring 3?  */
+        push %rbx
+        GET_CURRENT(%rbx)
+        movq VCPU_user_mode(%rbx), %rbx
+        cmp  $1, %rbx
+        je   handle_hvm_user_mode
+        pop  %rbx

No need for the movq or rbx clobber.  This entire block can be:

cmpb $1, VCPU_user_mode(%rbx)
je handle_hvm_user_mode

Similar to the $TF_kernel_mode check in context above.

got it. Thanks!


+
  /*hypercall:*/
          movq  %r10,%rcx
          cmpq  $NR_hypercalls,%rax
@@ -487,6 +504,13 @@ ENTRY(common_interrupt)
  /* No special register assumptions. */
  ENTRY(ret_from_intr)
          GET_CURRENT(%rbx)
+
+        /* If we are in Xen's user mode, return into it */
+        cmpq $1,VCPU_user_mode(%rbx)
+        cli
+        je    restore_all_guest
+        sti
+

None of this should be necessary - the exception frame created by
lstar_enter should cause ret_from_intr to do the correct thing.


I think this is needed as we have interrupts enabled, so we can take interrupts from paths other than lstar_enter. This ensures that Xen doesn't treat our depriv mode as a PV guest, which otherwise led to random page faults, general protection faults, etc.

          testb $3,UREGS_cs(%rsp)
          jz    restore_all_xen
          movq  VCPU_domain(%rbx),%rax
@@ -509,6 +533,14 @@ handle_exception_saved:
          GET_CURRENT(%rbx)
          PERFC_INCR(exceptions, %rax, %rbx)
          callq *(%rdx,%rax,8)
+
+        /* If we are in Xen's user mode, return into it */
+        /* TODO: Test this path */
+        cmpq  $1,VCPU_user_mode(%rbx)
+        cli
+        je    restore_all_guest
+        sti
+
          testb $3,UREGS_cs(%rsp)
          jz    restore_all_xen
          leaq  VCPU_trap_bounce(%rbx),%rdx
@@ -664,6 +696,9 @@ handle_ist_exception:
          movl  $EVENT_CHECK_VECTOR,%edi
          call  send_IPI_self
  1:      movq  VCPU_domain(%rbx),%rax
+        /* This also handles Xen ring3 return for us.
+         * So, there is no need to explicitly do a user mode check.
+         */
          cmpb  $0,DOMAIN_is_32bit_pv(%rax)
          je    restore_all_guest
          jmp   compat_restore_all_guest
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 3fbfa44..98e269e 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -565,4 +565,6 @@ typedef struct {
      u16 eptp_index;
  } ve_info_t;

+struct vmx_msr_state *get_host_msr_state(void);
+
  #endif /* __ASM_X86_HVM_VMX_VMX_H__ */
diff --git a/xen/include/xen/hvm/deprivileged.h b/xen/include/xen/hvm/deprivileged.h
index 6cc803e..e42f39a 100644
--- a/xen/include/xen/hvm/deprivileged.h
+++ b/xen/include/xen/hvm/deprivileged.h
@@ -68,6 +68,37 @@ int hvm_deprivileged_copy_l1(struct domain *d,
                               unsigned int l1_flags);


+/* Used to prepare each vcpu's data for user mode. Call for each HVM vcpu. */
+int hvm_deprivileged_prepare_vcpu(struct vcpu *vcpu);
+
+/* Destroy each vcpu's data for Xen user mode. Again, call for each vcpu. */
+void hvm_deprivileged_destroy_vcpu(struct vcpu *vcpu);
+
+/* Called to perform a user mode operation. */
+void hvm_deprivileged_user_mode(void);
+
+/* Called when the user mode operation has completed */
+void hvm_deprivileged_finish_user_mode(void);
+
+/* Called to move into and then out of user mode. Needed for accessing
+ * assembly features.
+ */
+void hvm_deprivileged_user_mode_asm(void);
+
+/* Called on the return path to return to the correct execution point */
+void hvm_deprivileged_finish_user_mode_asm(void);
+
+/* Handle any syscalls that the user mode makes */
+void hvm_deprivileged_handle_user_mode(void);
+
+/* The ring 3 code */
+void hvm_deprivileged_ring3(void);
+
+/* Call when inside a trap that should cause a domain crash if in user mode
+ * e.g. an invalid_op is trapped whilst in user mode.
+ */
+void hvm_deprivileged_check_trap(const char* func_name);
+
  /* The segments where the user mode .text and .data are stored */
  extern unsigned long int __hvm_deprivileged_text_start;
  extern unsigned long int __hvm_deprivileged_text_end;
@@ -91,4 +122,11 @@ extern unsigned long int __hvm_deprivileged_data_end;

  #define HVM_ERR_PG_ALLOC -1

+/* The user mode stack pointer.
+ * The stack grows down so set this to the top of the stack region. Then,
+ * as this is 0-indexed, move into the stack, not just after it.
+ * Subtract 16 bytes for correct stack alignment.
+ */
+#define HVM_STACK_PTR (HVM_DEPRIVILEGED_STACK_ADDR + STACK_SIZE - 16)
+
  #endif
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 73d3bc8..180643e 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -137,7 +137,7 @@ void evtchn_destroy_final(struct domain *d); /* from complete_domain_destroy */

  struct waitqueue_vcpu;

-struct vcpu
+struct vcpu

Trailing whitespace is nasty, but we avoid inflating the patch by
dropping whitespace on lines not touched by semantic changes.

  {
      int              vcpu_id;

@@ -158,6 +158,22 @@ struct vcpu

      void            *sched_priv;    /* scheduler-specific data */

+    /* HVM deprivileged mode state */
+    void *stack;             /* Location of stack to save data onto */
+    unsigned long rsp;       /* rsp of our stack to restore our data to */
+    unsigned long user_mode; /* Are we (possibly moving into) in user mode? */
+
+    /* The lstar/star MSRs of the processor that we are currently executing
+     * on. We need to save these because Xen does lazy saving of them.
+     */
+    unsigned long int msr_lstar; /* lstar */
+    unsigned long int msr_star;

There should be no need to store this like this.  Follow what the
current context switching code does.

ok, I'll take a look.
~Andrew

+
+    /* Debug info */
+    unsigned long int old_rsp;
+    unsigned long int old_processor;
+    unsigned long int old_msr_lstar;
+    unsigned long int old_msr_star;
      struct vcpu_runstate_info runstate;
  #ifndef CONFIG_COMPAT
  # define runstate_guest(v) ((v)->runstate_guest)


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

