[Xen-devel] Re: [PATCH] FPU LWP 6/8: create lazy and non-lazy FPU restor
Checking whether there is non-lazy state to save is architecture-specific
and very messy. For instance, we need to read LWP_CBADDR to confirm LWP's
dirty state. This MSR is AMD-specific and we don't want to add it here.
Plus, reading the LWP_CBADDR MSR might be as expensive as clts/stts.

My previous email showed that the overhead with LWP is around 1%-2% of
__context_switch(). For a non-LWP-capable CPU, this overhead should be
much smaller (only clts and stts) because xfeature_mask[LWP] is 0.

Yes, clts() and stts() don't have to be called every time. How about this one?
/* Restore FPU state whenever VCPU is scheduled in. */
void vcpu_restore_fpu_eager(struct vcpu *v)
{
    ASSERT(!is_idle_vcpu(v));

    /* Restore the non-lazy extended state, which is not tracked by CR0.TS. */
    if ( xsave_enabled(v) )
    {
        /* Avoid recursion */
        clts();
        fpu_xrstor(v, XSTATE_NONLAZY);
        stts();
    }
}
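To make the trade-off above concrete, here is a minimal userspace model of the proposal, not Xen code: the simulated `cr0`, the `cr0_writes` counter, and the `nonlazy_restored` flag are all hypothetical stand-ins (the flag replaces the real `fpu_xrstor()`), used only to show that guarding the whole clts/xrstor/stts sequence on `xsave_enabled()` avoids any CR0 write on CPUs without XSAVE:

```c
#include <stdbool.h>

/* Userspace model; all names and the simulated CR0 are hypothetical. */
#define X86_CR0_TS 0x8UL

static unsigned long cr0 = X86_CR0_TS;
static int cr0_writes;               /* counts "expensive" CR0 writes */

static void clts(void) { cr0 &= ~X86_CR0_TS; cr0_writes++; }
static void stts(void) { cr0 |=  X86_CR0_TS; cr0_writes++; }

struct vcpu {
    bool xsave_enabled;       /* models xsave_enabled(v) */
    bool nonlazy_restored;    /* set in place of fpu_xrstor() */
};

/* Eager path as in the revised snippet above: CR0.TS is only
 * toggled when xsave is enabled, so non-XSAVE CPUs pay nothing. */
static void vcpu_restore_fpu_eager(struct vcpu *v)
{
    if ( v->xsave_enabled )
    {
        clts();                      /* avoid #NM recursion */
        v->nonlazy_restored = true;  /* stands in for fpu_xrstor() */
        stts();
    }
}
```

With `xsave_enabled` false the function performs zero CR0 writes; when it is true, exactly two (clts + stts) occur and TS is left set afterwards, matching the behavior argued for above.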
On 05/04/2011 02:09 AM, Jan Beulich wrote:
On 03.05.11 at 22:17, Wei Huang <wei.huang2@xxxxxxx> wrote:
 
Again as pointed out earlier, ...
 
--- a/xen/arch/x86/domain.c     Tue May 03 13:49:27 2011 -0500
+++ b/xen/arch/x86/domain.c     Tue May 03 13:59:37 2011 -0500
@@ -1578,6 +1578,7 @@
         memcpy(stack_regs, &n->arch.user_regs, CTXT_SWITCH_STACK_BYTES);
         if ( xsave_enabled(n) && n->arch.xcr0 != get_xcr0() )
             set_xcr0(n->arch.xcr0);
+        vcpu_restore_fpu_eager(n);
 
... this call is unconditional, ...
 
         n->arch.ctxt_switch_to(n);
     }
--- a/xen/arch/x86/i387.c       Tue May 03 13:49:27 2011 -0500
+++ b/xen/arch/x86/i387.c       Tue May 03 13:59:37 2011 -0500
@@ -160,10 +160,25 @@
/*******************************/
/*       VCPU FPU Functions    */
/*******************************/
+/* Restore FPU state whenever VCPU is scheduled in. */
+void vcpu_restore_fpu_eager(struct vcpu *v)
+{
+    ASSERT(!is_idle_vcpu(v));
+
+    /* Avoid recursion */
+    clts();
+
+    /* save the nonlazy extended state which is not tracked by CR0.TS bit */
+    if ( xsave_enabled(v) )
+        fpu_xrstor(v, XSTATE_NONLAZY);
+
+    stts();
 
... while here you do an unconditional clts followed by an xrstor only
checking whether xsave is enabled (but not checking whether there's
any non-lazy state to be restored) and, possibly the most expensive
of all, an unconditional write of CR0.
Jan
 
+}
+
/*
  * Restore FPU state when #NM is triggered.
  */
-void vcpu_restore_fpu(struct vcpu *v)
+void vcpu_restore_fpu_lazy(struct vcpu *v)
{
     ASSERT(!is_idle_vcpu(v));
 
 
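Jan's objection (unconditional clts, xrstor, and CR0 write) could in principle be addressed by caching a per-vcpu "non-lazy state is dirty" flag, so the expensive sequence is skipped entirely when there is nothing to restore. The sketch below is purely illustrative: `nonlazy_dirty` is not an existing Xen field but an assumed stand-in for, e.g., a cached LWP dirty bit, avoiding the LWP_CBADDR MSR read Wei mentions:

```c
#include <stdbool.h>

/* Hypothetical refinement; names and fields are assumptions, not Xen code. */
struct vcpu {
    bool xsave_enabled;   /* models xsave_enabled(v) */
    bool nonlazy_dirty;   /* assumed cached dirty bit for non-lazy state */
};

static int cr0_writes;    /* counts simulated CR0 writes */

static void vcpu_restore_fpu_eager(struct vcpu *v)
{
    /* No dirty non-lazy state: skip clts/xrstor/stts, no CR0 write at all. */
    if ( !v->xsave_enabled || !v->nonlazy_dirty )
        return;

    cr0_writes++;                /* clts() */
    v->nonlazy_dirty = false;    /* fpu_xrstor(v, XSTATE_NONLAZY) */
    cr0_writes++;                /* stts() */
}
```

Whether the bookkeeping needed to maintain such a flag is cheaper than the two CR0 writes is exactly the trade-off debated in this thread; the sketch only shows the shape of the check, not a measured win.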
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel