xen-devel

[Xen-devel] [PATCH] [RFC] Fix a small window on CPU online/offline

To: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Subject: [Xen-devel] [PATCH] [RFC] Fix a small window on CPU online/offline
From: "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>
Date: Thu, 1 Apr 2010 17:22:59 +0800
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxxxx>
This is an RFC patch for a small window in CPU online/offline. It is not a clean 
solution; I am sending it out before I have finished checking all the related 
code so that I can get feedback from the community.
Currently there is a small window in CPU online/offline. During take_cpu_down(), 
in stop_machine_run() context, the CPU is marked offline and IRQs are disabled. 
But it is only at play_dead(), in idle_task_exit() called from cpu_exit_clear(), 
that the offlining CPU tries to sync the lazy exec state. The window is that by 
the time play_dead() runs, stop_machine_run() has already finished, and the vcpu 
whose context is out of sync may be scheduled on another CPU.

This may cause several issues:
a) When the vcpu is scheduled on another CPU, that CPU will try to sync the 
context on the original CPU through flush_tlb_mask. Because the original CPU is 
marked offline and has IRQs disabled, it will hang in flush_area_mask. I sent 
patch 21079:8ab60a883fd5 to avoid the hang. The relevant code in 
context_switch() is:
    if ( unlikely(!cpu_isset(cpu, dirty_mask) && !cpus_empty(dirty_mask)) )
    {
        /* Other cpus call __sync_lazy_execstate from flush ipi handler. */
        flush_tlb_mask(&dirty_mask);
    }

b) However, changeset 21079 is still not the right solution, although the patch 
itself is OK. With this changeset the system will not hang, but the vCPU's 
context is not synced.
c) Moreover, when the offlining CPU executes idle_task_exit(), it may try to 
re-sync the vcpu context with the guest, which will clobber the running vCPU.
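
As a rough illustration of issues a) to c), the sequence can be replayed in a toy model. This is only a sketch under stated assumptions: every name below (toy_vcpu, toy_cpu, toy_sync_lazy_state, toy_remote_sync, toy_replay_window) is hypothetical and is not Xen code, and a single integer stands in for the whole register context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these types or functions exist in Xen. */
struct toy_vcpu {
    int  saved_regs;             /* context saved in the vcpu structure */
    bool is_idle;
};

struct toy_cpu {
    int  hw_regs;                /* state currently loaded on this CPU */
    struct toy_vcpu *curr_vcpu;  /* owner of the lazily held state */
    bool online;
};

struct toy_result {
    bool remote_synced;          /* did the remote sync request succeed? */
    int  loaded_on_other;        /* context the second CPU started from */
    int  final_saved;            /* guest context at the end of the replay */
};

/* Write the lazily held state back to its owner. */
static void toy_sync_lazy_state(struct toy_cpu *c)
{
    if (c->curr_vcpu != NULL && !c->curr_vcpu->is_idle)
        c->curr_vcpu->saved_regs = c->hw_regs;
}

/* A remote sync request; an offline CPU never services it. */
static bool toy_remote_sync(struct toy_cpu *c)
{
    if (!c->online)
        return false;
    toy_sync_lazy_state(c);
    return true;
}

struct toy_result toy_replay_window(void)
{
    struct toy_vcpu guest = { .saved_regs = 1, .is_idle = false };
    struct toy_cpu dying = { .hw_regs = 2, .curr_vcpu = &guest, .online = true };
    struct toy_cpu other = { .hw_regs = 0, .curr_vcpu = NULL, .online = true };
    struct toy_result r;

    /* The CPU is marked offline while it still holds the newest state (2). */
    dying.online = false;

    /* Issues a) and b): the guest moves, but the remote sync cannot happen. */
    r.remote_synced = toy_remote_sync(&dying);
    other.curr_vcpu = &guest;
    other.hw_regs = guest.saved_regs;      /* loads the stale value */
    r.loaded_on_other = other.hw_regs;

    /* The guest makes progress on the other CPU and saves it. */
    other.hw_regs = 3;
    toy_sync_lazy_state(&other);
    assert(guest.saved_regs == 3);

    /* Issue c): the late sync on the dying CPU overwrites that progress. */
    toy_sync_lazy_state(&dying);
    r.final_saved = guest.saved_regs;
    return r;
}
```

In this model, calling toy_replay_window() reports that the remote sync did not happen, that the second CPU started from the stale value 1, and that the late sync left the guest with the old value 2 rather than 3.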

The following code tries to sync the vcpu context in stop_machine_run() context, 
so that the vCPU gets its context synced. However, it still does not resolve 
issue c. I'm also considering marking the per-CPU curr_vcpu as idle, so that 
idle_task_exit() will not try to sync the context again, but I suspect that is 
not the right way.
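
To make the two ideas concrete, here is a minimal toy sketch of syncing the context at disable time, with and without also marking the current vcpu as idle. All names (toy_vcpu, toy_cpu, toy_cpu_disable, toy_replay_with_fix) are hypothetical and are not Xen code, and a single integer stands in for the register context. It shows only the intended effect and says nothing about whether marking curr_vcpu idle is safe in the real hypervisor.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these types or functions exist in Xen. */
struct toy_vcpu {
    int  saved_regs;             /* context saved in the vcpu structure */
    bool is_idle;
};

struct toy_cpu {
    int  hw_regs;                /* state currently loaded on this CPU */
    struct toy_vcpu *curr_vcpu;  /* owner of the lazily held state */
    bool online;
};

struct toy_fix_result {
    int loaded_on_other;         /* context the second CPU started from */
    int final_saved;             /* guest context at the end of the replay */
};

/* Write the lazily held state back to its owner; idle vcpus are skipped. */
static void toy_sync_lazy_state(struct toy_cpu *c)
{
    if (c->curr_vcpu != NULL && !c->curr_vcpu->is_idle)
        c->curr_vcpu->saved_regs = c->hw_regs;
}

/*
 * Disable step of the toy model: sync before going offline and,
 * optionally, hand the CPU over to the idle vcpu.
 */
static void toy_cpu_disable(struct toy_cpu *c, struct toy_vcpu *idle,
                            bool mark_idle)
{
    toy_sync_lazy_state(c);
    if (mark_idle)
        c->curr_vcpu = idle;
    c->online = false;
}

struct toy_fix_result toy_replay_with_fix(bool mark_idle)
{
    struct toy_vcpu guest = { .saved_regs = 1, .is_idle = false };
    struct toy_vcpu idle  = { .saved_regs = 0, .is_idle = true };
    struct toy_cpu dying = { .hw_regs = 2, .curr_vcpu = &guest, .online = true };
    struct toy_cpu other = { .hw_regs = 0, .curr_vcpu = NULL, .online = true };
    struct toy_fix_result r;

    /* The newest state (2) is written back before the CPU goes offline. */
    toy_cpu_disable(&dying, &idle, mark_idle);

    /* The guest moves to another CPU and now starts from current state. */
    other.curr_vcpu = &guest;
    other.hw_regs = guest.saved_regs;
    r.loaded_on_other = other.hw_regs;

    /* The guest makes progress on the other CPU and saves it. */
    other.hw_regs = 3;
    toy_sync_lazy_state(&other);

    /* Late sync on the dying CPU: harmless only if curr_vcpu is idle. */
    toy_sync_lazy_state(&dying);
    r.final_saved = guest.saved_regs;
    return r;
}
```

In this model, toy_replay_with_fix(false) lets the guest start from the up-to-date value 2 but still ends with the late sync overwriting its progress, which corresponds to issue c remaining open. toy_replay_with_fix(true) ends with the guest's progress (3) intact, because the late sync is skipped.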

Any suggestions?

BTW, the flush_local is there to make sure we flush all TLB context, so that 
when the CPU comes online again there is no garbage on the CPU, especially if 
the CPU has no deep C state.
--jyh

diff -r ebd84be3420a xen/arch/x86/smpboot.c
--- a/xen/arch/x86/smpboot.c    Tue Mar 30 18:31:39 2010 +0100
+++ b/xen/arch/x86/smpboot.c    Thu Apr 01 16:47:57 2010 +0800
@@ -34,6 +34,7 @@
 *      Rusty Russell   :   Hacked into shape for new "hotplug" boot process. */

 #include <xen/config.h>
+#include <asm/i387.h>
 #include <xen/init.h>
 #include <xen/kernel.h>
 #include <xen/mm.h>
@@ -1308,6 +1309,22 @@ int __cpu_disable(void)

    cpu_disable_scheduler();

+
+    if ( !is_idle_vcpu(this_cpu(curr_vcpu)) )
+    {
+        struct cpu_user_regs *stack_regs = guest_cpu_user_regs();
+        struct vcpu *v;
+
+        v = this_cpu(curr_vcpu);
+        memcpy(&v->arch.guest_context.user_regs,
+          stack_regs,
+          CTXT_SWITCH_STACK_BYTES);
+        unlazy_fpu(v);
+        current->arch.ctxt_switch_from(v);
+    }
+
+    flush_local(FLUSH_CACHE | FLUSH_TLB_GLOBAL |FLUSH_TLB);
+
    return 0;
 }
