RE: [Xen-devel] Re: [PATCH 1/4] CPU online/offline support in Xen

To: "Keir Fraser" <keir.fraser@xxxxxxxxxxxxx>, "Haitao Shan" <maillists.shan@xxxxxxxxx>, "Tian, Kevin" <kevin.tian@xxxxxxxxx>
Subject: RE: [Xen-devel] Re: [PATCH 1/4] CPU online/offline support in Xen
From: "Shan, Haitao" <haitao.shan@xxxxxxxxx>
Date: Fri, 12 Sep 2008 00:00:55 +0800
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date: Thu, 11 Sep 2008 09:03:03 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <C4EEE682.2707B%keir.fraser@xxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <823A93EED437D048963A3697DB0E35DE01C1EB0A@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> <C4EEE682.2707B%keir.fraser@xxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AckTRQdBPwPE1uAaTHujwHsOZlG02QAgVtYgAA47ZqAAAGuaQAAF9BQsAAMUy1A=
Thread-topic: [Xen-devel] Re: [PATCH 1/4] CPU online/offline support in Xen
Hi, Keir,

Concerning the last vcpu still running on the dying cpu, I have some thoughts.
Yes, there is a short window after stop_machine_run() during which that vcpu
still has v->processor == dying_cpu. But we do set the _VPF_migrating flag for
that vcpu and raise a SCHEDULE_SOFTIRQ on the dying cpu.
That softirq should run immediately after the stop_machine context, am I
right? If so, by the time the schedule softirq is serviced, this last vcpu has
already been migrated away from the dying cpu; only the saving of its context
is deferred until play_dead->sync_lazy_context.
If another cpu issues a schedule request to the dying cpu
(vcpu_sleep_nosync->cpu_raise_softirq(vc->processor, ...)) during this window,
the request is handled by the same code sequence, so it should be safe in such
cases.
Am I missing something important? I am not entirely confident in these
statements, though.
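
To make the sequence concrete, here is a minimal sketch (illustrative only,
not code from the patch; evict_last_vcpu is just a name made up for this
mail):

/*
 * Minimal sketch of the sequence described above: flag the last resident
 * vcpu for migration and kick the scheduler on the dying cpu.  The softirq
 * is serviced as soon as we leave stop_machine context, which is when the
 * vcpu actually moves; its lazy state is only synced later, in
 * play_dead->sync_lazy_context.
 */
static void evict_last_vcpu(struct vcpu *v, unsigned int dying_cpu)
{
    ASSERT(v->processor == dying_cpu);

    /* Ask the scheduler to move this vcpu on its next pass. */
    set_bit(_VPF_migrating, &v->pause_flags);

    /* Make sure that pass happens as soon as the dying cpu runs softirqs. */
    cpu_raise_softirq(dying_cpu, SCHEDULE_SOFTIRQ);
}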

Shan Haitao

-----Original Message-----
From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx] 
Sent: 11 September 2008 22:15
To: Shan, Haitao; Haitao Shan; Tian, Kevin
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] Re: [PATCH 1/4] CPU online/offline support in Xen

I applied the patch with the following changes:
 * I rewrote your changes to fixup_irqs(). We should force lazy EOIs *after*
we have serviced any straggling interrupts. Also we should actually clear
the EOI stack so it is empty next time the CPU comes online.
 * I simplified your changes to schedule.c in light of the fact we run in
stop_machine context. Hence we can be quite relaxed about locking, for
example.
 * I removed your change to __csched_vcpu_is_migrateable() and instead put a
similar check in csched_load_balance(). I think this is clearer and also
cheaper.
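
For reference, the kind of check that moves into csched_load_balance() looks
roughly like this (illustrative fragment only, not the committed code; the
names follow the credit scheduler):

/*
 * Illustrative fragment: filter out offline cpus once per balancing pass,
 * instead of testing inside __csched_vcpu_is_migrateable() for every
 * candidate vcpu.
 */
static struct csched_vcpu *
csched_load_balance(int cpu, struct csched_vcpu *snext)
{
    int peer_cpu;

    for_each_online_cpu ( peer_cpu )
    {
        if ( peer_cpu == cpu )
            continue;
        /* ... try to steal a runnable vcpu from peer_cpu's runqueue ... */
    }

    return snext;
}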

I note that the VCPU currently running on the offlined CPU continues to run
there even after __cpu_disable(), until that CPU makes a final pass through
the scheduler soon afterwards. I hope it does not matter that there is one
vcpu with v->processor == offlined_cpu for a short while (e.g., what if
another CPU does vcpu_sleep_nosync(v) -> cpu_raise_softirq(v->processor,
...)?). I *think* it's actually okay, but I'm not totally certain. Really I
guess this patch needs some stress testing (lots of online/offline cycles
while pausing/unpausing domains, etc.). Perhaps we could plumb through a Xen
sysctl and make a small dom0 utility for this purpose?
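
Such a utility could be as small as the sketch below (hypothetical: the
xc_cpu_offline/xc_cpu_online wrappers for the proposed sysctl do not exist
yet and are left commented out; xc_domain_pause/xc_domain_unpause are
existing libxc calls; domid 1 is hard-coded purely for illustration):

#include <stdio.h>
#include <stdint.h>
#include <xenctrl.h>

int main(void)
{
    int xc = xc_interface_open();
    uint32_t domid = 1;           /* illustrative guest to pause/unpause */
    unsigned int i;

    if ( xc < 0 )
    {
        perror("xc_interface_open");
        return 1;
    }

    for ( i = 0; i < 10000; i++ )
    {
        /* xc_cpu_offline(xc, 1); -- hypothetical wrapper for the new sysctl */
        /* xc_cpu_online(xc, 1);  -- hypothetical wrapper for the new sysctl */

        /* Keep the scheduler busy while cpus come and go. */
        xc_domain_pause(xc, domid);
        xc_domain_unpause(xc, domid);
    }

    xc_interface_close(xc);
    return 0;
}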

 -- Keir

On 11/9/08 12:33, "Shan, Haitao" <haitao.shan@xxxxxxxxx> wrote:

> Thanks!
> Concerning cpu online/offline development, I have a small question here.
> Since cpu_online_map is so central, code in different subsystems may use
> it extensively. If such code is not designed with cpu online/offline in mind,
> it may introduce race conditions, just like the one fixed in the cpu
> calibration rendezvous.
> Currently we solve these in a find-and-fix manner. Do you have any ideas for
> solving the problem in a cleaner way?
> Thanks in advance.
> 
> Shan Haitao 
> 
> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
> Sent: 11 September 2008 19:13
> To: Shan, Haitao; Haitao Shan
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] Re: [PATCH 1/4] CPU online/offline support in Xen
> 
> It looks much better. I'll read through, maybe tweak, and most likely then
> check it in.
> 
>  Thanks,
>  Keir
> 
> On 11/9/08 09:02, "Shan, Haitao" <haitao.shan@xxxxxxxxx> wrote:
> 
>> Hi, Keir,
>> 
>> Attached is the updated patch, using the method you described in
>> another mail.
>> What do you think of it?
>> 
>> Signed-off-by: Shan Haitao <haitao.shan@xxxxxxxxx>
>> 
>> Best Regards
>> Haitao Shan
>> 
>> Haitao Shan wrote:
>>> Agree. Placing migration in stop_machine context will definitely make
>>> our job easier. I will start making a new patch tomorrow. :)
>>> I placed the migration code outside the stop_machine_run context partly
>>> because I was not quite sure how long it would take to migrate all the
>>> vcpus away. If it takes too much time, all useful work is blocked,
>>> since all cpus are in the stop_machine context. Of course, I borrowed
>>> the idea from the kernel, which also led me to make that decision.
>>> 
>>> 2008/9/10 Keir Fraser <keir.fraser@xxxxxxxxxxxxx>:
>>>> I feel this is more complicated than it needs to be.
>>>> 
>>>> How about clearing VCPUs from the offlined CPU's runqueue from the
>>>> very end of __cpu_disable()? At that point all other CPUs are safely
>>>> in softirq context with IRQs disabled, and we are running on the
>>>> correct CPU (being offlined). We could have a hook into the
>>>> scheduler subsystem at that point to break affinities, assign to
>>>> different runqueues, etc. We would just need to be careful not to
>>>> try an IPI. :-) This approach would not need a cpu_schedule_map
>>>> (which is really increasing code fragility imo, by creating possible
>>>> extra confusion about which cpumask is the right one to use in a
>>>> given situation).
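
A rough sketch of what such a hook could look like (names and fields are
illustrative only, not the code that ended up in the tree): it would be
called at the very end of __cpu_disable(), with all other cpus parked in
stop_machine context, so plain flag manipulation suffices and no IPIs are
needed.

static void scheduler_evacuate_cpu(unsigned int cpu)
{
    struct domain *d;
    struct vcpu *v;

    for_each_domain ( d )
        for_each_vcpu ( d, v )
        {
            /* If the affinity mask has no online cpu left, widen it. */
            if ( !cpus_intersects(v->cpu_affinity, cpu_online_map) )
                cpus_setall(v->cpu_affinity);

            if ( v->processor != cpu )
                continue;

            /* Let the scheduler move this vcpu on its next pass. */
            set_bit(_VPF_migrating, &v->pause_flags);
        }
}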
>>>> 
>>>> My feeling, unless I've missed something, is that this would make
>>>> the patch quite a bit smaller, with a smaller spread of code
>>>> changes. 
>>>> 
>>>>  -- Keir
>>>> 
>>>> On 9/9/08 09:59, "Shan, Haitao" <haitao.shan@xxxxxxxxx> wrote:
>>>> 
>>>>> This patch implements cpu offline feature.
>>>>> 
>>>>> Best Regards
>>>>> Haitao Shan
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Xen-devel mailing list
>>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>>>> http://lists.xensource.com/xen-devel
> 
> 



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel