WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/

[Xen-devel] Re: CPU offlining patch xen-unstable:21049

To: "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>
Subject: [Xen-devel] Re: CPU offlining patch xen-unstable:21049
From: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Date: Thu, 15 Apr 2010 12:04:01 +0100
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Thu, 15 Apr 2010 04:05:18 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <789F9655DD1B8F43B48D77C5D30659731D73CED7@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AcrcdgaxyxAXcjWt2U2B4gzh49/iiwAAddGwAAM8tEEAAA/mEAABkodd
Thread-topic: CPU offlining patch xen-unstable:21049
User-agent: Microsoft-Entourage/12.24.0.100205
I decided to keep the spin_trylock as they quell my paranoia about other
possible deadlock scenarios inside those complicated hypercall functions.
But I have modified the comments appropriately, in xen-unstable:21179. Note
that this also depends on xen-unstable:21178 (we mustn't execute the
hypercall continuation immediately, in the context of the caller of
c_h_o_c()). Thanks.

But here's a more subtle and trickier deadlock scenario for you. You'll
like this one :-): stop_machine_run() schedules a softirq on every CPU.
Let's say CPU A enters our softirq handler, interrupting some guest VCPU X
which is still scheduled on CPU A. But some other CPU B could be waiting for
X to be descheduled (one obvious example is hvmop_flush_tlb_all, which is a
good one because some HVM guest can call that at any time). So we never get
full softirq rendezvous because CPU B is spinning in hvmop_flush_tlb_all(),
while CPU A spins in the stop_machine softirq handler. Deadlock!

What do you think of that? :-D

 -- Keir

On 15/04/2010 11:19, "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx> wrote:

> Aha, yes, you are right. So do I need to create a patch, or can you simply
> revert some chunks?
> 
> --jyh
> 
>> -----Original Message-----
>> From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
>> Sent: Thursday, April 15, 2010 6:17 PM
>> To: Jiang, Yunhong
>> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
>> Subject: Re: CPU offlining patch xen-unstable:21049
>> 
>> On 15/04/2010 09:50, "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx> wrote:
>> 
>>> I think the try_lock is not for the cpu_down(). The point is, if another CPU
>>> is trying to get the lock.
>>> 
>>> Consider the following scenario:
>>> 1) cpu_down() on CPU A takes the xenpf_lock, then calls
>>> stop_machine_run(), trying to bring all CPUs into the stop_machine context.
>>> 2) At the same time, another vcpu on CPU B does a xenpf hypercall and tries
>>> to get the xenpf_lock. If there is no retry for this lock, it can't get the
>>> xenpf_lock, and it will never get to the softirq.
>>> So the system will hang.
>>> 
>>> Hope this makes things clear.
>> 
>> But CPU A doesn't hold the xenpf_lock when it calls stop_machine_run(). It
>> dropped it before cpu_down() got invoked, because that gets executed via
>> continue_hypercall_on_cpu().
>> 
>> -- Keir
>> 
> 



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
