xen-devel

Re: [Xen-devel] [PATCH]: Fix deadlock in mm_pin

To: Chris Lalancette <clalance@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: Re: [Xen-devel] [PATCH]: Fix deadlock in mm_pin
From: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Date: Thu, 20 Nov 2008 14:46:58 +0000
On 20/11/08 10:31, "Chris Lalancette" <clalance@xxxxxxxxxx> wrote:

> it applies to the 2.6.18 tree as well; the deadlock scenario is below.
> 
> "After running an arbitrary workload involving network traffic for some time
> (1-2 days), a xen guest running the 2.6.9-67 x86_64 xenU kernel locks up with
> both vcpus spinning at 100%.
> 
> The problem is due to a race between the scheduler and network interrupts.  On
> one vcpu, the scheduler takes the runqueue spinlock of the other vcpu to
> schedule a process, and attempts to lock mm_unpinned_lock.  On the other vcpu,
> another process is holding mm_unpinned_lock (because it is starting or
> exiting), and is interrupted by a network interrupt.  The network interrupt
> handler attempts to wake up the same process that the first vcpu is trying to
> schedule, and will try to get the runqueue spinlock that the first vcpu is
> already holding."
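
The cycle described in the quoted report can be sketched as a wait-for graph. This is only a toy user-space model in Python, not kernel code: the function `find_cycle`, the context names `vcpu0`/`vcpu1`, and the lock label `rq_lock` are hypothetical names chosen for illustration, and the model says nothing about whether such a call chain actually exists in any given kernel tree.

```python
def find_cycle(holds, waits):
    """Return the contexts forming a wait-for cycle, or None.

    holds: lock name -> context currently holding it
    waits: context -> lock name that context is spinning on
    """
    for start in waits:
        seen = []
        ctx = start
        # Follow "waits on a lock held by ..." edges until they end or repeat.
        while ctx in waits and ctx not in seen:
            seen.append(ctx)
            ctx = holds.get(waits[ctx])
        if ctx is not None and ctx in seen:
            return seen[seen.index(ctx):]
    return None


# Reported scenario (assumed): vcpu0 holds the runqueue lock and spins on
# mm_unpinned_lock; vcpu1 holds mm_unpinned_lock and, inside a network
# interrupt, spins on the same runqueue lock.
holds = {"rq_lock": "vcpu0", "mm_unpinned_lock": "vcpu1"}
waits_irq_enabled = {"vcpu0": "mm_unpinned_lock", "vcpu1": "rq_lock"}

# If interrupts were disabled while mm_unpinned_lock is held, the handler on
# vcpu1 could not run there, so vcpu1 would not be waiting on rq_lock.
waits_irq_disabled = {"vcpu0": "mm_unpinned_lock"}

deadlocked = find_cycle(holds, waits_irq_enabled)
safe = find_cycle(holds, waits_irq_disabled)
```

In this model the first case forms a two-node cycle and the second does not, which is the reasoning behind making a lock IRQ safe. Any other lock taken on the same path without interrupts disabled would need the same treatment for the cycle to be fully broken.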

I don't believe that mm_unpinned_lock can ever be taken while a runqueue
lock is already held in 2.6.18. If you can provide a call chain then I'll
consider the patch -- but I think you'd still be screwed by the
mm->page_table_lock (also acquired in mm_pin() code, also not IRQ safe, but
less easy for you to go convert all the users of that lock).

You might have some backporting from 2.6.18 to do...

 -- Keir