[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] Re: [PATCH] mem_sharing: fix race condition of nominate and unshare


  • To: Tim Deegan <Tim.Deegan@xxxxxxxxxx>
  • From: Jui-Hao Chiang <juihaochiang@xxxxxxxxx>
  • Date: Fri, 7 Jan 2011 11:54:27 +0800
  • Cc: tinnycloud <tinnycloud@xxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 06 Jan 2011 19:55:20 -0800
  • Domainkey-signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=QmFuQs2eR7TAKTb5A5IiREk3lDH09M86+ySOfdH9680p1Ii8lK+HJq9RvyuOaV/OBo 46B21aXfw+FwT08Tq9WtlWXCh5bTt5ooQ1fscP1matYsLLLy4dh01NCosS72ZsSZiVyL FxNbQTIStr2qp0QzIkgVYYCm8lbFKDZ7Neec4=
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Hi, Tim

On Fri, Jan 7, 2011 at 12:54 AM, Tim Deegan <Tim.Deegan@xxxxxxxxxx> wrote:
At 16:11 +0000 on 06 Jan (1294330319), Jui-Hao Chiang wrote:
> Hi, this patch does the following
> (1) When updating/checking p2m type for mem_sharing, we must hold shr_lock
> (2) For nominate operation, if the page is already nominated, return the handle from page_info->shr_handle
> (3) For unshare operation, it is possible that multiple users unshare a page via hvm_hap_nested_page_fault() at the same time. If the page is already un-shared by someone else, simply return success.

I'm going to apply this, since it looks like an improvement, but I'm not
convinced it properly solves the problem.


It seems tinnycloud's case is when dom0 try to RW maps a shared page, which should unshare it properly, and change the type count.
But there is still a bug hidden in page_make_sharable() which fails to recover type count when the call fails.
Now I trace it again, and found something different than what we have discussed before.
I will find it and submit patch again.

 
> NOTE: we assume that nobody holds page_alloc_lock/p2m_lock before calling nominate/share/unshare.

As I told you earlier, that's not the case.  p2m_teardown() can call
mem_sharing_unshare_page() with the p2m lock held.


Oops, I forgot again.
After this change, unshare() has a potential problem of deadlock for shr_lock and p2m_lock with different locking order.
Assume two CPUs do the following
CPU1: hvm_hap_nested_page_fault() => unshare() => p2m_change_type() (locking order: shr_lock, p2m_lock)
CPU2: p2m_teardown() => unshare() (locking order: p2m_lock, shr_lock)
When CPU1 grabs shr_lock and CPU2 grabs p2m_lock, they deadlock later.

So it seems better to fix the following rules
(1) Fix locking order: p2m_lock ---> shr_lock
(2) Any function in mem_sharing, if modifying/checking p2m entry is necessary, it must hold p2m_lock and then shr_lock. Later on, when changing p2m entries, don't call any p2m function which locks p2m again

So for p2m functions, it seems better to provide some functions which don't call p2m_lock again.
What do you think? If that's ok, I will do it in this way.

Bests,
Jui-Hao
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.