To: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-devel] [PATCH] Fix memory alloction bug after hvm reboot in numa system.
From: "Zhou, Ting G" <ting.g.zhou@xxxxxxxxx>
Date: Mon, 8 Dec 2008 17:02:10 +0800
Accept-language: en-US
Cc: "Yang, Xiaowei" <xiaowei.yang@xxxxxxxxx>
Delivery-date: Mon, 08 Dec 2008 01:03:18 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AclZE6ejxbHmHTXcQbianPKg2S27ng==
Thread-topic: [PATCH] Fix memory alloction bug after hvm reboot in numa system.

Recently we found a bug on a Nehalem machine (two nodes, 6G of memory in
total, 3G on each node):
- Start an HVM guest with all its VCPUs pinned to node1, so all of its memory
is allocated from node1 (see the configuration sketch after this list).
- Reboot the HVM guest.
- Some memory is now allocated from node0, even though there is enough free
memory on node1.
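
A minimal sketch of the guest configuration used for this scenario (the CPU
range and sizes below are illustrative assumptions, not taken from the
original report; it assumes node1 covers physical CPUs 4-7 on this two-node
box):

    # HVM guest config fragment (xm syntax); numbers are assumptions
    builder = 'hvm'
    memory  = 2048
    vcpus   = 4
    cpus    = "4-7"   # pin every VCPU to node1's CPUs so all memory comes from node1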

Reason: For security reasons, Xen does not return the pages of a dying HVM
guest to the domain heap directly; it puts them on the scrub list and waits
for page_scrub_softirq() to handle them. If the dying guest has a lot of
memory, page_scrub_softirq() will not have processed all of it before the
guest is started again. Some pages belonging to node1 are still on the scrub
list, and the new guest cannot use them, so it ends up with a different
memory distribution than before. Before changeset 18304, page_scrub_softirq()
could run in parallel on all CPUs. Changeset 18305 serialised
page_scrub_softirq(), and changeset 18307 serialised it with a new lock to
avoid holding up acquisition of page_scrub_lock in free_domheap_pages().
Those changesets reduce the rate at which pages on the scrub list are
processed, so the bug became more visible afterwards.

Patch: This patch modifies balloon.free to avoid this bug. After the patch,
balloon.free checks whether the current machine is a NUMA system and whether
the newly created HVM guest has all of its VCPUs pinned to the same node. If
both conditions hold, it waits until all the pages on the scrub list have
been freed, giving up if the wait exceeds 20s.
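
As a rough sketch of that waiting logic (the attached patch is authoritative;
this assumes xc.physinfo() exposes the amount of memory still on the scrub
list under a 'scrub_memory' key, as the xend of this era appears to):

    import time
    import xen.lowlevel.xc

    xc = xen.lowlevel.xc.xc()

    def wait_scrub_pages(timeout=20):
        """Wait until the scrub list drains, giving up after `timeout` seconds."""
        deadline = time.time() + timeout
        while xc.physinfo()['scrub_memory'] > 0:
            if time.time() > deadline:
                break               # don't hold up guest creation forever
            time.sleep(0.1)         # let page_scrub_softirq() make progress

In balloon.free this wait would only be triggered when the NUMA and
VCPU-pinning conditions described above hold.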

This may seem overly restrictive at first glance. We used to wait only until
the free memory on the pinned node was larger than the amount required, but
HVM memory allocation granularity is 2M, so even when that condition is
satisfied we may still not find enough 2M-sized chunks on that node.
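
A toy illustration of that point with made-up numbers (none of these figures
come from the report):

    # Hypothetical figures: node1 reports enough free memory overall,
    # but too little of it is usable as 2M-contiguous chunks.
    required_mb     = 2048   # memory the rebooted guest needs
    free_on_node_mb = 2200   # total free memory reported for node1
    free_2m_chunks  = 900    # 2M chunks actually allocatable on node1

    print(free_on_node_mb >= required_mb)      # True  -> the old check passes
    print(free_2m_chunks * 2 >= required_mb)   # False -> 2M allocations can still fail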

Signed-off-by: Ting Zhou <ting.g.zhou@xxxxxxxxx>
Signed-off-by: Xiaowei Yang <Xiaowei.yang@xxxxxxxxx>

Attachment: numa_hvm_reboot.patch
Description: numa_hvm_reboot.patch

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel