xen-devel

[Xen-devel] Running out of Xen heap space with large memory

To: Xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] Running out of Xen heap space with large memory
From: beth kon <eak@xxxxxxxxxx>
Date: Tue, 20 Nov 2007 14:00:57 -0500
Delivery-date: Tue, 20 Nov 2007 11:02:22 -0800
Organization: IBM
Reply-to: eak@xxxxxxxxxx
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla Thunderbird 1.0.8-1.1.fc4 (X11/20060501)
Hi. I have been debugging a hang when booting Xen with 512G of memory. I am testing on an x3950 (8 nodes, 128-way) with Xen 3.0.3. To get past the 166G limit, I added the no-pv-compat flag to the Xen boot line.

The boot hangs when memory is greater than about 432G. I found that, as memory size increases, the size of the Xen heap decreases, because more memory is consumed by the allocator's own housekeeping (seen in init_boot_allocator). Eventually (around 432G) the Xen heap is reported as 0MB:

(XEN) Command line: /xen.gz-2.6.18-53.el5 numa=on dom0_mem=512m com2=19200,8n1 console=com2 no-pv-compat
(XEN)  0000000000000000 - 0000000000098000 (usable)
(XEN)  0000000000098c00 - 00000000000a0000 (reserved)
(XEN)  00000000000e0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 000000007fea6000 (usable)
(XEN)  000000007fea64c0 - 000000007fef6380 (ACPI data)
(XEN)  000000007fef6380 - 0000000090000000 (reserved)
(XEN)  00000000fec00000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 0000007e00000000 (usable)
(XEN) System RAM: 514046MB (526383352kB)
(XEN) ACPI: [SRAT:0x00] ignored 32 entries of 64 found
(XEN) BETH xenheap_phys_start = 18677760, xenheap_phys_end=16777216
(XEN) BETH s = 4294967296, e=16777216
(XEN) Xen heap: 0MB (0kB)
(XEN) Cannot handle page request order 2!
(XEN) Cannot handle page request order 0!
(XEN) Unknown interrupt

The boot hangs at this point (no surprise). Note that the "BETH" debug statements show xenheap_phys_start > xenheap_phys_end, which is why the heap size is reported as zero.
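
As a rough cross-check of the numbers in the log, here is a small Python sketch. It is not Xen code. It assumes 4 KiB pages and that the boot allocator's housekeeping costs one bit per page frame up to the top of memory, which is my reading of init_boot_allocator rather than something the log states. The function names are made up for illustration; the address ranges and the two xenheap values are copied from the output above.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

# Values copied from the "BETH" debug lines in the boot log.
XENHEAP_PHYS_START = 18677760
XENHEAP_PHYS_END = 16777216  # 16 MiB

# "usable" ranges copied from the memory map in the boot log: (start, end).
USABLE = [
    (0x0000000000000000, 0x0000000000098000),
    (0x0000000000100000, 0x000000007FEA6000),
    (0x0000000100000000, 0x0000007E00000000),
]


def usable_kib(ranges):
    """Total size of the usable ranges, in kB (1024-byte units)."""
    return sum(end - start for start, end in ranges) // 1024


def bitmap_bytes(top_of_memory, page_size=PAGE_SIZE):
    """Bytes needed for one bit per page frame below top_of_memory.

    Assumption: one bit per page; any rounding or padding the real
    allocator applies is ignored here.
    """
    pages = top_of_memory // page_size
    return (pages + 7) // 8


def heap_bytes(start, end):
    """Heap size implied by the two bounds; zero once start passes end."""
    return max(0, end - start)


if __name__ == "__main__":
    top = max(end for _, end in USABLE)
    print("usable RAM (kB):      ", usable_kib(USABLE))
    print("bitmap estimate (B):  ", bitmap_bytes(top))
    print("bitmap per GiB (B):   ", bitmap_bytes(1 << 30))
    print("heap remaining (B):   ", heap_bytes(XENHEAP_PHYS_START, XENHEAP_PHYS_END))
```

The usable total comes out to 526383352 kB, matching the "System RAM" line. Under the one-bit-per-page assumption the bitmap for this machine is about 15.75 MiB (32 KiB per GiB of address space), which by itself nearly fills the 16 MiB below xenheap_phys_end, so it is plausible that this is what pushes xenheap_phys_start past the end.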

I realize that interest in this may be low since the no-pv-compat flag has been removed in xen-unstable. The resulting 166G limit was discussed here:
http://lists.xensource.com/archives/html/xen-devel/2007-08/msg00493.html

If pages being passed among domains is the issue, wouldn't the no-pv-compat flag address that by not allowing any 32-bit guests on the machine? I assume page stealing is restricted to domains running on the same hypervisor, right? Why was the no-pv-compat flag removed?

And as Raj discussed:
http://lists.xensource.com/archives/html/xen-devel/2007-10/msg00550.html
Even if Raj provides these changes, I believe the above issue with running out of heap space would still exist. I see that init_boot_allocator is unchanged between 3.0.3 and unstable. Any suggestions on how this issue could be corrected?

I will continue to look at this code, but any suggestions from people who are more familiar with it would be greatly appreciated.
--

Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: eak@xxxxxxxxxx


