[Xen-devel] OOM problems

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] OOM problems
From: John Weekes <lists.xen@xxxxxxxxxxxxxxxxxx>
Date: Fri, 12 Nov 2010 23:57:22 -0800
Delivery-date: Fri, 12 Nov 2010 23:58:46 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.12) Gecko/20101027 Thunderbird/3.1.6

On machines running many HVM (stubdom-based) domains, I often see errors like this:

[77176.524094] qemu-dm invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0
[77176.524102] Pid: 7478, comm: qemu-dm Not tainted 2.6.32.25-g80f7e08 #2
[77176.524109] Call Trace:
[77176.524123]  [<ffffffff810897fd>] ? T.413+0xcd/0x290
[77176.524129]  [<ffffffff81089ad3>] ? __out_of_memory+0x113/0x180
[77176.524133]  [<ffffffff81089b9e>] ? out_of_memory+0x5e/0xc0
[77176.524140]  [<ffffffff8108d1cb>] ? __alloc_pages_nodemask+0x69b/0x6b0
[77176.524144]  [<ffffffff8108d1f2>] ? __get_free_pages+0x12/0x60
[77176.524152]  [<ffffffff810c94e7>] ? __pollwait+0xb7/0x110
[77176.524161]  [<ffffffff81262b93>] ? n_tty_poll+0x183/0x1d0
[77176.524165]  [<ffffffff8125ea42>] ? tty_poll+0x92/0xa0
[77176.524169]  [<ffffffff810c8a92>] ? do_select+0x362/0x670
[77176.524173]  [<ffffffff810c9430>] ? __pollwait+0x0/0x110
[77176.524178]  [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524183]  [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524188]  [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524193]  [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524197]  [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524202]  [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524207]  [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524212]  [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524217]  [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524222]  [<ffffffff810c8fb5>] ? core_sys_select+0x215/0x350
[77176.524231]  [<ffffffff810100af>] ? xen_restore_fl_direct_end+0x0/0x1
[77176.524236]  [<ffffffff8100c48d>] ? xen_mc_flush+0x8d/0x1b0
[77176.524243]  [<ffffffff81014ffb>] ? xen_hypervisor_callback+0x1b/0x20
[77176.524251]  [<ffffffff814b0f5a>] ? error_exit+0x2a/0x60
[77176.524255]  [<ffffffff8101485d>] ? retint_restore_args+0x5/0x6
[77176.524263]  [<ffffffff8102fd3d>] ? pvclock_clocksource_read+0x4d/0xb0
[77176.524268]  [<ffffffff8102fd3d>] ? pvclock_clocksource_read+0x4d/0xb0
[77176.524276]  [<ffffffff810663d1>] ? ktime_get_ts+0x61/0xd0
[77176.524281]  [<ffffffff810c9354>] ? sys_select+0x44/0x120
[77176.524286]  [<ffffffff81013f02>] ? system_call_fastpath+0x16/0x1b
[77176.524290] Mem-Info:
[77176.524293] DMA per-cpu:
[77176.524296] CPU    0: hi:    0, btch:   1 usd:   0
[77176.524300] CPU    1: hi:    0, btch:   1 usd:   0
[77176.524303] CPU    2: hi:    0, btch:   1 usd:   0
[77176.524306] CPU    3: hi:    0, btch:   1 usd:   0
[77176.524310] CPU    4: hi:    0, btch:   1 usd:   0
[77176.524313] CPU    5: hi:    0, btch:   1 usd:   0
[77176.524316] CPU    6: hi:    0, btch:   1 usd:   0
[77176.524318] CPU    7: hi:    0, btch:   1 usd:   0
[77176.524322] CPU    8: hi:    0, btch:   1 usd:   0
[77176.524324] CPU    9: hi:    0, btch:   1 usd:   0
[77176.524327] CPU   10: hi:    0, btch:   1 usd:   0
[77176.524330] CPU   11: hi:    0, btch:   1 usd:   0
[77176.524333] CPU   12: hi:    0, btch:   1 usd:   0
[77176.524336] CPU   13: hi:    0, btch:   1 usd:   0
[77176.524339] CPU   14: hi:    0, btch:   1 usd:   0
[77176.524342] CPU   15: hi:    0, btch:   1 usd:   0
[77176.524345] CPU   16: hi:    0, btch:   1 usd:   0
[77176.524348] CPU   17: hi:    0, btch:   1 usd:   0
[77176.524351] CPU   18: hi:    0, btch:   1 usd:   0
[77176.524354] CPU   19: hi:    0, btch:   1 usd:   0
[77176.524358] CPU   20: hi:    0, btch:   1 usd:   0
[77176.524364] CPU   21: hi:    0, btch:   1 usd:   0
[77176.524367] CPU   22: hi:    0, btch:   1 usd:   0
[77176.524370] CPU   23: hi:    0, btch:   1 usd:   0
[77176.524372] DMA32 per-cpu:
[77176.524374] CPU    0: hi:  186, btch:  31 usd:  81
[77176.524377] CPU    1: hi:  186, btch:  31 usd:  66
[77176.524380] CPU    2: hi:  186, btch:  31 usd:  49
[77176.524385] CPU    3: hi:  186, btch:  31 usd:  67
[77176.524387] CPU    4: hi:  186, btch:  31 usd:  93
[77176.524390] CPU    5: hi:  186, btch:  31 usd:  73
[77176.524393] CPU    6: hi:  186, btch:  31 usd:  50
[77176.524396] CPU    7: hi:  186, btch:  31 usd:  79
[77176.524399] CPU    8: hi:  186, btch:  31 usd:  21
[77176.524402] CPU    9: hi:  186, btch:  31 usd:  38
[77176.524406] CPU   10: hi:  186, btch:  31 usd:   0
[77176.524409] CPU   11: hi:  186, btch:  31 usd:  75
[77176.524412] CPU   12: hi:  186, btch:  31 usd:   1
[77176.524414] CPU   13: hi:  186, btch:  31 usd:   4
[77176.524417] CPU   14: hi:  186, btch:  31 usd:   9
[77176.524420] CPU   15: hi:  186, btch:  31 usd:   0
[77176.524423] CPU   16: hi:  186, btch:  31 usd:  56
[77176.524426] CPU   17: hi:  186, btch:  31 usd:  35
[77176.524429] CPU   18: hi:  186, btch:  31 usd:  32
[77176.524432] CPU   19: hi:  186, btch:  31 usd:  39
[77176.524435] CPU   20: hi:  186, btch:  31 usd:  24
[77176.524438] CPU   21: hi:  186, btch:  31 usd:   0
[77176.524441] CPU   22: hi:  186, btch:  31 usd:  35
[77176.524444] CPU   23: hi:  186, btch:  31 usd:  51
[77176.524447] Normal per-cpu:
[77176.524449] CPU    0: hi:  186, btch:  31 usd:  29
[77176.524453] CPU    1: hi:  186, btch:  31 usd:   1
[77176.524456] CPU    2: hi:  186, btch:  31 usd:  30
[77176.524459] CPU    3: hi:  186, btch:  31 usd:  30
[77176.524463] CPU    4: hi:  186, btch:  31 usd:  30
[77176.524466] CPU    5: hi:  186, btch:  31 usd:  31
[77176.524469] CPU    6: hi:  186, btch:  31 usd:   0
[77176.524471] CPU    7: hi:  186, btch:  31 usd:   0
[77176.524474] CPU    8: hi:  186, btch:  31 usd:  30
[77176.524477] CPU    9: hi:  186, btch:  31 usd:  28
[77176.524480] CPU   10: hi:  186, btch:  31 usd:   0
[77176.524483] CPU   11: hi:  186, btch:  31 usd:  30
[77176.524486] CPU   12: hi:  186, btch:  31 usd:   0
[77176.524489] CPU   13: hi:  186, btch:  31 usd:   0
[77176.524492] CPU   14: hi:  186, btch:  31 usd:   0
[77176.524495] CPU   15: hi:  186, btch:  31 usd:   0
[77176.524498] CPU   16: hi:  186, btch:  31 usd:   0
[77176.524501] CPU   17: hi:  186, btch:  31 usd:   0
[77176.524504] CPU   18: hi:  186, btch:  31 usd:   0
[77176.524507] CPU   19: hi:  186, btch:  31 usd:   0
[77176.524510] CPU   20: hi:  186, btch:  31 usd:   0
[77176.524513] CPU   21: hi:  186, btch:  31 usd:   0
[77176.524516] CPU   22: hi:  186, btch:  31 usd:   0
[77176.524518] CPU   23: hi:  186, btch:  31 usd:   0
[77176.524524] active_anon:5675 inactive_anon:4676 isolated_anon:0
[77176.524526]  active_file:146373 inactive_file:153543 isolated_file:480
[77176.524527]  unevictable:0 dirty:167539 writeback:322 unstable:0
[77176.524528]  free:5017 slab_reclaimable:15640 slab_unreclaimable:8972
[77176.524529]  mapped:1114 shmem:7 pagetables:1908 bounce:0
[77176.524536] DMA free:9820kB min:32kB low:40kB high:48kB active_anon:4kB inactive_anon:0kB active_file:616kB inactive_file:2212kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:12740kB mlocked:0kB dirty:2292kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:72kB slab_unreclaimable:108kB kernel_stack:0kB pagetables:12kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:3040 all_unreclaimable? no
[77176.524541] lowmem_reserve[]: 0 1428 2452 2452
[77176.524551] DMA32 free:7768kB min:3680kB low:4600kB high:5520kB active_anon:22696kB inactive_anon:18704kB active_file:584580kB inactive_file:608508kB unevictable:0kB isolated(anon):0kB isolated(file):1920kB present:1462496kB mlocked:0kB dirty:664128kB writeback:1276kB mapped:4456kB shmem:28kB slab_reclaimable:62076kB slab_unreclaimable:32292kB kernel_stack:5120kB pagetables:7620kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1971808 all_unreclaimable? yes
[77176.524556] lowmem_reserve[]: 0 0 1024 1024
[77176.524564] Normal free:2480kB min:2636kB low:3292kB high:3952kB active_anon:0kB inactive_anon:0kB active_file:296kB inactive_file:3452kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1048700kB mlocked:0kB dirty:3736kB writeback:12kB mapped:0kB shmem:0kB slab_reclaimable:412kB slab_unreclaimable:3488kB kernel_stack:80kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:8192 all_unreclaimable? yes
[77176.524569] lowmem_reserve[]: 0 0 0 0
[77176.524574] DMA: 4*4kB 25*8kB 11*16kB 7*32kB 8*64kB 8*128kB 8*256kB 3*512kB 0*1024kB 0*2048kB 1*4096kB = 9832kB
[77176.524587] DMA32: 742*4kB 118*8kB 3*16kB 3*32kB 2*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 7768kB
[77176.524600] Normal: 1*4kB 1*8kB 2*16kB 13*32kB 14*64kB 2*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1612kB
[77176.524613] 302308 total pagecache pages
[77176.524615] 1619 pages in swap cache
[77176.524617] Swap cache stats: add 40686, delete 39067, find 24687/26036
[77176.524619] Free swap  = 10141956kB
[77176.524621] Total swap = 10239992kB
[77176.577607] 793456 pages RAM
[77176.577611] 436254 pages reserved
[77176.577613] 308627 pages shared
[77176.577615] 49249 pages non-shared
[77176.577620] Out of memory: kill process 5755 (python2.6) score 110492 or a child
[77176.577623] Killed process 5757 (python2.6)

Depending on what gets nuked by the OOM-killer, I am frequently left with an unusable system that needs to be rebooted.
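
As a stopgap (not a fix), I can at least bias the OOM killer away from the dom0 processes that have to survive; on this 2.6.32-era kernel that is done through /proc/<pid>/oom_adj. A rough sketch (the daemon names below are only examples):

# -17 is OOM_DISABLE on 2.6.32-era kernels; sshd/xenstored are just example daemons
for daemon in sshd xenstored; do
    for pid in $(pgrep "$daemon"); do
        echo -17 > /proc/$pid/oom_adj
    done
done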

The machine always has plenty of memory available: 1.5 GB is devoted to dom0, and more than 1 GB of that is normally just sitting in page cache (the "cached" column below). For instance, right now, on this same machine:

# free
             total       used       free     shared    buffers     cached
Mem:       1536512    1493112      43400          0      10284    1144904
-/+ buffers/cache:     337924    1198588
Swap:     10239992      74444   10165548
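
(Aside: the 337924 kB on the "-/+ buffers/cache" line is just used minus buffers minus cached, i.e. 1493112 - 10284 - 1144904 kB; the same figure can be pulled straight from /proc/meminfo:)

awk '/^(MemTotal|MemFree|Buffers|Cached):/ {m[$1]=$2}
     END {print m["MemTotal:"]-m["MemFree:"]-m["Buffers:"]-m["Cached:"], "kB used excluding buffers/cache"}' /proc/meminfo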

I have seen this OOM problem across a wide range of Xen versions, going back as far as I can remember, including the latest 4.1-unstable and 2.6.32 pvops kernel (from yesterday, tested in the hope that they would fix this). I haven't found a way to reproduce it reliably yet, but I suspect it is related to fairly heavy disk or network activity -- during this last episode, one of the domains was briefly doing ~200 Mbps of downloads.
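
The dump above also shows roughly 650 MB of dirty pages waiting for writeback in a 1.5 GB dom0 (dirty:167539 pages), so one thing I may try -- purely a guess, values untested -- is lowering the stock vm dirty-page thresholds so writeback starts earlier:

# untested guesses, not a fix for whatever is really eating the memory
sysctl -w vm.dirty_background_ratio=5
sysctl -w vm.dirty_ratio=10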

Anyone have any ideas on what this could be? Is RAM getting spontaneously filled because a buffer somewhere grows too quickly, or something like that? What can I try here?
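
In the meantime, a crude way to catch the build-up in the act would be to log memory state periodically and look at the last few samples before the next OOM (log path and interval are arbitrary):

while true; do
    echo "=== $(date) ===" >> /var/log/meminfo-trace.log
    cat /proc/meminfo /proc/vmstat >> /var/log/meminfo-trace.log
    sleep 30
done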

-John

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
