On machines running many HVM (stubdom-based) domains, I often see errors
like this:
[77176.524094] qemu-dm invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0
[77176.524102] Pid: 7478, comm: qemu-dm Not tainted 2.6.32.25-g80f7e08 #2
[77176.524109] Call Trace:
[77176.524123] [<ffffffff810897fd>] ? T.413+0xcd/0x290
[77176.524129] [<ffffffff81089ad3>] ? __out_of_memory+0x113/0x180
[77176.524133] [<ffffffff81089b9e>] ? out_of_memory+0x5e/0xc0
[77176.524140] [<ffffffff8108d1cb>] ? __alloc_pages_nodemask+0x69b/0x6b0
[77176.524144] [<ffffffff8108d1f2>] ? __get_free_pages+0x12/0x60
[77176.524152] [<ffffffff810c94e7>] ? __pollwait+0xb7/0x110
[77176.524161] [<ffffffff81262b93>] ? n_tty_poll+0x183/0x1d0
[77176.524165] [<ffffffff8125ea42>] ? tty_poll+0x92/0xa0
[77176.524169] [<ffffffff810c8a92>] ? do_select+0x362/0x670
[77176.524173] [<ffffffff810c9430>] ? __pollwait+0x0/0x110
[77176.524178] [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524183] [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524188] [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524193] [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524197] [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524202] [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524207] [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524212] [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524217] [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524222] [<ffffffff810c8fb5>] ? core_sys_select+0x215/0x350
[77176.524231] [<ffffffff810100af>] ? xen_restore_fl_direct_end+0x0/0x1
[77176.524236] [<ffffffff8100c48d>] ? xen_mc_flush+0x8d/0x1b0
[77176.524243] [<ffffffff81014ffb>] ? xen_hypervisor_callback+0x1b/0x20
[77176.524251] [<ffffffff814b0f5a>] ? error_exit+0x2a/0x60
[77176.524255] [<ffffffff8101485d>] ? retint_restore_args+0x5/0x6
[77176.524263] [<ffffffff8102fd3d>] ? pvclock_clocksource_read+0x4d/0xb0
[77176.524268] [<ffffffff8102fd3d>] ? pvclock_clocksource_read+0x4d/0xb0
[77176.524276] [<ffffffff810663d1>] ? ktime_get_ts+0x61/0xd0
[77176.524281] [<ffffffff810c9354>] ? sys_select+0x44/0x120
[77176.524286] [<ffffffff81013f02>] ? system_call_fastpath+0x16/0x1b
[77176.524290] Mem-Info:
[77176.524293] DMA per-cpu:
[77176.524296] CPU 0: hi: 0, btch: 1 usd: 0
[77176.524300] CPU 1: hi: 0, btch: 1 usd: 0
[77176.524303] CPU 2: hi: 0, btch: 1 usd: 0
[77176.524306] CPU 3: hi: 0, btch: 1 usd: 0
[77176.524310] CPU 4: hi: 0, btch: 1 usd: 0
[77176.524313] CPU 5: hi: 0, btch: 1 usd: 0
[77176.524316] CPU 6: hi: 0, btch: 1 usd: 0
[77176.524318] CPU 7: hi: 0, btch: 1 usd: 0
[77176.524322] CPU 8: hi: 0, btch: 1 usd: 0
[77176.524324] CPU 9: hi: 0, btch: 1 usd: 0
[77176.524327] CPU 10: hi: 0, btch: 1 usd: 0
[77176.524330] CPU 11: hi: 0, btch: 1 usd: 0
[77176.524333] CPU 12: hi: 0, btch: 1 usd: 0
[77176.524336] CPU 13: hi: 0, btch: 1 usd: 0
[77176.524339] CPU 14: hi: 0, btch: 1 usd: 0
[77176.524342] CPU 15: hi: 0, btch: 1 usd: 0
[77176.524345] CPU 16: hi: 0, btch: 1 usd: 0
[77176.524348] CPU 17: hi: 0, btch: 1 usd: 0
[77176.524351] CPU 18: hi: 0, btch: 1 usd: 0
[77176.524354] CPU 19: hi: 0, btch: 1 usd: 0
[77176.524358] CPU 20: hi: 0, btch: 1 usd: 0
[77176.524364] CPU 21: hi: 0, btch: 1 usd: 0
[77176.524367] CPU 22: hi: 0, btch: 1 usd: 0
[77176.524370] CPU 23: hi: 0, btch: 1 usd: 0
[77176.524372] DMA32 per-cpu:
[77176.524374] CPU 0: hi: 186, btch: 31 usd: 81
[77176.524377] CPU 1: hi: 186, btch: 31 usd: 66
[77176.524380] CPU 2: hi: 186, btch: 31 usd: 49
[77176.524385] CPU 3: hi: 186, btch: 31 usd: 67
[77176.524387] CPU 4: hi: 186, btch: 31 usd: 93
[77176.524390] CPU 5: hi: 186, btch: 31 usd: 73
[77176.524393] CPU 6: hi: 186, btch: 31 usd: 50
[77176.524396] CPU 7: hi: 186, btch: 31 usd: 79
[77176.524399] CPU 8: hi: 186, btch: 31 usd: 21
[77176.524402] CPU 9: hi: 186, btch: 31 usd: 38
[77176.524406] CPU 10: hi: 186, btch: 31 usd: 0
[77176.524409] CPU 11: hi: 186, btch: 31 usd: 75
[77176.524412] CPU 12: hi: 186, btch: 31 usd: 1
[77176.524414] CPU 13: hi: 186, btch: 31 usd: 4
[77176.524417] CPU 14: hi: 186, btch: 31 usd: 9
[77176.524420] CPU 15: hi: 186, btch: 31 usd: 0
[77176.524423] CPU 16: hi: 186, btch: 31 usd: 56
[77176.524426] CPU 17: hi: 186, btch: 31 usd: 35
[77176.524429] CPU 18: hi: 186, btch: 31 usd: 32
[77176.524432] CPU 19: hi: 186, btch: 31 usd: 39
[77176.524435] CPU 20: hi: 186, btch: 31 usd: 24
[77176.524438] CPU 21: hi: 186, btch: 31 usd: 0
[77176.524441] CPU 22: hi: 186, btch: 31 usd: 35
[77176.524444] CPU 23: hi: 186, btch: 31 usd: 51
[77176.524447] Normal per-cpu:
[77176.524449] CPU 0: hi: 186, btch: 31 usd: 29
[77176.524453] CPU 1: hi: 186, btch: 31 usd: 1
[77176.524456] CPU 2: hi: 186, btch: 31 usd: 30
[77176.524459] CPU 3: hi: 186, btch: 31 usd: 30
[77176.524463] CPU 4: hi: 186, btch: 31 usd: 30
[77176.524466] CPU 5: hi: 186, btch: 31 usd: 31
[77176.524469] CPU 6: hi: 186, btch: 31 usd: 0
[77176.524471] CPU 7: hi: 186, btch: 31 usd: 0
[77176.524474] CPU 8: hi: 186, btch: 31 usd: 30
[77176.524477] CPU 9: hi: 186, btch: 31 usd: 28
[77176.524480] CPU 10: hi: 186, btch: 31 usd: 0
[77176.524483] CPU 11: hi: 186, btch: 31 usd: 30
[77176.524486] CPU 12: hi: 186, btch: 31 usd: 0
[77176.524489] CPU 13: hi: 186, btch: 31 usd: 0
[77176.524492] CPU 14: hi: 186, btch: 31 usd: 0
[77176.524495] CPU 15: hi: 186, btch: 31 usd: 0
[77176.524498] CPU 16: hi: 186, btch: 31 usd: 0
[77176.524501] CPU 17: hi: 186, btch: 31 usd: 0
[77176.524504] CPU 18: hi: 186, btch: 31 usd: 0
[77176.524507] CPU 19: hi: 186, btch: 31 usd: 0
[77176.524510] CPU 20: hi: 186, btch: 31 usd: 0
[77176.524513] CPU 21: hi: 186, btch: 31 usd: 0
[77176.524516] CPU 22: hi: 186, btch: 31 usd: 0
[77176.524518] CPU 23: hi: 186, btch: 31 usd: 0
[77176.524524] active_anon:5675 inactive_anon:4676 isolated_anon:0
[77176.524526] active_file:146373 inactive_file:153543 isolated_file:480
[77176.524527] unevictable:0 dirty:167539 writeback:322 unstable:0
[77176.524528] free:5017 slab_reclaimable:15640 slab_unreclaimable:8972
[77176.524529] mapped:1114 shmem:7 pagetables:1908 bounce:0
[77176.524536] DMA free:9820kB min:32kB low:40kB high:48kB
active_anon:4kB inactive_anon:0kB active_file:616kB inactive_file:2212kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:12740kB
mlocked:0kB dirty:2292kB writeback:0kB mapped:0kB shmem:0kB
slab_reclaimable:72kB slab_unreclaimable:108kB kernel_stack:0kB
pagetables:12kB unstable:0kB bounce:0kB writeback_tmp:0kB
pages_scanned:3040 all_unreclaimable? no
[77176.524541] lowmem_reserve[]: 0 1428 2452 2452
[77176.524551] DMA32 free:7768kB min:3680kB low:4600kB high:5520kB
active_anon:22696kB inactive_anon:18704kB active_file:584580kB
inactive_file:608508kB unevictable:0kB isolated(anon):0kB
isolated(file):1920kB present:1462496kB mlocked:0kB dirty:664128kB
writeback:1276kB mapped:4456kB shmem:28kB slab_reclaimable:62076kB
slab_unreclaimable:32292kB kernel_stack:5120kB pagetables:7620kB
unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1971808
all_unreclaimable? yes
[77176.524556] lowmem_reserve[]: 0 0 1024 1024
[77176.524564] Normal free:2480kB min:2636kB low:3292kB high:3952kB
active_anon:0kB inactive_anon:0kB active_file:296kB inactive_file:3452kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1048700kB
mlocked:0kB dirty:3736kB writeback:12kB mapped:0kB shmem:0kB
slab_reclaimable:412kB slab_unreclaimable:3488kB kernel_stack:80kB
pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB
pages_scanned:8192 all_unreclaimable? yes
[77176.524569] lowmem_reserve[]: 0 0 0 0
[77176.524574] DMA: 4*4kB 25*8kB 11*16kB 7*32kB 8*64kB 8*128kB 8*256kB
3*512kB 0*1024kB 0*2048kB 1*4096kB = 9832kB
[77176.524587] DMA32: 742*4kB 118*8kB 3*16kB 3*32kB 2*64kB 0*128kB
0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 7768kB
[77176.524600] Normal: 1*4kB 1*8kB 2*16kB 13*32kB 14*64kB 2*128kB
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1612kB
[77176.524613] 302308 total pagecache pages
[77176.524615] 1619 pages in swap cache
[77176.524617] Swap cache stats: add 40686, delete 39067, find 24687/26036
[77176.524619] Free swap = 10141956kB
[77176.524621] Total swap = 10239992kB
[77176.577607] 793456 pages RAM
[77176.577611] 436254 pages reserved
[77176.577613] 308627 pages shared
[77176.577615] 49249 pages non-shared
[77176.577620] Out of memory: kill process 5755 (python2.6) score 110492 or a child
[77176.577623] Killed process 5757 (python2.6)
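
The per-zone lines in this report can be checked by hand. The kernel declines to allocate from a zone when that zone's free pages do not exceed its watermark plus its lowmem_reserve entry for the requesting zone class. The sketch below is a simplified, non-authoritative model of the order-0 case of zone_watermark_ok() in 2.6.32-era mm/page_alloc.c. It ignores the ALLOC_HIGH/ALLOC_HARDER adjustments and the higher-order checks. It also assumes that gfp_mask=0xd0 is GFP_KERNEL (__GFP_WAIT | __GFP_IO | __GFP_FS), which makes Normal the class zone (index 2 in lowmem_reserve[]), and that pages are 4 KiB. Only the min watermark is tested. Because low and high are larger than min, a zone that fails min also fails them.

```python
import re

# Zone lines copied from the OOM report above, trimmed to the fields used here.
# Each entry: (zone line, lowmem_reserve[] line printed right after it).
ZONE_LINES = {
    "DMA": ("DMA free:9820kB min:32kB low:40kB high:48kB", "0 1428 2452 2452"),
    "DMA32": ("DMA32 free:7768kB min:3680kB low:4600kB high:5520kB", "0 0 1024 1024"),
    "Normal": ("Normal free:2480kB min:2636kB low:3292kB high:3952kB", "0 0 0 0"),
}

PAGE_KB = 4        # assumption: 4 KiB pages (x86_64)
CLASSZONE_IDX = 2  # assumption: GFP_KERNEL -> class zone Normal (DMA=0, DMA32=1, Normal=2)


def field_kb(line, name):
    """Return the number in a 'name:NNNkB' field of a zone line."""
    m = re.search(rf"\b{name}:(\d+)kB", line)
    if m is None:
        raise ValueError(f"no {name} field in {line!r}")
    return int(m.group(1))


def watermark_ok(zone_line, reserve_line, classzone_idx=CLASSZONE_IDX):
    """Simplified order-0 test against the min watermark.

    Returns (ok, free_pages, threshold); the zone may serve the
    allocation only if free_pages > threshold = min + lowmem_reserve.
    """
    free_pages = field_kb(zone_line, "free") // PAGE_KB
    min_pages = field_kb(zone_line, "min") // PAGE_KB
    reserve = int(reserve_line.split()[classzone_idx])
    threshold = min_pages + reserve
    return free_pages > threshold, free_pages, threshold


for zone, (zone_line, reserve_line) in ZONE_LINES.items():
    ok, free_pages, threshold = watermark_ok(zone_line, reserve_line)
    verdict = "ok" if ok else "FAIL"
    print(f"{zone:6} free={free_pages} pages, needs > {threshold}: {verdict}")
```

Under these assumptions, all three zones come out FAIL. Normal is already below its min watermark. DMA and DMA32 fall short by only a few pages once the reserve that protects them from Normal-class allocations is added. By this reading, the allocator had no zone it could use, even though DMA32 alone lists roughly 1.1 GiB of file pages (active_file plus inactive_file).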

Depending on what gets nuked by the OOM killer, I am frequently left
with an unusable system that needs to be rebooted.

The machine always seems to have plenty of memory available: dom0 gets
1.5 GB, of which more than 1 GB is always just in the "cached" state.
For instance, right now, on the same machine:
# free
             total       used       free     shared    buffers     cached
Mem:       1536512    1493112      43400          0      10284    1144904
-/+ buffers/cache:     337924    1198588
Swap:     10239992      74444   10165548
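
The "-/+ buffers/cache" row of free(1) is derived from the Mem row. Its used figure is used minus buffers and cached, and its free figure is free plus buffers and cached. That arithmetic is what supports the "more than 1 GB just cached" observation. Below is a minimal Python sketch that parses the output above. The function name is illustrative only.

```python
# The `free` output above, in kB.
FREE_OUTPUT = """\
             total       used       free     shared    buffers     cached
Mem:       1536512    1493112      43400          0      10284    1144904
-/+ buffers/cache:     337924    1198588
Swap:     10239992      74444   10165548
"""


def buffers_cache_line(text):
    """Recompute the '-/+ buffers/cache' (used, free) pair from the Mem: row."""
    for line in text.splitlines():
        if line.startswith("Mem:"):
            total, used, free, shared, buffers, cached = map(int, line.split()[1:7])
            return used - buffers - cached, free + buffers + cached
    raise ValueError("no Mem: line in free output")


used_kb, avail_kb = buffers_cache_line(FREE_OUTPUT)
print(f"used without buffers/cache: {used_kb} kB, free with them: {avail_kb} kB")
```

This reproduces 337924 and 1198588. In other words, about 1.14 GiB counts as free once buffers and page cache are treated as reclaimable.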

I have seen this OOM problem on a wide range of Xen versions, as far
back as I can remember, including the latest 4.1-unstable and the
2.6.32 pvops kernel (built yesterday and tested in the hope that they
would fix this). I haven't found a way to reproduce it reliably yet,
but I suspect it is related to fairly heavy disk or network activity:
during this latest occurrence, one domain was briefly downloading at
~200 Mbps.

Does anyone have an idea what this could be? Is RAM being filled
suddenly because a buffer somewhere grows too quickly, or something
like that? What can I try here?
-John
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel