
Re: [Xen-devel] Kernel 3.11 / 3.12 OOM killer and Xen ballooning



On 01/07/2014 05:21 PM, James Dingwall wrote:
> Bob Liu wrote:
>> Could you confirm that this problem doesn't exist when tmem is loaded
>> with selfshrinking=0 while compiling gcc? It seems that you are
>> compiling different packages during your testing.
>> This will help figure out whether selfshrinking is the root cause.
> Got an OOM with selfshrinking=0, again during a gcc compile.
> Unfortunately I don't have a single test case which demonstrates the
> problem, but as I mentioned before it will generally show up when
> compiling large packages such as glibc, kdelibs, gcc etc.
> 

So the root cause is not the selfshrinking feature.
Then what I can think of is that the xen-selfballoon driver was too
aggressive: too many pages were ballooned out, which caused heavy memory
pressure in the guest OS.
kswapd then started to reclaim pages until most of them were
unreclaimable (all_unreclaimable=yes for all zones), at which point the
OOM killer was triggered.
In theory the balloon driver should give ballooned-out pages back to
the guest OS, but I'm afraid that doesn't happen fast enough.
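
If you want to see whether the balloon is racing the allocator, you can
sample the balloon target against the current reservation while a build
is running. A minimal sketch (the xen_memory0 sysfs node below is where
the balloon driver exports its state on 3.x kernels; adjust the path if
your kernel differs):

  # log balloon target vs. current size once a second
  XM=/sys/devices/system/xen_memory/xen_memory0
  while sleep 1; do
      echo "$(date +%T) target=$(cat $XM/target_kb)kB current=$(cat $XM/info/current_kb)kB"
  done

If the current size keeps chasing a target well below what the compile
needs, that points at the selfballoon heuristics rather than reclaim.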

My suggestion is to reserve a minimum amount of memory for your guest OS
so that xen-selfballoon won't be so aggressive.
You can do that through the selfballoon_reserved_mb or
selfballoon_min_usable_mb parameters.
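
For example, to keep an extra 256MB out of the balloon's reach (these
are sysfs tunables under the xen_memory0 node on 3.x kernels, and 256
is just a hypothetical value; tune it for your workload):

  XM=/sys/devices/system/xen_memory/xen_memory0
  # floor on usable RAM that the selfballoon target won't go below
  echo 256 > $XM/selfballoon_min_usable_mb
  # extra slack added on top of the computed balloon target
  echo 256 > $XM/selfballoon_reserved_mb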

> I don't know if this is a separate or related issue but over the
> holidays I also had a problem with six of the guests on my system where
> kswapd was running at 100% and had clocked up >9000 minutes of cpu time
> even though there was otherwise no load on them.  Of the guests I
> restarted yesterday in this state, two have already got into the same
> state again; they are running a kernel with the first patch that you sent.
> 

Could you capture the meminfo in the guest OS while it's in that state?
cat /proc/meminfo
cat /proc/vmstat
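
If the state is hard to catch by hand, something like this will keep a
trace until the guest wedges (/root/meminfo.log is just a hypothetical
destination):

  # append a timestamped snapshot of both files once a minute
  while sleep 60; do
      { date; cat /proc/meminfo /proc/vmstat; } >> /root/meminfo.log
  done &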

Thanks,
-Bob

> /sys/module/tmem/parameters/cleancache Y
> /sys/module/tmem/parameters/frontswap Y
> /sys/module/tmem/parameters/selfballooning Y
> /sys/module/tmem/parameters/selfshrinking N
> 
> James
> 
> [ 8212.940520] cc1plus invoked oom-killer: gfp_mask=0x200da, order=0,
> oom_score_adj=0
> [ 8212.940529] CPU: 1 PID: 23678 Comm: cc1plus Tainted: G W    3.12.5 #88
> [ 8212.940532]  ffff88001e38cdf8 ffff88000094f968 ffffffff8148f200
> ffff88001f90e8e8
> [ 8212.940536]  ffff88001e38c8c0 ffff88000094fa08 ffffffff8148ccf7
> ffff88000094f9b8
> [ 8212.940538]  ffffffff810f8d97 ffff88000094f998 ffffffff81006dc8
> ffff88000094f9a8
> [ 8212.940542] Call Trace:
> [ 8212.940554]  [<ffffffff8148f200>] dump_stack+0x46/0x58
> [ 8212.940558]  [<ffffffff8148ccf7>] dump_header.isra.9+0x6d/0x1cc
> [ 8212.940564]  [<ffffffff810f8d97>] ? super_cache_count+0xa8/0xb8
> [ 8212.940569]  [<ffffffff81006dc8>] ? xen_clocksource_read+0x20/0x22
> [ 8212.940573]  [<ffffffff81006ea9>] ? xen_clocksource_get_cycles+0x9/0xb
> [ 8212.940578]  [<ffffffff81494abe>] ?
> _raw_spin_unlock_irqrestore+0x47/0x62
> [ 8212.940583]  [<ffffffff81296b27>] ? ___ratelimit+0xcb/0xe8
> [ 8212.940588]  [<ffffffff810b2bbf>] oom_kill_process+0x70/0x2fd
> [ 8212.940592]  [<ffffffff810bca0e>] ? zone_reclaimable+0x11/0x1e
> [ 8212.940597]  [<ffffffff81048779>] ? has_ns_capability_noaudit+0x12/0x19
> [ 8212.940600]  [<ffffffff81048792>] ? has_capability_noaudit+0x12/0x14
> [ 8212.940603]  [<ffffffff810b32de>] out_of_memory+0x31b/0x34e
> [ 8212.940608]  [<ffffffff810b7438>] __alloc_pages_nodemask+0x65b/0x792
> [ 8212.940612]  [<ffffffff810e3da3>] alloc_pages_vma+0xd0/0x10c
> [ 8212.940617]  [<ffffffff810dd5a4>] read_swap_cache_async+0x70/0x120
> [ 8212.940620]  [<ffffffff810dd6e4>] swapin_readahead+0x90/0xd4
> [ 8212.940623]  [<ffffffff81005b35>] ? pte_mfn_to_pfn+0x59/0xcb
> [ 8212.940627]  [<ffffffff810cf99d>] handle_mm_fault+0x8a4/0xd54
> [ 8212.940630]  [<ffffffff81006dc8>] ? xen_clocksource_read+0x20/0x22
> [ 8212.940634]  [<ffffffff810115d2>] ? sched_clock+0x9/0xd
> [ 8212.940638]  [<ffffffff8106772f>] ? sched_clock_local+0x12/0x75
> [ 8212.940641]  [<ffffffff8106823b>] ? arch_vtime_task_switch+0x81/0x86
> [ 8212.940646]  [<ffffffff81037f40>] __do_page_fault+0x3d8/0x437
> [ 8212.940649]  [<ffffffff81006dc8>] ? xen_clocksource_read+0x20/0x22
> [ 8212.940652]  [<ffffffff810115d2>] ? sched_clock+0x9/0xd
> [ 8212.940654]  [<ffffffff8106772f>] ? sched_clock_local+0x12/0x75
> [ 8212.940658]  [<ffffffff810a45cc>] ? __acct_update_integrals+0xb4/0xbf
> [ 8212.940661]  [<ffffffff810a493f>] ? acct_account_cputime+0x17/0x19
> [ 8212.940663]  [<ffffffff81067c28>] ? account_user_time+0x67/0x92
> [ 8212.940666]  [<ffffffff8106811b>] ? vtime_account_user+0x4d/0x52
> [ 8212.940669]  [<ffffffff81037fd8>] do_page_fault+0x1a/0x5a
> [ 8212.940674]  [<ffffffff810a065f>] ? rcu_user_enter+0xe/0x10
> [ 8212.940677]  [<ffffffff81495158>] page_fault+0x28/0x30
> [ 8212.940679] Mem-Info:
> [ 8212.940681] Node 0 DMA per-cpu:
> [ 8212.940684] CPU    0: hi:    0, btch:   1 usd:   0
> [ 8212.940685] CPU    1: hi:    0, btch:   1 usd:   0
> [ 8212.940686] Node 0 DMA32 per-cpu:
> [ 8212.940688] CPU    0: hi:  186, btch:  31 usd: 116
> [ 8212.940690] CPU    1: hi:  186, btch:  31 usd: 124
> [ 8212.940691] Node 0 Normal per-cpu:
> [ 8212.940693] CPU    0: hi:    0, btch:   1 usd:   0
> [ 8212.940694] CPU    1: hi:    0, btch:   1 usd:   0
> [ 8212.940700] active_anon:105765 inactive_anon:105882 isolated_anon:0
>  active_file:8412 inactive_file:8612 isolated_file:0
>  unevictable:0 dirty:0 writeback:0 unstable:0
>  free:1143 slab_reclaimable:3575 slab_unreclaimable:3464
>  mapped:3792 shmem:6 pagetables:2534 bounce:0
>  free_cma:0 totalram:246132 balloontarget:306242
> [ 8212.940702] Node 0 DMA free:1964kB min:88kB low:108kB high:132kB
> active_anon:5092kB inactive_anon:5328kB active_file:416kB
> inactive_file:608kB unevictable:0kB isolated(anon):0kB
> isolated(file):0kB present:15996kB managed:15392kB mlocked:0kB dirty:0kB
> writeback:0kB mapped:320kB shmem:0kB slab_reclaimable:252kB
> slab_unreclaimable:492kB kernel_stack:120kB pagetables:252kB
> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB
> pages_scanned:26951 all_unreclaimable? yes
> [ 8212.940711] lowmem_reserve[]: 0 469 469 469
> [ 8212.940715] Node 0 DMA32 free:2608kB min:2728kB low:3408kB
> high:4092kB active_anon:181456kB inactive_anon:181528kB
> active_file:22296kB inactive_file:22644kB unevictable:0kB
> isolated(anon):0kB isolated(file):0kB present:507904kB managed:466364kB
> mlocked:0kB dirty:0kB writeback:0kB mapped:8628kB shmem:20kB
> slab_reclaimable:10756kB slab_unreclaimable:12548kB kernel_stack:1688kB
> pagetables:8876kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB
> pages_scanned:612393 all_unreclaimable? yes
> [ 8212.940722] lowmem_reserve[]: 0 0 0 0
> [ 8212.940725] Node 0 Normal free:0kB min:0kB low:0kB high:0kB
> active_anon:236512kB inactive_anon:236672kB active_file:10936kB
> inactive_file:11196kB unevictable:0kB isolated(anon):0kB
> isolated(file):0kB present:524288kB managed:502772kB mlocked:0kB
> dirty:0kB writeback:0kB mapped:6220kB shmem:4kB slab_reclaimable:3292kB
> slab_unreclaimable:816kB kernel_stack:64kB pagetables:1008kB
> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB
> pages_scanned:745963 all_unreclaimable? yes
> [ 8212.940732] lowmem_reserve[]: 0 0 0 0
> [ 8212.940735] Node 0 DMA: 1*4kB (R) 0*8kB 4*16kB (R) 1*32kB (R) 1*64kB
> (R) 2*128kB (R) 0*256kB 1*512kB (R) 1*1024kB (R) 0*2048kB 0*4096kB = 1956kB
> [ 8212.940747] Node 0 DMA32: 652*4kB (U) 0*8kB 0*16kB 0*32kB 0*64kB
> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2608kB
> [ 8212.940756] Node 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB
> 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> [ 8212.940765] 16847 total pagecache pages
> [ 8212.940766] 8381 pages in swap cache
> [ 8212.940768] Swap cache stats: add 741397, delete 733016, find
> 250268/342284
> [ 8212.940769] Free swap  = 1925576kB
> [ 8212.940770] Total swap = 2097148kB
> [ 8212.951044] 262143 pages RAM
> [ 8212.951046] 11939 pages reserved
> [ 8212.951047] 540820 pages shared
> [ 8212.951048] 240248 pages non-shared
> [ 8212.951050] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents
> oom_score_adj name
> <snip process list>
> [ 8212.951310] Out of memory: Kill process 23721 (cc1plus) score 119 or
> sacrifice child
> [ 8212.951313] Killed process 23721 (cc1plus) total-vm:530268kB,
> anon-rss:350980kB, file-rss:9408kB
> [54810.683658] kjournald starting.  Commit interval 5 seconds
> [54810.684381] EXT3-fs (xvda1): using internal journal
> [54810.684402] EXT3-fs (xvda1): mounted filesystem with writeback data mode
> 
