Re: [Xen-devel] Kernel 3.11 / 3.12 OOM killer and Xen ballooning
Bob Liu wrote:
> On 01/07/2014 05:21 PM, James Dingwall wrote:
>> Bob Liu wrote:
>>> Could you confirm that this problem doesn't occur if tmem is loaded with selfshrinking=0 while compiling gcc? It seems that you are compiling different packages during your testing. This will help to figure out whether selfshrinking is the root cause.
>> I got an OOM with selfshrinking=0, again during a gcc compile. Unfortunately I don't have a single test case that demonstrates the problem, but as I mentioned before it generally shows up when compiling large packages such as glibc, kdelibs, gcc, etc.
> So enabling selfshrinking is not the root cause. What I can think of instead is that the xen-selfballoon driver was too aggressive: too many pages were ballooned out, which caused heavy memory pressure in the guest OS. kswapd then started to reclaim pages until most of them were unreclaimable (all_unreclaimable=yes for all zones), and the OOM killer was triggered. In theory the balloon driver should give ballooned-out pages back to the guest OS, but I'm afraid this does not happen fast enough.
>
> My suggestion is to reserve a minimum amount of memory for your guest OS so that xen-selfballoon won't be so aggressive. You can do this through the parameters selfballoon_reserved_mb or selfballoon_min_usable_mb.

I will try your suggestions and let you know.

>> I don't know if this is a separate or related issue, but over the holidays I also had a problem with six of the guests on my system where kswapd was running at 100% and had clocked up >9000 minutes of CPU time even though there was otherwise no load on them. Of the guests I restarted yesterday in this state, two have already got into the same state again; they are running a kernel with the first patch that you sent.
> Could you get the meminfo in the guest OS at that time?
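To make Bob's suggestion concrete, here is a simplified Python model of how a self-ballooning driver might pick its next memory target, and how the two parameters he names would change it. It is a sketch under stated assumptions, not the xen-selfballoon source: 4 KiB pages; a goal of committed memory plus kernel-reserved pages plus selfballoon_reserved_mb; a hysteresis step that closes 1/8 of the gap when shrinking and the whole gap when growing (assumed defaults); and a floor of reserved pages plus selfballoon_min_usable_mb when that parameter is non-zero. The real driver also applies its own automatic floor when selfballoon_min_usable_mb is 0 and accounts for frontswap pages; both are left out. The function and argument names are hypothetical.

```python
# Simplified, hypothetical model of a self-ballooning target calculation.
# Loosely patterned on the behaviour discussed for xen-selfballoon; NOT the driver source.

PAGE_KB = 4  # assumption: 4 KiB pages


def mb_to_pages(mb: int) -> int:
    """Convert MiB to 4 KiB pages."""
    return mb * 1024 // PAGE_KB


def selfballoon_target(cur_pages: int,
                       committed_pages: int,
                       reserve_pages: int,
                       reserved_mb: int = 0,
                       min_usable_mb: int = 0,
                       down_hysteresis: int = 8,
                       up_hysteresis: int = 1) -> int:
    """Return the next memory target, in pages, for the guest."""
    # Goal: memory the guest appears to need, plus an optional cushion
    # (the role selfballoon_reserved_mb plays in this model).
    goal = committed_pages + reserve_pages + mb_to_pages(reserved_mb)

    # Hysteresis: shrink by 1/down_hysteresis of the gap, grow by 1/up_hysteresis of it.
    if cur_pages > goal:
        target = cur_pages - (cur_pages - goal) // down_hysteresis
    elif cur_pages < goal:
        target = cur_pages + (goal - cur_pages) // up_hysteresis
    else:
        target = cur_pages

    # Optional hard floor (the role selfballoon_min_usable_mb plays in this model).
    if min_usable_mb > 0:
        floor = reserve_pages + mb_to_pages(min_usable_mb)
        target = max(target, floor)
    return target


if __name__ == "__main__":
    # Same workload, with and without a reserved cushion or a floor.
    print(selfballoon_target(100_000, 40_000, 2_000))
    print(selfballoon_target(100_000, 40_000, 2_000, reserved_mb=100))
    print(selfballoon_target(36_000, 1_000, 2_000, min_usable_mb=128))
```

Under these assumptions, a guest at 100000 pages with 40000 committed and 2000 reserved steps down to 92750 pages; adding reserved_mb=100 slows that to 95950. In a nearly idle guest (36000 pages, 1000 committed), min_usable_mb=128 holds the target at 34768 pages instead of letting it fall to 31875.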
cat /proc/meminfo
MemTotal: 364080 kB
MemFree: 130448 kB
Buffers: 1260 kB
Cached: 129352 kB
SwapCached: 300 kB
Active: 21412 kB
Inactive: 160888 kB
Active(anon): 7732 kB
Inactive(anon): 44676 kB
Active(file): 13680 kB
Inactive(file): 116212 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 2097148 kB
SwapFree: 2096704 kB
Dirty: 44 kB
Writeback: 0 kB
AnonPages: 51532 kB
Mapped: 14172 kB
Shmem: 720 kB
Slab: 19580 kB
SReclaimable: 7732 kB
SUnreclaim: 11848 kB
KernelStack: 1824 kB
PageTables: 7968 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 2279188 kB
Committed_AS: 338792 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 9020 kB
VmallocChunk: 34359716472 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
DirectMap4k: 1048576 kB
DirectMap2M: 0 kB

cat /proc/vmstat
nr_free_pages 32775
nr_alloc_batch 0
nr_inactive_anon 11167
nr_active_anon 1904
nr_inactive_file 29054
nr_active_file 3420
nr_unevictable 0
nr_mlock 0
nr_anon_pages 12869
nr_mapped 3543
nr_file_pages 32724
nr_dirty 5
nr_writeback 0
nr_slab_reclaimable 1933
nr_slab_unreclaimable 2959
nr_page_table_pages 1988
nr_kernel_stack 228
nr_unstable 0
nr_bounce 0
nr_vmscan_write 781197
nr_vmscan_immediate_reclaim 6241
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 180
nr_dirtied 86426
nr_written 860157
numa_hit 8323637
numa_miss 0
numa_foreign 0
numa_interleave 0
numa_local 8323637
numa_other 0
nr_anon_transparent_hugepages 0
nr_free_cma 0
nr_dirty_threshold 15359
nr_dirty_background_threshold 7679
pgpgin 2044246
pgpgout 643646
pswpin 123
pswpout 153
pgalloc_dma 164528
pgalloc_dma32 7332263
pgalloc_normal 1018515
pgalloc_movable 0
pgfree 8548450
pgactivate 2011347
pgdeactivate 2274842
pgfault 7231978
pgmajfault 345038
pgrefill_dma 55260
pgrefill_dma32 2261099
pgrefill_normal 1771
pgrefill_movable 0
pgsteal_kswapd_dma 44877
pgsteal_kswapd_dma32 2586249
pgsteal_kswapd_normal 0
pgsteal_kswapd_movable 0
pgsteal_direct_dma 0
pgsteal_direct_dma32 37
pgsteal_direct_normal 0
pgsteal_direct_movable 0
pgscan_kswapd_dma 204746
pgscan_kswapd_dma32 4474736
pgscan_kswapd_normal 0
pgscan_kswapd_movable 0
pgscan_direct_dma 0
pgscan_direct_dma32 39
pgscan_direct_normal 0
pgscan_direct_movable 0
pgscan_direct_throttle 0
zone_reclaim_failed 0
pginodesteal 0
slabs_scanned 2713984
kswapd_inodesteal 41065
kswapd_low_wmark_hit_quickly 14894
kswapd_high_wmark_hit_quickly 115972041
pageoutrun 115992287
allocstall 1
pgrotated 8495
numa_pte_updates 0
numa_huge_pte_updates 0
numa_hint_faults 0
numa_hint_faults_local 0
numa_pages_migrated 0
pgmigrate_success 0
pgmigrate_fail 0
compact_migrate_scanned 0
compact_free_scanned 0
compact_isolated 0
compact_stall 0
compact_fail 0
compact_success 0
unevictable_pgs_culled 29364
unevictable_pgs_scanned 0
unevictable_pgs_rescued 29137
unevictable_pgs_mlocked 29542
unevictable_pgs_munlocked 29542
unevictable_pgs_cleared 0
unevictable_pgs_stranded 0
thp_fault_alloc 0
thp_fault_fallback 0
thp_collapse_alloc 0
thp_collapse_alloc_failed 0
thp_split 0
thp_zero_page_alloc 0
thp_zero_page_alloc_failed 0
nr_tlb_remote_flush 10666
nr_tlb_remote_flush_received 21336
nr_tlb_local_flush_all 65481
nr_tlb_local_flush_one 1431260

> Thanks,
> -Bob

/sys/module/tmem/parameters/cleancache Y
/sys/module/tmem/parameters/frontswap Y
/sys/module/tmem/parameters/selfballooning Y
/sys/module/tmem/parameters/selfshrinking N

James

[ 8212.940520] cc1plus invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
[ 8212.940529] CPU: 1 PID: 23678 Comm: cc1plus Tainted: G W 3.12.5 #88
[ 8212.940532] ffff88001e38cdf8 ffff88000094f968 ffffffff8148f200 ffff88001f90e8e8
[ 8212.940536] ffff88001e38c8c0 ffff88000094fa08 ffffffff8148ccf7 ffff88000094f9b8
[ 8212.940538] ffffffff810f8d97 ffff88000094f998 ffffffff81006dc8 ffff88000094f9a8
[ 8212.940542] Call Trace:
[ 8212.940554] [<ffffffff8148f200>] dump_stack+0x46/0x58
[ 8212.940558] [<ffffffff8148ccf7>] dump_header.isra.9+0x6d/0x1cc
[ 8212.940564] [<ffffffff810f8d97>] ? super_cache_count+0xa8/0xb8
[ 8212.940569] [<ffffffff81006dc8>] ? xen_clocksource_read+0x20/0x22
[ 8212.940573] [<ffffffff81006ea9>] ? xen_clocksource_get_cycles+0x9/0xb
[ 8212.940578] [<ffffffff81494abe>] ? _raw_spin_unlock_irqrestore+0x47/0x62
[ 8212.940583] [<ffffffff81296b27>] ? ___ratelimit+0xcb/0xe8
[ 8212.940588] [<ffffffff810b2bbf>] oom_kill_process+0x70/0x2fd
[ 8212.940592] [<ffffffff810bca0e>] ? zone_reclaimable+0x11/0x1e
[ 8212.940597] [<ffffffff81048779>] ? has_ns_capability_noaudit+0x12/0x19
[ 8212.940600] [<ffffffff81048792>] ? has_capability_noaudit+0x12/0x14
[ 8212.940603] [<ffffffff810b32de>] out_of_memory+0x31b/0x34e
[ 8212.940608] [<ffffffff810b7438>] __alloc_pages_nodemask+0x65b/0x792
[ 8212.940612] [<ffffffff810e3da3>] alloc_pages_vma+0xd0/0x10c
[ 8212.940617] [<ffffffff810dd5a4>] read_swap_cache_async+0x70/0x120
[ 8212.940620] [<ffffffff810dd6e4>] swapin_readahead+0x90/0xd4
[ 8212.940623] [<ffffffff81005b35>] ? pte_mfn_to_pfn+0x59/0xcb
[ 8212.940627] [<ffffffff810cf99d>] handle_mm_fault+0x8a4/0xd54
[ 8212.940630] [<ffffffff81006dc8>] ? xen_clocksource_read+0x20/0x22
[ 8212.940634] [<ffffffff810115d2>] ? sched_clock+0x9/0xd
[ 8212.940638] [<ffffffff8106772f>] ? sched_clock_local+0x12/0x75
[ 8212.940641] [<ffffffff8106823b>] ? arch_vtime_task_switch+0x81/0x86
[ 8212.940646] [<ffffffff81037f40>] __do_page_fault+0x3d8/0x437
[ 8212.940649] [<ffffffff81006dc8>] ? xen_clocksource_read+0x20/0x22
[ 8212.940652] [<ffffffff810115d2>] ? sched_clock+0x9/0xd
[ 8212.940654] [<ffffffff8106772f>] ? sched_clock_local+0x12/0x75
[ 8212.940658] [<ffffffff810a45cc>] ? __acct_update_integrals+0xb4/0xbf
[ 8212.940661] [<ffffffff810a493f>] ? acct_account_cputime+0x17/0x19
[ 8212.940663] [<ffffffff81067c28>] ? account_user_time+0x67/0x92
[ 8212.940666] [<ffffffff8106811b>] ? vtime_account_user+0x4d/0x52
[ 8212.940669] [<ffffffff81037fd8>] do_page_fault+0x1a/0x5a
[ 8212.940674] [<ffffffff810a065f>] ? rcu_user_enter+0xe/0x10
[ 8212.940677] [<ffffffff81495158>] page_fault+0x28/0x30
[ 8212.940679] Mem-Info:
[ 8212.940681] Node 0 DMA per-cpu:
[ 8212.940684] CPU 0: hi: 0, btch: 1 usd: 0
[ 8212.940685] CPU 1: hi: 0, btch: 1 usd: 0
[ 8212.940686] Node 0 DMA32 per-cpu:
[ 8212.940688] CPU 0: hi: 186, btch: 31 usd: 116
[ 8212.940690] CPU 1: hi: 186, btch: 31 usd: 124
[ 8212.940691] Node 0 Normal per-cpu:
[ 8212.940693] CPU 0: hi: 0, btch: 1 usd: 0
[ 8212.940694] CPU 1: hi: 0, btch: 1 usd: 0
[ 8212.940700] active_anon:105765 inactive_anon:105882 isolated_anon:0 active_file:8412 inactive_file:8612 isolated_file:0 unevictable:0 dirty:0 writeback:0 unstable:0 free:1143 slab_reclaimable:3575 slab_unreclaimable:3464 mapped:3792 shmem:6 pagetables:2534 bounce:0 free_cma:0 totalram:246132 balloontarget:306242
[ 8212.940702] Node 0 DMA free:1964kB min:88kB low:108kB high:132kB active_anon:5092kB inactive_anon:5328kB active_file:416kB inactive_file:608kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15996kB managed:15392kB mlocked:0kB dirty:0kB writeback:0kB mapped:320kB shmem:0kB slab_reclaimable:252kB slab_unreclaimable:492kB kernel_stack:120kB pagetables:252kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:26951 all_unreclaimable? yes
[ 8212.940711] lowmem_reserve[]: 0 469 469 469
[ 8212.940715] Node 0 DMA32 free:2608kB min:2728kB low:3408kB high:4092kB active_anon:181456kB inactive_anon:181528kB active_file:22296kB inactive_file:22644kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:507904kB managed:466364kB mlocked:0kB dirty:0kB writeback:0kB mapped:8628kB shmem:20kB slab_reclaimable:10756kB slab_unreclaimable:12548kB kernel_stack:1688kB pagetables:8876kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:612393 all_unreclaimable? yes
[ 8212.940722] lowmem_reserve[]: 0 0 0 0
[ 8212.940725] Node 0 Normal free:0kB min:0kB low:0kB high:0kB active_anon:236512kB inactive_anon:236672kB active_file:10936kB inactive_file:11196kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:524288kB managed:502772kB mlocked:0kB dirty:0kB writeback:0kB mapped:6220kB shmem:4kB slab_reclaimable:3292kB slab_unreclaimable:816kB kernel_stack:64kB pagetables:1008kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:745963 all_unreclaimable? yes
[ 8212.940732] lowmem_reserve[]: 0 0 0 0
[ 8212.940735] Node 0 DMA: 1*4kB (R) 0*8kB 4*16kB (R) 1*32kB (R) 1*64kB (R) 2*128kB (R) 0*256kB 1*512kB (R) 1*1024kB (R) 0*2048kB 0*4096kB = 1956kB
[ 8212.940747] Node 0 DMA32: 652*4kB (U) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2608kB
[ 8212.940756] Node 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[ 8212.940765] 16847 total pagecache pages
[ 8212.940766] 8381 pages in swap cache
[ 8212.940768] Swap cache stats: add 741397, delete 733016, find 250268/342284
[ 8212.940769] Free swap = 1925576kB
[ 8212.940770] Total swap = 2097148kB
[ 8212.951044] 262143 pages RAM
[ 8212.951046] 11939 pages reserved
[ 8212.951047] 540820 pages shared
[ 8212.951048] 240248 pages non-shared
[ 8212.951050] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
<snip process list>
[ 8212.951310] Out of memory: Kill process 23721 (cc1plus) score 119 or sacrifice child
[ 8212.951313] Killed process 23721 (cc1plus) total-vm:530268kB, anon-rss:350980kB, file-rss:9408kB
[54810.683658] kjournald starting. Commit interval 5 seconds
[54810.684381] EXT3-fs (xvda1): using internal journal
[54810.684402] EXT3-fs (xvda1): mounted filesystem with writeback data mode

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
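The /proc/vmstat snapshot from the guest where kswapd was spinning can be summarised with two ratios. The Python sketch below (helper names are hypothetical) parses the name/value pairs and computes (a) pages reclaimed by kswapd per page it scanned, summed over zones, and (b) kswapd_high_wmark_hit_quickly as a fraction of pageoutrun. The counter meanings are an assumption based on a reading of the 3.x vmscan code: pageoutrun counts passes of kswapd's balancing loop, and kswapd_high_wmark_hit_quickly counts occasions when kswapd, after a short nap, still found the node unbalanced and so did not go fully to sleep.

```python
# Hypothetical helpers for summarising a /proc/vmstat snapshot.


def parse_vmstat(text: str) -> dict:
    """Parse whitespace-separated 'name value' pairs, as found in /proc/vmstat."""
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of tokens (name/value pairs)")
    return {name: int(value) for name, value in zip(tokens[0::2], tokens[1::2])}


def kswapd_summary(stats: dict) -> dict:
    """Return kswapd reclaim efficiency and the quick-wakeup ratio."""
    # Sum the per-zone kswapd counters (dma, dma32, normal, movable).
    scanned = sum(v for k, v in stats.items() if k.startswith("pgscan_kswapd_"))
    stolen = sum(v for k, v in stats.items() if k.startswith("pgsteal_kswapd_"))
    runs = stats.get("pageoutrun", 0)
    quick = stats.get("kswapd_high_wmark_hit_quickly", 0)
    return {
        # Fraction of scanned pages that kswapd actually reclaimed.
        "reclaim_efficiency": stolen / scanned if scanned else 0.0,
        # "Still unbalanced after a short nap" events per balancing pass
        # (meaning assumed, see lead-in).
        "high_wmark_quick_ratio": quick / runs if runs else 0.0,
    }


# Counters copied from the snapshot in this message.
SAMPLE = """
pgsteal_kswapd_dma 44877 pgsteal_kswapd_dma32 2586249
pgsteal_kswapd_normal 0 pgsteal_kswapd_movable 0
pgscan_kswapd_dma 204746 pgscan_kswapd_dma32 4474736
pgscan_kswapd_normal 0 pgscan_kswapd_movable 0
kswapd_high_wmark_hit_quickly 115972041 pageoutrun 115992287
"""

if __name__ == "__main__":
    summary = kswapd_summary(parse_vmstat(SAMPLE))
    for name, value in summary.items():
        print(f"{name}: {value:.4f}")
```

Fed the posted counters, the first ratio comes out at about 0.56 and the second at about 0.9998, so the two kswapd counters are almost equal. Under the assumed meaning above, that is consistent with kswapd rarely going fully to sleep, which would fit the >9000 minutes of CPU time reported for those guests.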