Re: [Xen-devel] Kernel 3.11 / 3.12 OOM killer and Xen ballooning
Bob Liu wrote:
> On 01/07/2014 05:21 PM, James Dingwall wrote:
>> Bob Liu wrote:
>>> Could you confirm that this problem doesn't exist when tmem is loaded
>>> with selfshrinking=0 while compiling gcc? It seems that you are
>>> compiling different packages during your testing. This will help to
>>> figure out whether selfshrinking is the root cause.
>>
>> Got an oom with selfshrinking=0, again during a gcc compile.
>> Unfortunately I don't have a single test case which demonstrates the
>> problem, but as I mentioned before it will generally show up during
>> compiles of large packages such as glibc, kdelibs, gcc etc.
>
> So the root cause is not the enabled selfshrinking. Then what I can
> think of is that the xen-selfballoon driver was too aggressive: too many
> pages were ballooned out, which caused heavy memory pressure in the
> guest OS, and kswapd started to reclaim pages until most of them were
> unreclaimable (all_unreclaimable=yes for all zones), at which point the
> OOM killer was triggered. In theory the balloon driver should give
> ballooned-out pages back to the guest OS, but I'm afraid this procedure
> is not fast enough.
>
> My suggestion is to reserve a minimum amount of memory for your guest OS
> so that xen-selfballoon won't be so aggressive. You can do that through
> the parameters selfballoon_reserved_mb or selfballoon_min_usable_mb.
>
>> I don't know if this is a separate or related issue, but over the
>> holidays I also had a problem with six of the guests on my system where
>> kswapd was running at 100% and had clocked up >9000 minutes of CPU time
>> even though there was otherwise no load on them. Of the guests I
>> restarted yesterday in this state, two have already got into the same
>> state again; they are running a kernel with the first patch that you
>> sent. As soon as I echoed 32 into both of the following (both
>> originally 0):
>>   /sys/devices/system/xen_memory/xen_memory0/selfballoon/selfballoon_reserved_mb
>>   /sys/devices/system/xen_memory/xen_memory0/selfballoon/selfballoon_min_usable_mb
>> the kswapd process stopped running at 100%. Unfortunately I didn't
>> check between the two commands to see if one by itself made a
>> difference, but I'll look for that next time.
>
> Could you get the meminfo in guest OS at that time?
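For reference, the workaround above amounts to roughly the following (only a
sketch; the sysfs paths are as they appear on this 3.12 guest and may differ
on other kernels):

  # reserve a minimum amount of guest memory so xen-selfballoon is less aggressive
  echo 32 > /sys/devices/system/xen_memory/xen_memory0/selfballoon/selfballoon_reserved_mb
  echo 32 > /sys/devices/system/xen_memory/xen_memory0/selfballoon/selfballoon_min_usable_mb

  # the current tmem/selfballoon parameters can be inspected with
  grep . /sys/module/tmem/parameters/*

Selfshrinking can likewise be switched off when the driver is loaded (e.g.
modprobe tmem selfshrinking=0, or tmem.selfshrinking=0 on the guest kernel
command line if tmem is built in).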
After

cat /proc/meminfo
MemTotal: 397028 kB
MemFree: 163756 kB
Buffers: 1260 kB
Cached: 129284 kB
SwapCached: 132 kB
Active: 22664 kB
Inactive: 159576 kB
Active(anon): 8004 kB
Inactive(anon): 44412 kB
Active(file): 14660 kB
Inactive(file): 115164 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 2097148 kB
SwapFree: 2096896 kB
Dirty: 20 kB
Writeback: 0 kB
AnonPages: 51640 kB
Mapped: 14136 kB
Shmem: 720 kB
Slab: 19492 kB
SReclaimable: 7692 kB
SUnreclaim: 11800 kB
KernelStack: 1816 kB
PageTables: 7928 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 2295660 kB
Committed_AS: 338552 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 9020 kB
VmallocChunk: 34359716408 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
DirectMap4k: 1048576 kB
DirectMap2M: 0 kB

cat /proc/vmstat
nr_free_pages 40916
nr_alloc_batch 0
nr_inactive_anon 11102
nr_active_anon 2009
nr_inactive_file 28791
nr_active_file 3665
nr_unevictable 0
nr_mlock 0
nr_anon_pages 12904
nr_mapped 3534
nr_file_pages 32669
nr_dirty 5
nr_writeback 0
nr_slab_reclaimable 1923
nr_slab_unreclaimable 2945
nr_page_table_pages 1982
nr_kernel_stack 227
nr_unstable 0
nr_bounce 0
nr_vmscan_write 781891
nr_vmscan_immediate_reclaim 6245
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 180
nr_dirtied 86609
nr_written 861010
numa_hit 8353372
numa_miss 0
numa_foreign 0
numa_interleave 0
numa_local 8353372
numa_other 0
nr_anon_transparent_hugepages 0
nr_free_cma 0
nr_dirty_threshold 16991
nr_dirty_background_threshold 8495
pgpgin 2044575
pgpgout 645866
pswpin 123
pswpout 153
pgalloc_dma 164944
pgalloc_dma32 7347917
pgalloc_normal 1032559
pgalloc_movable 0
pgfree 8586607
pgactivate 2012718
pgdeactivate 2276721
pgfault 7295414
pgmajfault 345301
pgrefill_dma 55271
pgrefill_dma32 2263007
pgrefill_normal 1771
pgrefill_movable 0
pgsteal_kswapd_dma 44880
pgsteal_kswapd_dma32 2587500
pgsteal_kswapd_normal 0
pgsteal_kswapd_movable 0
pgsteal_direct_dma 0
pgsteal_direct_dma32 37
pgsteal_direct_normal 0
pgsteal_direct_movable 0
pgscan_kswapd_dma 204749
pgscan_kswapd_dma32 4477230
pgscan_kswapd_normal 0
pgscan_kswapd_movable 0
pgscan_direct_dma 0
pgscan_direct_dma32 39
pgscan_direct_normal 0
pgscan_direct_movable 0
pgscan_direct_throttle 0
zone_reclaim_failed 0
pginodesteal 0
slabs_scanned 2720128
kswapd_inodesteal 41065
kswapd_low_wmark_hit_quickly 14897
kswapd_high_wmark_hit_quickly 116697740
pageoutrun 116717997
allocstall 1
pgrotated 8497
numa_pte_updates 0
numa_huge_pte_updates 0
numa_hint_faults 0
numa_hint_faults_local 0
numa_pages_migrated 0
pgmigrate_success 0
pgmigrate_fail 0
compact_migrate_scanned 0
compact_free_scanned 0
compact_isolated 0
compact_stall 0
compact_fail 0
compact_success 0
unevictable_pgs_culled 29365
unevictable_pgs_scanned 0
unevictable_pgs_rescued 29145
unevictable_pgs_mlocked 29550
unevictable_pgs_munlocked 29550
unevictable_pgs_cleared 0
unevictable_pgs_stranded 0
thp_fault_alloc 0
thp_fault_fallback 0
thp_collapse_alloc 0
thp_collapse_alloc_failed 0
thp_split 0
thp_zero_page_alloc 0
thp_zero_page_alloc_failed 0
nr_tlb_remote_flush 10780
nr_tlb_remote_flush_received 21564
nr_tlb_local_flush_all 66247
nr_tlb_local_flush_one 1446496

> Thanks,
> -Bob

/sys/module/tmem/parameters/cleancache Y
/sys/module/tmem/parameters/frontswap Y
/sys/module/tmem/parameters/selfballooning Y
/sys/module/tmem/parameters/selfshrinking N

James

[ 8212.940520] cc1plus invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
[ 8212.940529] CPU: 1 PID: 23678 Comm: cc1plus Tainted: G W 3.12.5 #88
[ 8212.940532] ffff88001e38cdf8 ffff88000094f968 ffffffff8148f200 ffff88001f90e8e8
[ 8212.940536] ffff88001e38c8c0 ffff88000094fa08 ffffffff8148ccf7 ffff88000094f9b8
[ 8212.940538] ffffffff810f8d97 ffff88000094f998 ffffffff81006dc8 ffff88000094f9a8
[ 8212.940542] Call Trace:
[ 8212.940554] [<ffffffff8148f200>] dump_stack+0x46/0x58
[ 8212.940558] [<ffffffff8148ccf7>] dump_header.isra.9+0x6d/0x1cc
[ 8212.940564] [<ffffffff810f8d97>] ? super_cache_count+0xa8/0xb8
[ 8212.940569] [<ffffffff81006dc8>] ? xen_clocksource_read+0x20/0x22
[ 8212.940573] [<ffffffff81006ea9>] ? xen_clocksource_get_cycles+0x9/0xb
[ 8212.940578] [<ffffffff81494abe>] ? _raw_spin_unlock_irqrestore+0x47/0x62
[ 8212.940583] [<ffffffff81296b27>] ? ___ratelimit+0xcb/0xe8
[ 8212.940588] [<ffffffff810b2bbf>] oom_kill_process+0x70/0x2fd
[ 8212.940592] [<ffffffff810bca0e>] ? zone_reclaimable+0x11/0x1e
[ 8212.940597] [<ffffffff81048779>] ? has_ns_capability_noaudit+0x12/0x19
[ 8212.940600] [<ffffffff81048792>] ? has_capability_noaudit+0x12/0x14
[ 8212.940603] [<ffffffff810b32de>] out_of_memory+0x31b/0x34e
[ 8212.940608] [<ffffffff810b7438>] __alloc_pages_nodemask+0x65b/0x792
[ 8212.940612] [<ffffffff810e3da3>] alloc_pages_vma+0xd0/0x10c
[ 8212.940617] [<ffffffff810dd5a4>] read_swap_cache_async+0x70/0x120
[ 8212.940620] [<ffffffff810dd6e4>] swapin_readahead+0x90/0xd4
[ 8212.940623] [<ffffffff81005b35>] ? pte_mfn_to_pfn+0x59/0xcb
[ 8212.940627] [<ffffffff810cf99d>] handle_mm_fault+0x8a4/0xd54
[ 8212.940630] [<ffffffff81006dc8>] ? xen_clocksource_read+0x20/0x22
[ 8212.940634] [<ffffffff810115d2>] ? sched_clock+0x9/0xd
[ 8212.940638] [<ffffffff8106772f>] ? sched_clock_local+0x12/0x75
[ 8212.940641] [<ffffffff8106823b>] ? arch_vtime_task_switch+0x81/0x86
[ 8212.940646] [<ffffffff81037f40>] __do_page_fault+0x3d8/0x437
[ 8212.940649] [<ffffffff81006dc8>] ? xen_clocksource_read+0x20/0x22
[ 8212.940652] [<ffffffff810115d2>] ? sched_clock+0x9/0xd
[ 8212.940654] [<ffffffff8106772f>] ? sched_clock_local+0x12/0x75
[ 8212.940658] [<ffffffff810a45cc>] ? __acct_update_integrals+0xb4/0xbf
[ 8212.940661] [<ffffffff810a493f>] ? acct_account_cputime+0x17/0x19
[ 8212.940663] [<ffffffff81067c28>] ? account_user_time+0x67/0x92
[ 8212.940666] [<ffffffff8106811b>] ? vtime_account_user+0x4d/0x52
[ 8212.940669] [<ffffffff81037fd8>] do_page_fault+0x1a/0x5a
[ 8212.940674] [<ffffffff810a065f>] ? rcu_user_enter+0xe/0x10
[ 8212.940677] [<ffffffff81495158>] page_fault+0x28/0x30
[ 8212.940679] Mem-Info:
[ 8212.940681] Node 0 DMA per-cpu:
[ 8212.940684] CPU 0: hi: 0, btch: 1 usd: 0
[ 8212.940685] CPU 1: hi: 0, btch: 1 usd: 0
[ 8212.940686] Node 0 DMA32 per-cpu:
[ 8212.940688] CPU 0: hi: 186, btch: 31 usd: 116
[ 8212.940690] CPU 1: hi: 186, btch: 31 usd: 124
[ 8212.940691] Node 0 Normal per-cpu:
[ 8212.940693] CPU 0: hi: 0, btch: 1 usd: 0
[ 8212.940694] CPU 1: hi: 0, btch: 1 usd: 0
[ 8212.940700] active_anon:105765 inactive_anon:105882 isolated_anon:0 active_file:8412 inactive_file:8612 isolated_file:0 unevictable:0 dirty:0 writeback:0 unstable:0 free:1143 slab_reclaimable:3575 slab_unreclaimable:3464 mapped:3792 shmem:6 pagetables:2534 bounce:0 free_cma:0 totalram:246132 balloontarget:306242
[ 8212.940702] Node 0 DMA free:1964kB min:88kB low:108kB high:132kB active_anon:5092kB inactive_anon:5328kB active_file:416kB inactive_file:608kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15996kB managed:15392kB mlocked:0kB dirty:0kB writeback:0kB mapped:320kB shmem:0kB slab_reclaimable:252kB slab_unreclaimable:492kB kernel_stack:120kB pagetables:252kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:26951 all_unreclaimable? yes
[ 8212.940711] lowmem_reserve[]: 0 469 469 469
[ 8212.940715] Node 0 DMA32 free:2608kB min:2728kB low:3408kB high:4092kB active_anon:181456kB inactive_anon:181528kB active_file:22296kB inactive_file:22644kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:507904kB managed:466364kB mlocked:0kB dirty:0kB writeback:0kB mapped:8628kB shmem:20kB slab_reclaimable:10756kB slab_unreclaimable:12548kB kernel_stack:1688kB pagetables:8876kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:612393 all_unreclaimable? yes
[ 8212.940722] lowmem_reserve[]: 0 0 0 0
[ 8212.940725] Node 0 Normal free:0kB min:0kB low:0kB high:0kB active_anon:236512kB inactive_anon:236672kB active_file:10936kB inactive_file:11196kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:524288kB managed:502772kB mlocked:0kB dirty:0kB writeback:0kB mapped:6220kB shmem:4kB slab_reclaimable:3292kB slab_unreclaimable:816kB kernel_stack:64kB pagetables:1008kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:745963 all_unreclaimable? yes
[ 8212.940732] lowmem_reserve[]: 0 0 0 0
[ 8212.940735] Node 0 DMA: 1*4kB (R) 0*8kB 4*16kB (R) 1*32kB (R) 1*64kB (R) 2*128kB (R) 0*256kB 1*512kB (R) 1*1024kB (R) 0*2048kB 0*4096kB = 1956kB
[ 8212.940747] Node 0 DMA32: 652*4kB (U) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2608kB
[ 8212.940756] Node 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[ 8212.940765] 16847 total pagecache pages
[ 8212.940766] 8381 pages in swap cache
[ 8212.940768] Swap cache stats: add 741397, delete 733016, find 250268/342284
[ 8212.940769] Free swap = 1925576kB
[ 8212.940770] Total swap = 2097148kB
[ 8212.951044] 262143 pages RAM
[ 8212.951046] 11939 pages reserved
[ 8212.951047] 540820 pages shared
[ 8212.951048] 240248 pages non-shared
[ 8212.951050] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
<snip process list>
[ 8212.951310] Out of memory: Kill process 23721 (cc1plus) score 119 or sacrifice child
[ 8212.951313] Killed process 23721 (cc1plus) total-vm:530268kB, anon-rss:350980kB, file-rss:9408kB
[54810.683658] kjournald starting.  Commit interval 5 seconds
[54810.684381] EXT3-fs (xvda1): using internal journal
[54810.684402] EXT3-fs (xvda1): mounted filesystem with writeback data mode

--
James Dingwall
Script Monkey, Zynstra <http://www.zynstra.com/>
Twitter: <http://www.twitter.com/zynstra>  LinkedIn: <http://www.linkedin.com/company/zynstra>

Zynstra is a private limited company registered in England and Wales
(registered number 07864369). Our registered office is 5 New Street Square,
London, EC4A 3TW and our headquarters are at Bath Ventures, Broad Quay,
Bath, BA1 1UD.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel