[Xen-devel] [linux-4.1 bisection] complete test-amd64-i386-xl-qemut-debianhvm-amd64
branch xen-unstable
xenbranch xen-unstable
job test-amd64-i386-xl-qemut-debianhvm-amd64
testid debian-hvm-install

Tree: linux git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git

*** Found and reproduced problem changeset ***

  Bug is in tree:  linux git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
  Bug introduced:  c5ad33184354260be6d05de57e46a5498692f6d6
  Bug not present: c5bcec6cbcbf520f088dc7939934bbf10c20c5a5
  Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/97405/


  commit c5ad33184354260be6d05de57e46a5498692f6d6
  Author: Lukasz Odzioba <lukasz.odzioba@xxxxxxxxx>
  Date:   Fri Jun 24 14:50:01 2016 -0700

      mm/swap.c: flush lru pvecs on compound page arrival

      [ Upstream commit 8f182270dfec432e93fae14f9208a6b9af01009f ]

      Currently we can have compound pages held on per cpu pagevecs, which
      leads to a lot of memory unavailable for reclaim when needed. In
      systems with hundreds of processors it can be GBs of memory.

      One way of reproducing the problem is to not call munmap explicitly
      on all mapped regions (e.g. after receiving SIGTERM). After that some
      pages (with THP enabled also huge pages) may end up on lru_add_pvec,
      example below.

      #include <string.h>
      #include <sys/mman.h>

      /* Build with OpenMP enabled so that every thread maps its own region. */
      int main(void) {
      #pragma omp parallel
      {
          size_t size = 55 * 1000 * 1000; // smaller than MEM/CPUS
          void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (p != MAP_FAILED)
              memset(p, 0, size);
          //munmap(p, size); // uncomment to make the problem go away
      }
          return 0;
      }

      When we run it with THP enabled it will leave a significant amount of
      memory on lru_add_pvec. This memory will not be reclaimed if we hit
      OOM, so when we run the above program in a loop:

          for i in `seq 100`; do ./a.out; done

      many processes (95% in my case) will be killed by OOM.

      The primary point of the LRU add cache is to save the zone lru_lock
      contention with a hope that more pages will belong to the same zone
      and so their addition can be batched. The huge page is already a form
      of batched addition (it will add 512 pages' worth of memory in one go)
      so skipping the batching seems like a safer option when compared to a
      potential excess in the caching which can be quite large and much
      harder to fix because lru_add_drain_all is way too expensive and it is
      not really clear what would be a good moment to call it.

      Similarly we can reproduce the problem on lru_deactivate_pvec by
      adding:

          madvise(p, size, MADV_FREE);

      after memset.

      This patch flushes lru pvecs on compound page arrival, making the
      problem less severe - after applying it the kill rate of the above
      example drops to 0%, due to reducing the maximum amount of memory held
      on pvec from 28MB (with THP) to 56kB per CPU.

      Suggested-by: Michal Hocko <mhocko@xxxxxxxx>
      Link: http://lkml.kernel.org/r/1466180198-18854-1-git-send-email-lukasz.odzioba@xxxxxxxxx
      Signed-off-by: Lukasz Odzioba <lukasz.odzioba@xxxxxxxxx>
      Acked-by: Michal Hocko <mhocko@xxxxxxxx>
      Cc: Kirill Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
      Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
      Cc: Vladimir Davydov <vdavydov@xxxxxxxxxxxxx>
      Cc: Ming Li <mingli199x@xxxxxx>
      Cc: Minchan Kim <minchan@xxxxxxxxxx>
      Cc: <stable@xxxxxxxxxxxxxxx>
      Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
      Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
      Signed-off-by: Sasha Levin <sasha.levin@xxxxxxxxxx>


For bisection revision-tuple graph see:
   http://logs.test-lab.xenproject.org/osstest/results/bisect/linux-4.1/test-amd64-i386-xl-qemut-debianhvm-amd64.debian-hvm-install.html
Revision IDs in each graph node refer, respectively, to the Trees above.
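
The 28MB and 56kB figures in the commit message quoted above can be checked with a little arithmetic. The sketch below is in Python purely for brevity and rests on values the message does not spell out: a per-CPU pagevec capacity of 14 entries, a 4 KiB base page, and a 2 MiB huge page (512 base pages, which matches the "512" in the message). The names `PAGEVEC_SIZE`, `max_held_bytes` and the CPU count of 256 are illustrative assumptions, not taken from the report.

```python
# Back-of-the-envelope check of the per-CPU pagevec figures.
# All constants are assumptions for illustration (see lead-in).

PAGEVEC_SIZE = 14               # assumed entries in one per-CPU pagevec
BASE_PAGE = 4 * 1024            # assumed base page size: 4 KiB
HUGE_PAGE = 512 * BASE_PAGE     # 512 base pages = 2 MiB huge page


def max_held_bytes(page_size: int, entries: int = PAGEVEC_SIZE) -> int:
    """Upper bound on memory parked on one pagevec when every slot
    holds a page of the given size."""
    return entries * page_size


if __name__ == "__main__":
    with_thp = max_held_bytes(HUGE_PAGE)      # every slot holds a huge page
    base_only = max_held_bytes(BASE_PAGE)     # huge pages flushed on arrival

    print(with_thp // (1024 * 1024), "MiB per CPU")   # 28 MiB per CPU
    print(base_only // 1024, "KiB per CPU")           # 56 KiB per CPU

    cpus = 256  # hypothetical machine with "hundreds of processors"
    print(cpus * with_thp / 1024 ** 3, "GiB in total")  # 7.0 GiB in total
```

Under these assumptions the bound scales linearly with the CPU count, which is consistent with the message's remark that large machines can strand gigabytes of memory.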
----------------------------------------
Running cs-bisection-step --graph-out=/home/logs/results/bisect/linux-4.1/test-amd64-i386-xl-qemut-debianhvm-amd64.debian-hvm-install --summary-out=tmp/97405.bisection-summary --basis-template=96211 --blessings=real,real-bisect linux-4.1 test-amd64-i386-xl-qemut-debianhvm-amd64 debian-hvm-install
Searching for failure / basis pass:
 97279 fail [host=huxelrebe1] / 96211 [host=huxelrebe0] 96183 [host=pinot1] 96160 [host=baroque0] 95848 [host=rimava1] 95818 [host=pinot0] 95591 [host=chardonnay0] 95517 [host=fiano1] 95455 [host=fiano0] 95408 [host=nocera0] 94729 [host=huxelrebe0] 94034 [host=elbling1] 93220 [host=italia0] 93111 [host=fiano0] 92143 [host=pinot0] 91350 [host=fiano1] 91189 [host=chardonnay1] 91008 [host=rimava1] 90845 [host=baroque1] 89382 [host=elbling1] 89248 [host=baroque0] 88721 [host=rimava0] 88639 [host=chardonnay0] 88510 [host=fiano0] 88390 [host=italia0] 88251 [host=huxelrebe0] 88073 [host=merlot0] 87856 [host=chardonnay1] 87765 [host=pinot0] 87692 [host=elbling0] 87582 [host=italia1] 87465 [host=pinot1] 87295 [host=merlot1] 87204 [host=elbling1] 87117 [host=baroque0] 87031 [host=rimava1] 86912 ok.
Failure / basis pass flights: 97279 / 86912
(tree with no url: minios)
(tree with no url: ovmf)
(tree with no url: seabios)
Tree: linux git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git
Latest 5880876e94699ce010554f483ccf0009997955ca c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
Basis pass 7f30737678023b5becaf0e2e012665f71b886a7d c530a75c1e6a472b0eb9558310b518f0dfcd8860 21f6526d1da331611ac5fe12967549d1a04e149b 316a862e5534249a6e6d876b4e203342d3fb870e a6f2cdb633bf519244a16674031b8034b581ba7f
Generating revisions with ./adhoc-revtuple-generator
 git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git#7f30737678023b5becaf0e2e012665f71b886a7d-5880876e94699ce010554f483ccf0009997955ca
 git://xenbits.xen.org/osstest/linux-firmware.git#c530a75c1e6a472b0eb9558310b518f0dfcd8860-c530a75c1e6a472b0eb9558310b518f0dfcd8860
 git://xenbits.xen.org/qemu-xen-traditional.git#21f6526d1da331611ac5fe12967549d1a04e149b-6e20809727261599e8527c456eb078c0e89139a1
 git://xenbits.xen.org/qemu-xen.git#316a862e5534249a6e6d876b4e203342d3fb870e-44a072f0de0d57c95c2212bbce02888832b7b74f
 git://xenbits.xen.org/xen.git#a6f2cdb633bf519244a16674031b8034b581ba7f-7da483b0236d8974cc97f81780dcf8e559a63175
Loaded 10934 nodes in revision graph
Searching for test results:
 86510 [host=elbling1]
 86587 [host=huxelrebe0]
 86654 [host=italia0]
 86761 [host=fiano0]
 86830 [host=chardonnay0]
 86912 pass 7f30737678023b5becaf0e2e012665f71b886a7d c530a75c1e6a472b0eb9558310b518f0dfcd8860 21f6526d1da331611ac5fe12967549d1a04e149b 316a862e5534249a6e6d876b4e203342d3fb870e a6f2cdb633bf519244a16674031b8034b581ba7f
 87031 [host=rimava1]
 87117 [host=baroque0]
 87204 [host=elbling1]
 87295 [host=merlot1]
 87465 [host=pinot1]
 87582 [host=italia1]
 87692 [host=elbling0]
 87765 [host=pinot0]
 87856 [host=chardonnay1]
 88073 [host=merlot0]
 88251 [host=huxelrebe0]
 88390 [host=italia0]
 88510 [host=fiano0]
 88639 [host=chardonnay0]
 88721 [host=rimava0]
 89248 [host=baroque0]
 89382 [host=elbling1]
 90845 [host=baroque1]
 91008 [host=rimava1]
 91189 [host=chardonnay1]
 91350 [host=fiano1]
 92143 [host=pinot0]
 93111 [host=fiano0]
 93220 [host=italia0]
 94034 [host=elbling1]
 94729 [host=huxelrebe0]
 95408 [host=nocera0]
 95455 [host=fiano0]
 95517 [host=fiano1]
 95591 [host=chardonnay0]
 95848 [host=rimava1]
 95818 [host=pinot0]
 96211 [host=huxelrebe0]
 96160 [host=baroque0]
 96183 [host=pinot1]
 97320 pass 1d155a6c311d0e7855181638b3b8b6e76302fe6d c530a75c1e6a472b0eb9558310b518f0dfcd8860 21f6526d1da331611ac5fe12967549d1a04e149b 316a862e5534249a6e6d876b4e203342d3fb870e 83a5839960db70f3552417379ad2677a6b473b20
 97279 fail 5880876e94699ce010554f483ccf0009997955ca c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
 97316 fail 5880876e94699ce010554f483ccf0009997955ca c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
 97350 pass 6a2f15857e4debf46d34fd897e9d3eaf70590e33 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
 97295 pass 7f30737678023b5becaf0e2e012665f71b886a7d c530a75c1e6a472b0eb9558310b518f0dfcd8860 21f6526d1da331611ac5fe12967549d1a04e149b 316a862e5534249a6e6d876b4e203342d3fb870e a6f2cdb633bf519244a16674031b8034b581ba7f
 97326 pass 888172862fa78505c4e4674c205a06586443d83f c530a75c1e6a472b0eb9558310b518f0dfcd8860 df553c056104e3dd8a2bd2e72539a57c4c085bae 44a072f0de0d57c95c2212bbce02888832b7b74f 212d27297af9c70c912aaa4eea25756721901567
 97330 pass 0c3f25d8c6aa0ff475a86cd5d3b7e2c7b6eb496f c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
 97336 fail 95e4695bd33e9f8ddabaf4fa64857e9bed1bda80 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
 97342 fail 284f69fb49e2e385203f52441b324b9a68461d6b c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
 97345 pass 0764832cd4b2472693529696578563892929701a c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
 97390 pass c5bcec6cbcbf520f088dc7939934bbf10c20c5a5 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
 97395 fail c5ad33184354260be6d05de57e46a5498692f6d6 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
 97363 pass 691c507ec01fa0cab2a9cfb5bd4398ddd5480a8a c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
 97398 pass c5bcec6cbcbf520f088dc7939934bbf10c20c5a5 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
 97373 pass 7f3724b8951735ef1d5ae4f2846b8af98a665d73 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
 97401 fail c5ad33184354260be6d05de57e46a5498692f6d6 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
 97388 fail c5ad33184354260be6d05de57e46a5498692f6d6 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
 97402 pass c5bcec6cbcbf520f088dc7939934bbf10c20c5a5 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
 97405 fail c5ad33184354260be6d05de57e46a5498692f6d6 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
Searching for interesting versions
 Result found: flight 86912 (pass), for basis pass
 Result found: flight 97279 (fail), for basis failure
 Repro found: flight 97295 (pass), for basis pass
 Repro found: flight 97316 (fail), for basis failure
 0 revisions at c5bcec6cbcbf520f088dc7939934bbf10c20c5a5 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
No revisions left to test, checking graph state.
 Result found: flight 97390 (pass), for last pass
 Result found: flight 97395 (fail), for first failure
 Repro found: flight 97398 (pass), for last pass
 Repro found: flight 97401 (fail), for first failure
 Repro found: flight 97402 (pass), for last pass
 Repro found: flight 97405 (fail), for first failure

*** Found and reproduced problem changeset ***

  Bug is in tree:  linux git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
  Bug introduced:  c5ad33184354260be6d05de57e46a5498692f6d6
  Bug not present: c5bcec6cbcbf520f088dc7939934bbf10c20c5a5
  Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/97405/

dot: graph is too large for cairo-renderer bitmaps. Scaling by 0.348259 to fit
pnmtopng: 53 colors found
Revision graph left in /home/logs/results/bisect/linux-4.1/test-amd64-i386-xl-qemut-debianhvm-amd64.debian-hvm-install.{dot,ps,png,html,svg}.
----------------------------------------
97405: tolerable ALL FAIL

flight 97405 linux-4.1 real-bisect [real]
http://logs.test-lab.xenproject.org/osstest/logs/97405/

Failures :-/ but no regressions.

Tests which did not succeed,
including tests which could not be run:
 test-amd64-i386-xl-qemut-debianhvm-amd64 9 debian-hvm-install fail baseline untested


jobs:
 test-amd64-i386-xl-qemut-debianhvm-amd64                     fail


------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
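
For readers unfamiliar with how a report like this arrives at a "last pass" and a "first failure", the search can be sketched as a binary search followed by repeated confirmation runs. This is a simplified illustration in Python, not osstest's implementation: osstest searches a graph of revision tuples across several trees, whereas the sketch assumes a single linear history. The names `bisect_first_bad`, `confirmed`, the revision labels and the default of three runs per boundary revision are all hypothetical; the default only mirrors the one result plus two repros logged above for each side of the boundary.

```python
from typing import Callable, Sequence, Tuple


def bisect_first_bad(revs: Sequence[str],
                     passes: Callable[[str], bool]) -> Tuple[str, str]:
    """Binary-search a linear history for the pass/fail boundary.

    Assumes revs[0] is known to pass and revs[-1] is known to fail.
    Returns (last_pass, first_fail), which are adjacent in revs.
    """
    lo, hi = 0, len(revs) - 1        # lo always passes, hi always fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if passes(revs[mid]):
            lo = mid                 # boundary lies after mid
        else:
            hi = mid                 # boundary lies at or before mid
    return revs[lo], revs[hi]


def confirmed(rev: str, passes: Callable[[str], bool],
              expect_pass: bool, repeats: int = 3) -> bool:
    """Re-run the test several times to guard against flaky results."""
    return all(passes(rev) == expect_pass for _ in range(repeats))


if __name__ == "__main__":
    history = ["r%d" % i for i in range(10)]   # hypothetical revisions

    def passes(rev: str) -> bool:
        # Hypothetical test: everything before r6 passes.
        return history.index(rev) < 6

    last_pass, first_fail = bisect_first_bad(history, passes)
    print(last_pass, first_fail)               # r5 r6
    print(confirmed(last_pass, passes, True),
          confirmed(first_fail, passes, False))  # True True
```

Repeating the runs on both sides of the boundary is what justifies the word "reproduced" in the result: a single pass or fail could be noise from the test host rather than an effect of the changeset.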