[Xen-devel] [linux-3.18 bisection] complete test-amd64-i386-xl-qemut-debianhvm-amd64
branch xen-unstable
xenbranch xen-unstable
job test-amd64-i386-xl-qemut-debianhvm-amd64
testid debian-hvm-install

Tree: linux git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git

*** Found and reproduced problem changeset ***

  Bug is in tree:  linux git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
  Bug introduced:  a2d8c514753276394d68414f563591f174ef86cb
  Bug not present: 8f620446135b64ca6f96cf32066a76d64e79a388
  Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/97669/

  commit a2d8c514753276394d68414f563591f174ef86cb
  Author: Lukasz Odzioba <lukasz.odzioba@xxxxxxxxx>
  Date:   Fri Jun 24 14:50:01 2016 -0700

      mm/swap.c: flush lru pvecs on compound page arrival

      [ Upstream commit 8f182270dfec432e93fae14f9208a6b9af01009f ]

      Currently we can have compound pages held on per-CPU pagevecs, which
      leads to a lot of memory being unavailable for reclaim when needed.
      On systems with hundreds of processors this can amount to GBs of
      memory.

      One way of reproducing the problem is to not call munmap explicitly
      on all mapped regions (e.g. after receiving SIGTERM).  After that
      some pages (with THP enabled, also huge pages) may end up on
      lru_add_pvec; example below:

      #include <string.h>
      #include <sys/mman.h>

      int main(void)
      {
      #pragma omp parallel
          {
              size_t size = 55 * 1000 * 1000; // smaller than MEM/CPUS
              void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
              if (p != MAP_FAILED)
                  memset(p, 0, size);
              // munmap(p, size); // uncomment to make the problem go away
          }
          return 0;
      }

      When we run it with THP enabled it will leave a significant amount
      of memory on lru_add_pvec.  This memory will not be reclaimed if we
      hit OOM, so when we run the above program in a loop:

          for i in `seq 100`; do ./a.out; done

      many processes (95% in my case) will be killed by the OOM killer.

      The primary point of the LRU add cache is to reduce zone lru_lock
      contention, in the hope that more pages will belong to the same zone
      and so their addition can be batched.  A huge page is already a form
      of batched addition (it adds 512 pages' worth of memory in one go),
      so skipping the batching seems like the safer option compared to a
      potential excess in the caching, which can be quite large and much
      harder to fix, because lru_add_drain_all is way too expensive and it
      is not really clear what would be a good moment to call it.

      Similarly we can reproduce the problem on lru_deactivate_pvec by
      adding:

          madvise(p, size, MADV_FREE);

      after the memset.

      This patch flushes lru pvecs on compound page arrival, making the
      problem less severe - after applying it the kill rate of the above
      example drops to 0%, because the maximum amount of memory held on a
      pvec falls from 28MB (with THP) to 56kB per CPU.

  Suggested-by: Michal Hocko <mhocko@xxxxxxxx>
  Link: http://lkml.kernel.org/r/1466180198-18854-1-git-send-email-lukasz.odzioba@xxxxxxxxx
  Signed-off-by: Lukasz Odzioba <lukasz.odzioba@xxxxxxxxx>
  Acked-by: Michal Hocko <mhocko@xxxxxxxx>
  Cc: Kirill Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
  Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
  Cc: Vladimir Davydov <vdavydov@xxxxxxxxxxxxx>
  Cc: Ming Li <mingli199x@xxxxxx>
  Cc: Minchan Kim <minchan@xxxxxxxxxx>
  Cc: <stable@xxxxxxxxxxxxxxx>
  Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
  Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
  Signed-off-by: Sasha Levin <sasha.levin@xxxxxxxxxx>

For bisection revision-tuple graph see:
   http://logs.test-lab.xenproject.org/osstest/results/bisect/linux-3.18/test-amd64-i386-xl-qemut-debianhvm-amd64.debian-hvm-install.html
Revision IDs in each graph node refer, respectively, to the Trees above.
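For context on the fix itself: as the commit message says, the change makes
the LRU add path flush the per-CPU pagevec as soon as a compound (huge) page
arrives, instead of waiting for the pagevec to fill.  A minimal sketch of
what the patched mm/swap.c fast path looks like, based on upstream commit
8f182270dfec (the usual kernel mm definitions - struct page, pagevec,
get_cpu_var, PageCompound - are assumed; an illustration, not the verbatim
diff):

    static void __lru_cache_add(struct page *page)
    {
            struct pagevec *pvec = &get_cpu_var(lru_add_pvec);

            get_page(page);
            /* pagevec_add() returns the number of slots left, so 0 means
             * the pvec is now full.  The "|| PageCompound(page)" test is
             * the fix: a compound (huge) page forces an immediate drain
             * to the LRU rather than lingering on the per-CPU cache. */
            if (!pagevec_add(pvec, page) || PageCompound(page))
                    __pagevec_lru_add(pvec);
            put_cpu_var(lru_add_pvec);
    }

(The reproducer quoted above is an OpenMP program, so it needs something
like "gcc -fopenmp" to build.)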
----------------------------------------
Running cs-bisection-step --graph-out=/home/logs/results/bisect/linux-3.18/test-amd64-i386-xl-qemut-debianhvm-amd64.debian-hvm-install --summary-out=tmp/97669.bisection-summary --basis-template=96188 --blessings=real,real-bisect linux-3.18 test-amd64-i386-xl-qemut-debianhvm-amd64 debian-hvm-install
Searching for failure / basis pass:
 97592 fail [host=huxelrebe0] / 96188 [host=huxelrebe1] 96161 [host=chardonnay1] 95844 [host=baroque1] 95809 [host=pinot1] 95597 ok.
Failure / basis pass flights: 97592 / 95597
(tree with no url: minios)
(tree with no url: ovmf)
(tree with no url: seabios)
Tree: linux git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git
Latest 0ac0a856d986c1ab240753479f5e50fdfab82b14 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f b48be35ac86cd6369124cf06ca3006d086095297
Basis pass b5076139991c6b12c62346d9880eec1d4227d99f c530a75c1e6a472b0eb9558310b518f0dfcd8860 df553c056104e3dd8a2bd2e72539a57c4c085bae 44a072f0de0d57c95c2212bbce02888832b7b74f c2a17869d5dcd845d646bf4db122cad73596a2be
Generating revisions with ./adhoc-revtuple-generator
 git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git#b5076139991c6b12c62346d9880eec1d4227d99f-0ac0a856d986c1ab240753479f5e50fdfab82b14
 git://xenbits.xen.org/osstest/linux-firmware.git#c530a75c1e6a472b0eb9558310b518f0dfcd8860-c530a75c1e6a472b0eb9558310b518f0dfcd8860
 git://xenbits.xen.org/qemu-xen-traditional.git#df553c056104e3dd8a2bd2e72539a57c4c085bae-6e20809727261599e8527c456eb078c0e89139a1
 git://xenbits.xen.org/qemu-xen.git#44a072f0de0d57c95c2212bbce02888832b7b74f-44a072f0de0d57c95c2212bbce02888832b7b74f
 git://xenbits.xen.org/xen.git#c2a17869d5dcd845d646bf4db122cad73596a2be-b48be35ac86cd6369124cf06ca3006d086095297
Loaded 3004 nodes in revision graph
Searching for test results:
 95406 [host=elbling1]
 95458 [host=rimava1]
 95521 [host=italia0]
 95597 pass b5076139991c6b12c62346d9880eec1d4227d99f c530a75c1e6a472b0eb9558310b518f0dfcd8860 df553c056104e3dd8a2bd2e72539a57c4c085bae 44a072f0de0d57c95c2212bbce02888832b7b74f c2a17869d5dcd845d646bf4db122cad73596a2be
 95809 [host=pinot1]
 95844 [host=baroque1]
 96161 [host=chardonnay1]
 96188 [host=huxelrebe1]
 97278 fail irrelevant
 97289 fail irrelevant
 97319 fail irrelevant
 97377 fail irrelevant
 97426 fail irrelevant
 97533 fail 0ac0a856d986c1ab240753479f5e50fdfab82b14 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f b48be35ac86cd6369124cf06ca3006d086095297
 97617 pass d259ae2b8a5635dd30148bf76d34c0b421791b5b c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
 97640 fail 6ded7184675a9f27e801dc7749a8ccd5d898b4e1 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
 97656 fail a2d8c514753276394d68414f563591f174ef86cb c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
 97624 fail 848110b885d7003887eb599f247323e4a9ce832e c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
 97668 pass 8f620446135b64ca6f96cf32066a76d64e79a388 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
 97599 pass b5076139991c6b12c62346d9880eec1d4227d99f c530a75c1e6a472b0eb9558310b518f0dfcd8860 df553c056104e3dd8a2bd2e72539a57c4c085bae 44a072f0de0d57c95c2212bbce02888832b7b74f c2a17869d5dcd845d646bf4db122cad73596a2be
 97628 fail 1e7429d49b1cef08bef8c4bd0ef42c3e14164488 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
 97592 fail 0ac0a856d986c1ab240753479f5e50fdfab82b14 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f b48be35ac86cd6369124cf06ca3006d086095297
 97612 fail 0ac0a856d986c1ab240753479f5e50fdfab82b14 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f b48be35ac86cd6369124cf06ca3006d086095297
 97632 pass 5634b6de989d03714ef8c894022c3910095bfc2b c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
 97659 pass 8f620446135b64ca6f96cf32066a76d64e79a388 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
 97645 fail f1f702e8044c1fb8791111b71b9cb2ff8b9c6e92 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
 97635 pass 4c2b0216cdf54e81f7c0e841b5bb1116701ae25b c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
 97669 fail a2d8c514753276394d68414f563591f174ef86cb c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
 97652 pass 8f620446135b64ca6f96cf32066a76d64e79a388 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
 97662 fail a2d8c514753276394d68414f563591f174ef86cb c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
Searching for interesting versions
Result found: flight 95597 (pass), for basis pass
Result found: flight 97533 (fail), for basis failure
Repro found: flight 97599 (pass), for basis pass
Repro found: flight 97612 (fail), for basis failure
0 revisions at 8f620446135b64ca6f96cf32066a76d64e79a388 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
No revisions left to test, checking graph state.
Result found: flight 97652 (pass), for last pass
Result found: flight 97656 (fail), for first failure
Repro found: flight 97659 (pass), for last pass
Repro found: flight 97662 (fail), for first failure
Repro found: flight 97668 (pass), for last pass
Repro found: flight 97669 (fail), for first failure

*** Found and reproduced problem changeset ***

  Bug is in tree:  linux git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
  Bug introduced:  a2d8c514753276394d68414f563591f174ef86cb
  Bug not present: 8f620446135b64ca6f96cf32066a76d64e79a388
  Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/97669/

  (commit a2d8c514753276394d68414f563591f174ef86cb, "mm/swap.c: flush lru
  pvecs on compound page arrival" - quoted in full at the top of this
  report)
pnmtopng: 63 colors found
Revision graph left in /home/logs/results/bisect/linux-3.18/test-amd64-i386-xl-qemut-debianhvm-amd64.debian-hvm-install.{dot,ps,png,html,svg}.
----------------------------------------
97669: tolerable ALL FAIL

flight 97669 linux-3.18 real-bisect [real]
http://logs.test-lab.xenproject.org/osstest/logs/97669/

Failures :-/ but no regressions.

Tests which did not succeed, including tests which could not be run:
 test-amd64-i386-xl-qemut-debianhvm-amd64  9 debian-hvm-install  fail baseline untested

jobs:
 test-amd64-i386-xl-qemut-debianhvm-amd64                     fail

------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel