Re: [Wg-test-framework] osstest Massachusetts test lab resource usage investigation
Ian Jackson writes ("Re: [Wg-test-framework] osstest Massachusetts test lab resource usage investigation"):
> I have now completed the investigation into db queries.

I've been looking at the bisector too.  I captured a history snapshot containing two bisections chosen arbitrarily, and wrote some scripts which I used to analyse the snapshot.  (A sketch of the shape of the analysis appears below, after the discussion of the first bisection.)

Xen 4.5 build failure around the 20th of September.

The bisector completed bisections of both the amd64 and i386 build failures, in that order.  Here is where the top of the time went for amd64:

 68.69%  41208  flight - step start build hosts-allocate | flight - step finish build
 10.96%   6577  flight - step start build host-install(3) | flight - step finish build
 10.53%   6316  flight - step start build .*-build | flight - step finish build
  5.42%   3249  flight - step start build host-(?:install|build-prep) | flight - step finish build
  0.76%    453  email - testing | flight - job start

And for i386:

 49.22%   8976  flight - step start build hosts-allocate | flight - step finish build
 19.51%   3557  flight - step start build .*-build | flight - step finish build
 13.10%   2389  flight - step start build host-install(3) | flight - step finish build
  5.49%   1002  flight - step start build host-(?:install|build-prep) | flight - step finish build
  2.24%    409  flight - flight ending | mtime - transcript

Overall it spent 50-70% of the elapsed time waiting for a slot on a build machine, and then 16-20% of the elapsed time reinstalling the build machine.  I think this could be improved by providing one or more hosts dedicated to building.  They would not need reinstalling so often, though they would often be idle.

Looking at the logs there is a particularly long delay (15ks, about 4h12m) before the first repro job completes.  I think this is probably because each bisection job runs with the start time priority of the first one, so that the first job is delayed by (roughly) the length of the queue.  This is done deliberately, to avoid trying to bisect things which are fixed quickly.  Given the small proportion of our resources being used for bisections, we may want to reconsider that.
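To be concrete about how to read the tables: each row gives the time, in seconds and as a percentage of the whole period under consideration, spent between a matched pair of begin and end events from the snapshot.  A minimal sketch of that computation (illustrative Python with simplified event patterns; not my actual analysis scripts, which I will post as a followup):

    import re

    # Simplified (label, begin-pattern, end-pattern) triples; the real
    # intervals are the ones named in the tables above.
    INTERVALS = [
        ('hosts-allocate',  r'step start build hosts-allocate',
                            r'step finish build'),
        ('host-install(3)', r'step start build host-install\(3\)',
                            r'step finish build'),
    ]

    def analyse(events):
        # events: chronologically sorted list of (unix_time, event_text)
        totals = {label: 0 for label, _, _ in INTERVALS}
        open_since = {}                    # label -> begin timestamp
        for t, text in events:
            for label, begin_re, end_re in INTERVALS:
                if label in open_since and re.search(end_re, text):
                    # Close the interval, attributing the elapsed time.
                    totals[label] += t - open_since.pop(label)
                elif re.search(begin_re, text):
                    open_since[label] = t
        whole = events[-1][0] - events[0][0]
        for label, secs in sorted(totals.items(), key=lambda kv: -kv[1]):
            print('%6.2f%% %7d  %s' % (100.0 * secs / whole, secs, label))

The output format matches the tables: percentage of the whole period, then seconds, then the interval.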
Xen 4.5 guest migration failure around the 22nd of September (test-amd64-amd64-xl-qemuu-winxpsp3, step guest-localmigrate/x10).

The bisector ran first for a different failure, test-amd64-amd64-xl-qemuu-ovmf-amd64, and determined that that one was unreproducible.  I have analysed data only up to Thu, 22 Sep 2016 07:59:02 GMT (since that was in my collection snapshot).

Counting the whole period from the failure of the main flight to the end of the snapshot recording, we have the following elapsed times:

 35.05%  23613  flight - step start build hosts-allocate | flight - step finish build
 18.85%  12698  flight - step start test hosts-allocate | flight - step finish test
 14.79%   9965  flight - step start test windows-install | flight - step finish test

These figures are disproportionately weighted towards the initial startup host allocation delay (see above), since this is not a finished bisection.

Counting only the period after the ovmf bisection was abandoned:

 37.11%   9965  flight - step start test windows-install | flight - step finish test
 10.97%   2946  flight - step start test hosts-allocate | flight - step finish test
  8.13%   2183  flight - step start test host-install(3) | flight - step finish test
  7.05%   1892  flight - step start test (?!capture|host-install|hosts-allocate|windows-install).* | flight - step finish test
  6.32%   1697  flight - step start build .*-build | flight - step finish build
  6.15%   1650  flight - step start build host-install(3) | flight - step finish build
  5.47%   1470  mtime - bisection-report | mtime - transcript
  4.54%   1220  crlog - begin | email - testing
  3.42%    918  flight - step start build host-(?:install|build-prep) | flight - step finish build
  1.75%    470  flight - job finish | flight - step finish build

Looking at the logs, each iteration takes about 1 hour: this bisection involves a much longer iteration for each step, because the test includes a Windows install.  So the host allocation delay here is a much smaller proportion, even though the bisector needs to get exactly the right host.  11% is not that much here, but a faster test would make this look worse.

I have a half-baked idea to allow an in-progress bisection to reserve its test host.  I think that this would be worth pursuing, although there's a fair amount of complication involved.

38% of the wall clock time was spent doing a Windows install.  The test does a fresh install each time, rather than saving a VM image and reusing it.  In principle it might be possible to use saved VM images.  We do fresh installs because installs are a good exercise of a variety of functionality, and because that avoids having to maintain and comprehend a library of VMs.  In particular: if we were maintaining a library of VMs they would have to be updated occasionally (when?), and problems which arose due to changes in the VM library would be obscure.  I don't think changes in this area are particularly easy.

Windows installs are a pathological case because they are so slow.  Most guest installs done by osstest are much faster.  And when there seem to be multiple regressions, osstest chooses to work first on the one whose test is shortest - hence picking a Debian install on OVMF first, here.
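A minimal sketch of that selection rule (Python rather than osstest's Perl; the duration estimates are illustrative, except that a Windows iteration is about an hour, as noted above):

    # With several candidate regressions to bisect, work first on the
    # one whose individual test is estimated to be quickest, so that
    # cheap bisections are not queued behind expensive ones.
    def pick_first(candidates):
        # candidates: list of (job description, estimated test seconds)
        return min(candidates, key=lambda c: c[1])[0]

    print(pick_first([
        ('test-amd64-amd64-xl-qemuu-winxpsp3 guest-localmigrate/x10', 3600),
        ('test-amd64-amd64-xl-qemuu-ovmf-amd64 (debian install)',     1500),
    ]))
    # -> the ovmf job: a Debian install is much quicker than a Windows
    #    install, which is why that failure was bisected first here.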
About 15% of the time (depending on how you count it) seems to be going on bookkeeping of one kind or another, including the archaeology necessary to decide what the next revision to test is.  In a faster test this would be quite a large proportion of the elapsed time.  My other reports, particularly the one on database transaction performance, contain some suggestions on how this might be improved.  In general I think the database concurrency issues I discussed in my email

  Subject: Re: [Wg-test-framework] osstest Massachusetts test lab resource usage investigation
  Date: Tue, 30 Aug 2016 11:53:18 +0100

will help with this.

I expect many developers will think that osstest's bisector is spending far too much time on setup during each bisection iteration: it is always wiping a host, reinstalling it with the relevant Xen, and (if the failure does not occur before that point) reinstalling the guest OS.  But of course a human is likely to be able to tell whether a particular issue could have been the result of (for example) corruption which occurred during the install phase.  It is also possible for bugs to cause disk corruption even on the host.

To avoid giving wrong answers it seems best to me for the osstest bisector to use a strategy which is somewhat slower but which is sure not to be misled.

I will send my scripts as a followup to this email.

Ian.