
Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...



On 05/07/18 19:11, Ian Jackson wrote:
> Sander Eikelenboom writes ("Re: [Xen-devel] [Notes for xen summit 2018 design 
> session] Process changes: is the 6 monthly release Cadence too short, 
> Security Process, ..."):
>> Just wondering, are there any timing statistics kept for the OSStest
>> flights (and separate for building the various components and running
>> the individual tests ?). Or should they be parse-able from the logs kept ?
> 
> Yes.  The database has a started and stopped time_t for each test
> step.  That's where I got the ~15 mins number from.
> 
> Ian.
> 

Hi Ian,

Since the current OSStest emails give a 404 on the link to the logs,
I dug through the archives and found the right URL:
    http://logs.test-lab.xenproject.org/osstest/logs/

I took the liberty of browsing through some of the flights to get a
grasp of how to interpret the numbers.

Let's take an example: http://logs.test-lab.xenproject.org/osstest/logs/124946/
Started:        2018-07-03 13:08:06 Z
Finished:       2018-07-05 06:08:54 Z

That is quite some time: almost 41 hours of wall-clock time ...
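For clarity, the elapsed wall-clock time of such a flight can be checked
with a quick sketch (timestamps copied from the flight overview page
above; the variable names are mine):

```python
from datetime import datetime, timezone

# Timestamps as shown on the flight overview page (UTC, "Z" suffix).
started = datetime(2018, 7, 3, 13, 8, 6, tzinfo=timezone.utc)
finished = datetime(2018, 7, 5, 6, 8, 54, tzinfo=timezone.utc)

elapsed = finished - started
hours = elapsed.total_seconds() / 3600
print(f"elapsed: {elapsed} ({hours:.1f} hours)")  # about 41 hours
```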

Now if I take an example job/test, say:
http://logs.test-lab.xenproject.org/osstest/logs/124940/test-amd64-amd64-xl/info.html

I see:
- step 2 hosts-allocate takes 20012 seconds,
  which, if I interpret it right, indicates a lot of time spent waiting
  before a slot is actually available to run, so that seems to indicate
  at least a capacity problem in the infrastructure.
- Step 3 seems to be the elapsed time while syslog recorded all the
  subsequent steps.
  It's 2639 seconds, while the remaining steps sum to 2630, so that
  seems about right.

  All the other steps together take 2630 seconds, so the run-to-wait
  ratio is about 1:7.6.
  For the remainder, let's keep the waiting out of the equation, under
  the assumption that if we can reduce the rest, we reduce the load on
  the infrastructure and thereby the waiting time as well.
 
- step 4 host-install(4) takes 1005 seconds.
  It seems this is the step you referred to with the 15-minute figure
  (it is indeed around 15 minutes)?
  That is around 38% of all the steps (excluding the waiting from
  step 2)!

- step 10 debian-install, which seems to be the guest install, is
  modest at 288 seconds.
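The arithmetic above can be reproduced with a short sketch (step
durations copied from the info.html page; the variable names are mine):

```python
# Step durations in seconds, from the test-amd64-amd64-xl info.html page.
hosts_allocate = 20012   # step 2: waiting for a free host
other_steps = 2630       # sum of the steps after step 3: actual work
host_install = 1005      # step 4: the ~15-minute host install

run_to_wait = other_steps / hosts_allocate
install_share = host_install / other_steps

print(f"run-to-wait ratio: 1:{1 / run_to_wait:.1f}")           # roughly 1:7.6
print(f"host-install share of the work: {install_share:.0%}")  # about 38%
```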

I also browsed some other tests and flights, and at first sight they
seem to show the same pattern.

So (sometimes) a lot of time is spent waiting for a slot, followed by
doing the host install.

So any improvement in the latter will probably reap a double benefit by
also reducing the wait time!


When I look at job/test:
http://logs.test-lab.xenproject.org/osstest/logs/124940/test-amd64-amd64-xl-qemuu-win10-i386/info.html

I see:
- step 2 hosts-allocate: 47116 seconds.
- step 3 syslog-server: 8191 seconds.
- step 4 host-install(4): 789 seconds, somewhat shorter than in the
  other job/test.
- step 10 windows-install: 7061 seconds; the failing Windows 10 guest
  install dwarfs all the other steps...


When I look at job/test:
http://logs.test-lab.xenproject.org/osstest/logs/124940/test-amd64-amd64-xl-qemuu-win7-amd64/info.html

I see:
- step 2 hosts-allocate: 13272 seconds.
- step 3 syslog-server: 2985 seconds.
- step 4 host-install(4): 675 seconds, even somewhat shorter than in
  both the other job/tests.
- step 10 windows-install: 1029 seconds, which is a lot better than the
  failing Windows 10 install in the other job.

So running the Windows install is currently a black box with a timeout
of 7000 seconds.
If it fails, the total runtime of the job/test is around 8000 seconds,
which is over 2 hours!

Which we do 4 times: 
- test-amd64-amd64-xl-qemut-win10-i386
- test-amd64-i386-xl-qemut-win10-i386
- test-amd64-amd64-xl-qemuu-win10-i386
- test-amd64-i386-xl-qemuu-win10-i386

All of these seem to result in a "10. windows-install" -> "fail never
pass".
I sincerely *hope* I'm not interpreting this correctly ... but are we
wasting 4 * 2 hours = 8 hours per flight on a job/test that has *never
ever* passed (and probably never will, barring a miracle or a specific
bugfix)?
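Assuming each of the four jobs hits the ~7000-second install timeout
plus the surrounding steps, the per-flight cost works out as follows
(the ~8000-second runtime per job is taken from the observation above):

```python
jobs = 4                 # the four win10 job/test variants listed above
runtime_per_job = 8000   # seconds: ~7000 s timeout plus surrounding steps

wasted = jobs * runtime_per_job
print(f"wasted per flight: {wasted} s = {wasted / 3600:.1f} hours")  # about 8.9 hours
```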

Would it be an idea to run "fail never pass" install steps only once in
a while (they can't be blockers anyway?), if at all (perhaps only
re-enabling them manually after a fix)? If my interpretation is right,
this seems to be quite low-hanging fruit.
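Purely to illustrate the idea (OSStest has its own scheduler and
database schema; the function, its arguments, and the retry interval
below are all hypothetical), such a policy could look roughly like:

```python
from datetime import datetime, timedelta

# Hypothetical sketch: re-run a "fail never pass" install step only
# every RETRY_INTERVAL, instead of in every flight.
RETRY_INTERVAL = timedelta(days=7)  # arbitrary choice, for illustration only

def should_run_install_step(ever_passed: bool, last_attempt: datetime,
                            now: datetime) -> bool:
    """Skip install steps that have never passed, except for an
    occasional retry to notice when a fix has landed."""
    if ever_passed:
        return True  # normal case: the step has passed before, keep testing it
    return now - last_attempt >= RETRY_INTERVAL
```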

--
Sander

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

