[Xen-devel] [OSSTEST PATCH 28/28] Executive: Delay releasing build host shares by 90s



When a build job finishes, the same flight may well want to do a
subsequent build that depended on the first.  When this happens, we
have a race:

On the one hand, we have the flight: after sg-run-job exits,
sg-execute-flight needs to double-check the job status and search the
flight for more jobs to run; it will then spawn ts-hosts-allocate-Executive
for the new job, which needs to get its head together, parse its
arguments, become a client of the queue daemon, and ask to be put in
the queue.

On the other hand, we have the planning system: currently, as soon as
sg-run-job exits, its connection to the ownerdaemon closes.  The
ownerdaemon tells the queue daemon, and the planning queue is
restarted.  It may even happen that a run of the planning queue is
coincidentally about to start anyway.

If the planning system wins the race, another job will pick up the
newly-freed resource.  Often this will mean unsharing the build host,
which is very wasteful if the releasing flight hasn't finished its
builds for that architecture: it means that the next build job needs
to regroove a host for builds.

Add a bodge to try to make the race go the other way: after a build
job completes successfully, hold on to the share for a further 90
seconds.  (We have to use setsid because sg-execute-flight kills the
process group to clean up stray processes, which this sleep definitely
is.)
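Why setsid is needed can be seen with a small stand-alone Tcl sketch. It assumes Linux (it reads the process group ID from /proc/<pid>/stat) and a setsid(1) command on $PATH; the helper pgid-of is hypothetical and not part of osstest. It starts one background sleep without setsid and one with it, as preserve-task does, then compares process group IDs. Only the first child shares the caller's process group, so only it would be hit by a group-wide kill of the kind sg-execute-flight performs.

```tcl
# Sketch only: assumes Linux (/proc/<pid>/stat) and setsid(1) on $PATH.
# pgid-of is a hypothetical helper for this demo, not part of osstest.

# Return the process group ID of process $pid.
proc pgid-of {pid} {
    set f [open /proc/$pid/stat r]
    set stat [read $f]
    close $f
    # The fields after the ")" that ends the command name are:
    # state ppid pgrp session ...
    set rest [string trim [string range $stat [expr {[string last ")" $stat] + 1}] end]]
    return [lindex [split $rest " "] 2]
}

# A plain background child stays in our process group...
set plain [exec sleep 3 > /dev/null < /dev/null 2> /dev/null &]
# ...whereas setsid gives its child a new session and process group,
# which is what preserve-task relies on.
set detached [exec setsid sleep 3 > /dev/null < /dev/null 2> /dev/null &]

# Poll briefly until setsid(1) has called setsid(2) in the child.
for {set i 0} {$i < 40 && [pgid-of $detached] != $detached} {incr i} {
    after 50
}

set ours          [pgid-of [pid]]
set plain_pgid    [pgid-of $plain]
set detached_pgid [pgid-of $detached]
puts "ours=$ours plain=$plain_pgid setsid=$detached_pgid"

# A group-wide kill such as "kill -- -$ours" would also hit the plain
# sleep, but not the setsid one.  Tidy up both demo processes, ignoring
# errors (for example if kill(1) is missing or they have already exited).
catch {exec kill $plain $detached}
```

Run under tclsh, the first two IDs printed should match, while the third should equal the setsid child's own PID, because that child now leads its own process group.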

A better solution would be to move the wait-for-referenced-job logic
from sg-execute-flight to ts-hosts-allocate-*.  But that would be much
more complicated.

Signed-off-by: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
---
v4: New patch
---
 sg-run-job               |    2 ++
 tcl/JobDB-Executive.tcl  |    6 ++++++
 tcl/JobDB-Standalone.tcl |    1 +
 3 files changed, 9 insertions(+)

diff --git a/sg-run-job b/sg-run-job
index c51a508..66145b8 100755
--- a/sg-run-job
+++ b/sg-run-job
@@ -71,6 +71,8 @@ proc run-job {job} {
 
     if {$ok} { setstatus pass                                             }
 
+    if {$need_build_host && $ok} { jobdb::preserve-task 90 }
+
     if {$anyfailed} {
         jobdb::logputs stdout "at least one test failed"
     }
diff --git a/tcl/JobDB-Executive.tcl b/tcl/JobDB-Executive.tcl
index d61d2a2..f37bbaf 100644
--- a/tcl/JobDB-Executive.tcl
+++ b/tcl/JobDB-Executive.tcl
@@ -280,6 +280,12 @@ proc become-task {comment} {
     }
 }
 
+proc preserve-task {seconds} {
+    # This keeps the owner daemon connection open: our `sleep'
+    # will continue to own our resources for $seconds longer
+    exec setsid sleep $seconds > /dev/null < /dev/null 2> /dev/null &
+}
+
 proc step-log-filename {flight job stepno ts} {
     global c
     set logdir $c(Logs)/$flight/$job
diff --git a/tcl/JobDB-Standalone.tcl b/tcl/JobDB-Standalone.tcl
index a2b8dd9..d7d8422 100644
--- a/tcl/JobDB-Standalone.tcl
+++ b/tcl/JobDB-Standalone.tcl
@@ -74,6 +74,7 @@ proc step-set-status {flight job stepno st} {
 }
 
 proc become-task {argv} { }
+proc preserve-task {argv} { }
 
 proc step-log-filename {flight job stepno ts} {
     return {}
-- 
1.7.10.4


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

