[Xen-devel] [OSSTEST PATCH 28/28] Executive: Delay releasing build host shares by 90s
When a build job finishes, the same flight may well want to do a subsequent build that depended on the first.

When this happens, we have a race:

On the one hand, we have the flight: after sg-run-job exits, sg-execute-flight needs to double-check the job status, and search the flight for more jobs to run; it will spawn ts-hosts-allocate-Executive for the new job, which needs to get its head together, parse its arguments, become a client of the queue daemon, and ask to be put in the queue.

On the other hand, we have the planning system: currently, as soon as sg-run-job exits, the connection to the ownerdaemon closes. The ownerdaemon tells the queue daemon, and the planning queue is restarted. It might even happen that, coincidentally, the planning queue is about to start anyway.

If the planning system wins the race, another job will pick up the newly-freed resource. Often this will mean unsharing the build host, which is very wasteful if the releasing flight hasn't finished its builds for that architecture: it means that the next build job needs to regroove a host for builds.

Add a bodge to try to make the race go the other way: after a build job completes successfully, do not give up the share for a further 90 seconds. (We have to use setsid because sg-execute-flight kills the process group to clean up stray processes, which this sleep definitely is.)

A better solution would be to move the wait-for-referenced-job logic from sg-execute-flight to ts-hosts-allocate-*. But that would be much more complicated.

Signed-off-by: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
---
v4: New patch
---
 sg-run-job               |    2 ++
 tcl/JobDB-Executive.tcl  |    6 ++++++
 tcl/JobDB-Standalone.tcl |    1 +
 3 files changed, 9 insertions(+)

diff --git a/sg-run-job b/sg-run-job
index c51a508..66145b8 100755
--- a/sg-run-job
+++ b/sg-run-job
@@ -71,6 +71,8 @@ proc run-job {job} {
 
     if {$ok} { setstatus pass }
 
+    if {$need_build_host && $ok} { jobdb::preserve-task 90 }
+
     if {$anyfailed} {
         jobdb::logputs stdout "at least one test failed"
     }
diff --git a/tcl/JobDB-Executive.tcl b/tcl/JobDB-Executive.tcl
index d61d2a2..f37bbaf 100644
--- a/tcl/JobDB-Executive.tcl
+++ b/tcl/JobDB-Executive.tcl
@@ -280,6 +280,12 @@ proc become-task {comment} {
     }
 }
 
+proc preserve-task {seconds} {
+    # This keeps the owner daemon connection open: our `sleep'
+    # will continue to own our resources for $seconds longer
+    exec setsid sleep $seconds > /dev/null < /dev/null 2> /dev/null &
+}
+
 proc step-log-filename {flight job stepno ts} {
     global c
     set logdir $c(Logs)/$flight/$job
diff --git a/tcl/JobDB-Standalone.tcl b/tcl/JobDB-Standalone.tcl
index a2b8dd9..d7d8422 100644
--- a/tcl/JobDB-Standalone.tcl
+++ b/tcl/JobDB-Standalone.tcl
@@ -74,6 +74,7 @@ proc step-set-status {flight job stepno st} {
 }
 
 proc become-task {argv} { }
+proc preserve-task {argv} { }
 
 proc step-log-filename {flight job stepno ts} {
     return {}
-- 
1.7.10.4

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
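
For readers unfamiliar with the trick that preserve-task relies on, here is a minimal standalone sketch in Python (not part of the patch, and not osstest code). It assumes the general Unix behaviour that a connection stays open for as long as any process holds a descriptor for it. A pipe stands in for the owner daemon connection, and the names preserve_fd, peer_sees_eof and demo are hypothetical. start_new_session=True is used in place of setsid(1); both put the sleep in its own session so that a kill aimed at the caller's process group would not reach it.

```python
import os
import select
import subprocess


def preserve_fd(fd, seconds):
    """Spawn a detached `sleep` that inherits fd and so keeps it open."""
    return subprocess.Popen(
        ["sleep", str(seconds)],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        pass_fds=[fd],            # the child keeps its own copy of fd
        start_new_session=True,   # child calls setsid(), like setsid(1)
    )


def peer_sees_eof(read_fd, timeout):
    """Return True if read_fd reports end-of-file within `timeout` seconds."""
    ready, _, _ = select.select([read_fd], [], [], timeout)
    return bool(ready) and os.read(read_fd, 1) == b""


def demo(hold_seconds=1):
    # Stand-in for the owner daemon connection: the read end plays the
    # daemon, the write end plays the job's side of the connection.
    read_fd, write_fd = os.pipe()
    child = preserve_fd(write_fd, hold_seconds)

    # The "job" drops its own copy, as if sg-run-job had exited.
    os.close(write_fd)

    # After setsid() the child is a session leader: its sid equals its pid.
    detached = os.getsid(child.pid) == child.pid

    # While the sleep is alive, the peer must not see the connection close.
    eof_while_held = peer_sees_eof(read_fd, 0.2)

    # Once the sleep exits, the last copy is gone and the peer sees EOF.
    child.wait()
    eof_after_release = peer_sees_eof(read_fd, 1.0)

    os.close(read_fd)
    return eof_while_held, eof_after_release, detached
```

Calling demo(1) should show that the peer sees the close only after the sleep has exited, which is the delay the patch is buying (90 seconds there, 1 second here).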