
[Xen-devel] [OSSTEST PATCH 00/21] Abandon jobs which are unreasonably delaying their flight



Sometimes we find ourselves seriously lacking the capacity to run
particular job(s).  The result can be that the whole system stands
mostly idle while a small proportion of the resources runs flat out
with a giant queue.

In this series we arrange for osstest to be able to spot this
happening, and to rebalance the load automatically by giving up
earlier on the jobs which are overly contended.

There are some tuning parameters, of course.  To summarise, I have
chosen here to treat jobs as starved if (for example):
  We have completed 90% of the flight, and the remaining 10%
  is projected to take 5x as long as the first 90%.
(The "90%" is by number of jobs.)  See the patch
  starvation: Infrastructure for jobs which are delaying their flights
for details of the heuristic and its parameters.
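The example criterion above can be sketched as a simple predicate. This is a minimal Python sketch under stated assumptions. The two thresholds are just the single example point quoted here, not the patch's actual parameters. The names `FlightProgress`, `is_starved`, `DONE_FRACTION` and `SLOWDOWN_RATIO` are hypothetical. How the remaining time is projected is left to the caller, because this message does not describe that part of the heuristic.

```python
from dataclasses import dataclass

# Hypothetical thresholds matching the worked example ("90% done, the
# remaining 10% projected to take 5x as long"). The real osstest
# heuristic is parameterised differently; see the patch itself.
DONE_FRACTION = 0.9    # fraction of jobs (by count) already completed
SLOWDOWN_RATIO = 5.0   # projected remaining time vs. time spent so far


@dataclass
class FlightProgress:
    jobs_total: int               # number of jobs in the flight
    jobs_done: int                # number of jobs already finished
    elapsed_s: float              # wall-clock seconds spent so far
    projected_remaining_s: float  # caller-supplied estimate for the rest


def is_starved(p: FlightProgress) -> bool:
    """Return True if the outstanding jobs look unreasonably slow.

    The completed fraction is measured by number of jobs, as in the
    cover letter, not by time or by amount of work.
    """
    if p.jobs_total <= 0:
        return False
    done_fraction = p.jobs_done / p.jobs_total
    if done_fraction < DONE_FRACTION:
        # Too early in the flight to judge.
        return False
    # Starved if the tail is projected to take more than
    # SLOWDOWN_RATIO times as long as everything done so far.
    return p.projected_remaining_s > SLOWDOWN_RATIO * p.elapsed_s
```

With these numbers, a flight that has finished 90 of its 100 jobs after one hour is flagged once the outstanding 10 jobs are projected to need more than five more hours.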

When situations like this persist, it will still be good to balance
the load manually by adjusting the job mix in submitted flights.  This
is because starvation will not necessarily drop the same job in
subsequent flights on the same "branch", so starvation will impair
regression detection.

Ian Jackson (21):
  ts-hosts-allocate-Executive: with -U, just append to the same logfile
  selecthost: Honour new $none_ok optional parameter
  ts-logs-capture: Do not try to capture logs of hosts not allocated
  alloc_resources: Support special abandonment values
  starvation: Teach sg-report-flight about starved step state
  starvation: Teach archaeologists about starved job state
  starvation: Teach ms-flights-summary about job state starved
  starvation: Teach sg-execute-flight about job state starved
  step handling: Preserve step states set by ts-* scripts
  TestSupport: Make "broken" print the actual job state
  JobDB::Executive: step_*: fix log messages to talk about "steps"
  starvation: Permit step_finish to set the state `starved'
  TestSupport: Make "broken" set the step state too
  tcl/JobDB-Executive: Do not squash "starved" status
  starvation: Propagate starved job status into dependent jobs
  ts-host-allocate-Executive: Break out $now and add a newline
  starvation: Use "starved" for hostalloc_maxwait_max
  starvation: Infrastructure for jobs which are delaying their flights
  starvation: Abandon jobs which are unreasonably delaying their flight
  hostalloc_maxwait_max: Use starvation most_optimistic
  starvation: Better logging/debugging output

 Osstest/Executive.pm         |  95 ++++++++++++++++++++++++++---
 Osstest/JobDB/Executive.pm   |   8 ++-
 Osstest/TestSupport.pm       |  24 ++++++--
 mg-hostalloc-starvation-demo |  53 ++++++++++++++++
 ms-flights-summary           |   9 +--
 sg-execute-flight            |   2 +-
 sg-report-flight             |  17 +++++-
 tcl/JobDB-Executive.tcl      |   6 +-
 ts-hosts-allocate-Executive  | 142 ++++++++++++++++++++++++++++++++++++++++---
 ts-logs-capture              |   7 ++-
 10 files changed, 328 insertions(+), 35 deletions(-)
 create mode 100755 mg-hostalloc-starvation-demo

-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

