[Xen-devel] [OSSTEST PATCH 00/21] Abandon jobs which are unreasonably delaying their flight
Sometimes we find ourselves seriously lacking the capacity to run
particular job(s). The result can be that the whole system stands
mostly idle while a small proportion of the resources runs flat out
with a giant queue.

In this series we arrange for osstest to be able to spot this
happening, and automatically rebalance load by giving up earlier on
the jobs which are overly contended.

There are some tuning parameters, of course. To summarise, I have
chosen here to treat jobs as starved if (for example):

   We have completed 90% of the flight, and the remaining 10% is
   projected to take 5x as long as the first 90%.
   (The "90%" is by number of jobs.)

See the patch
   starvation: Infrastructure for jobs which are delaying their flights
for details of the heuristic and its parameters.

When situations like this persist it will still be good to manually
balance the load by adjusting the job mix in submitted flights. This
is because starvation will not necessarily drop the same job in
subsequent flights on the same "branch", so starvation will impair
regression detection.
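As an illustration only, the example condition quoted above can be
written as a small predicate. This is a sketch in Python, not the
code from the series (osstest itself is Perl and Tcl), and the real
heuristic in the "starvation: Infrastructure ..." patch has more
inputs than this. The function name, argument names and default
values are all hypothetical; the defaults merely mirror the "90%" and
"5x" figures of the example.

```python
def looks_starved(jobs_done, jobs_total, elapsed, projected_remaining,
                  done_fraction=0.9, slowdown_factor=5.0):
    """Return True if the example starvation condition holds.

    jobs_done / jobs_total  -- progress, counted by number of jobs
    elapsed                 -- time taken so far, in seconds
    projected_remaining     -- estimated time still needed, in seconds
    done_fraction, slowdown_factor -- hypothetical tuning parameters
    """
    if jobs_total <= 0 or elapsed <= 0:
        # Nothing sensible can be projected yet.
        return False
    fraction_done = jobs_done / jobs_total
    return (fraction_done >= done_fraction
            and projected_remaining >= slowdown_factor * elapsed)


# 19 of 20 jobs finished in one hour; the rest is projected to need six.
print(looks_starved(19, 20, 3600, 6 * 3600))
```

With these numbers the flight is past the 90% mark and the projected
tail is longer than five times the elapsed time, so the remaining job
would be a candidate for abandonment under the example rule.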
Ian Jackson (21):
  ts-hosts-allocate-Executive: with -U, just append to the same logfile
  selecthost: Honour new $none_ok optional parameter
  ts-logs-capture: Do not try to capture logs of hosts not allocated
  alloc_resources: Support special abandonment values
  starvation: Teach sg-report-flight about starved step state
  starvation: Teach archaeologists about starved job state
  starvation: Teach ms-flights-summary about job state starved
  starvation: Teach sg-execute-flight about job state starved
  step handling: Preserve step states set by ts-* scripts
  TestSupport: Make "broken" print the actual job state
  JobDB::Executive: step_*: fix log messages to talk about "steps"
  starvation: Permit step_finish to set the state `starved'
  TestSupport: Make "broken" set the step state too
  tcl/JobDB-Executive: Do not squash "starved" status
  starvation: Propagate starved job status into dependent jobs
  ts-host-allocate-Executive: Break out $now and add a newline
  starvation: Use "starved" for hostalloc_maxwait_max
  starvation: Infrastructure for jobs which are delaying their flights
  starvation: Abandon jobs which are unreasonably delaying their flight
  hostalloc_maxwait_max: Use starvation most_optimistic
  starvation: Better logging/debugging output

 Osstest/Executive.pm         |  95 ++++++++++++++++++++++++++---
 Osstest/JobDB/Executive.pm   |   8 ++-
 Osstest/TestSupport.pm       |  24 ++++++--
 mg-hostalloc-starvation-demo |  53 ++++++++++++++++
 ms-flights-summary           |   9 +--
 sg-execute-flight            |   2 +-
 sg-report-flight             |  17 +++++-
 tcl/JobDB-Executive.tcl      |   6 +-
 ts-hosts-allocate-Executive  | 142 ++++++++++++++++++++++++++++++++++++++++---
 ts-logs-capture              |   7 ++-
 10 files changed, 328 insertions(+), 35 deletions(-)
 create mode 100755 mg-hostalloc-starvation-demo

-- 
2.11.0

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel