[Xen-devel] [OSSTEST PATCH 17/33] ms-ownerdaemon: Cope with db restart. Retry recording dead tasks.

In chan-destroy-stuff, instead of accessing the db directly, add the
dead task(s) to a queue, and arrange to look at that queue.

Errors are handled by setting an `after' handler which we cancel if we
are successful. The after handler requeues a queue run attempt as the
first thing (which will arrange that a further retry will occur if
things are still broken) and then attempts to reconnect to the
database.

I have tested this with a test instance by renaming the `tasks' table
under its feet, and it functions as expected.

DEPLOYMENT NOTE: The owner daemon cannot be restarted without shutting
everything down. So this update should first be deployed in
Cambridge, probably, to see how it goes. Also, it is less critical in
the main Xen production test lab because there the db and the owner
daemon are co-hosted on the same VM.

Signed-off-by: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
Acked-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
---
v2: Put back the `unset tasks' which was mistakenly removed.
    The effect of its lack is to fail to clear out the task list
    for previous uses of the channel (which is named after the fd);
    this is mostly harmless apart from log spam but causes the
    usual case to be something like
       OK created-task 456354 ownd [10.80.227.94]:44852-876
    rather than
       OK created-task 456354 ownd [10.80.227.94]:44852-876
    which some of the clients (rightly) don't expect.
---
 Osstest/Executive.pm |    1 +
 ms-ownerdaemon       |   38 ++++++++++++++++++++++++++++++++++----
 2 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/Osstest/Executive.pm b/Osstest/Executive.pm
index 468031c..0602925 100644
--- a/Osstest/Executive.pm
+++ b/Osstest/Executive.pm
@@ -113,6 +113,7 @@ augmentconfigdefaults(
 augmentconfigdefaults(
     OwnerDaemonHost => $c{ControlDaemonHost},
     QueueDaemonHost => $c{ControlDaemonHost},
+    OwnerDaemonDbRetry => $c{QueueDaemonRetry},
 );
 
 #---------- configuration reader etc. ----------
diff --git a/ms-ownerdaemon b/ms-ownerdaemon
index 3623d19..62ca645 100755
--- a/ms-ownerdaemon
+++ b/ms-ownerdaemon
@@ -22,16 +22,38 @@
 
 source ./tcl/daemonlib.tcl
 
+set dead_tasks {}
+
 proc chan-destroy-stuff {chan} {
+    global dead_tasks
+
     upvar #0 chanawait($chan) await
     catch { unset await }
 
     upvar #0 chantasks($chan) tasks
     if {![info exists tasks]} return
 
+    puts-chan-desc $chan "-- $tasks"
+
+    foreach task $tasks {
+        lappend dead_tasks $task
+    }
+    unset tasks
+    after idle record-dead-tasks
+}
+
+proc record-dead-tasks {} {
+    global c dead_tasks
+
+    if {![llength $dead_tasks]} return
+
+    puts "record-dead-tasks ... $dead_tasks"
+
+    set retry [expr {$c(OwnerDaemonDbRetry) * 1000}]
+    set eafter [after $retry record-dead-tasks-retry]
+
     jobdb::transaction resources {
-        puts-chan-desc $chan "-- $tasks"
-        foreach task $tasks {
+        foreach task $dead_tasks {
             jobdb::db-execute "
                 UPDATE tasks
                    SET live = 'f'
@@ -39,12 +61,20 @@ proc chan-destroy-stuff {chan} {
             "
         }
     }
-    puts-chan-desc $chan "== $tasks"
-    unset tasks
+    after cancel $eafter
+    puts "record-dead-tasks OK. $dead_tasks"
+    set dead_tasks {}
 
     after idle await-endings-notify
 }
 
+proc record-dead-tasks-retry {} {
+    after idle record-dead-tasks
+    puts "** reconnecting/retrying **"
+    catch { jobdb::db-close }
+    jobdb::db-open
+}
+
 proc await-endings-notify {} {
     global chanawait
     foreach chan [array names chanawait] {
-- 
2.1.4

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel