
[Xen-devel] [OSSTEST PATCH] README.planner: Document the resource planning system



Signed-off-by: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
---
 README.planner |  181 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 180 insertions(+), 1 deletion(-)

diff --git a/README.planner b/README.planner
index de8b962..ec4dce8 100644
--- a/README.planner
+++ b/README.planner
@@ -1,4 +1,183 @@
-Resource planner / scheduler
+RESOURCE PLANNER (IE SCHEDULER)
+===============================
+
+Overall architecture
+--------------------
+
+Resources (eg hosts) are owned by `tasks'.  As resources are allocated
+and deallocated, their `owntaskid' in the database is updated.
+
+When a process wishes to allocate resources, it does as follows:
+
+ - Select an appropriate task.  For command-line use, the user@host
+   static task is usually used (as specified by the OSSTEST_TASK env
+   var) and things fail if it doesn't actually exist.
+
+   Automatic runs create a new ownd task for each job (in become-task
+   in JobDB-Executive.tcl, in sg-run-job).
+
+ - Connect to the queue daemon and participate in the planning
+   process.
+
+
+Planning
+--------
+
+The queue daemon sequences the planning of resource use and the
+allocation of resources.  This is done in planning cycles.  A
+planning cycle is prompted by newly available resources or by new
+requests for participation; cycles also run periodically.
+
+During each planning cycle we construct, from scratch, a complete plan
+for which resources are to be used, when, by which tasks.  Resources
+which are free and suitable for allocation right away are planned and
+allocated for immediate use.
+
+However, the plan extends far enough into the future to cover all
+currently-foreseeable requirements for resources.  This gives the
+planning algorithms the most complete information about available
+tradeoffs, and also provides useful output (the resource plan) for
+administrators and users.
+
+Each planning cycle starts with the existing allocated resources.  The
+planning daemon records (on disk, not in the database) what expected
+duration was declared with each of those allocations.  (A task that
+has allocated the resources it needs no longer participates in the
+planning process, although it will retain a liveness connection to
+ms-ownerdaemon.)
+
+Then each interested client of ms-queuedaemon is asked - one by one,
+in turn - to fill in, in the plan under construction, which resources
+it intends to use and when.  Clients specify the expected duration of
+their use (but there is no mechanism for enforcing the accuracy of
+these estimates).  ms-queuedaemon collates and records the provided
+information and passes it on to the next client.
+
+If there are resources which are available right now which a client
+wants to use, the client will allocate them there and then during its
+planning slot.
+
+The queueing order is determined by the job priority value.  Each
+client declares its own priority.  The usual basis for the priority is
+the client's starting time_t.  So by and large jobs execute in order.
+
+The main client in the planning process is
+ts-hosts-allocate-Executive.  That program contains the heuristics for
+choosing good test hosts under various conditions.
+
+Command-line users can use mg-allocate -U to obtain resources through
+the planning process.  mg-allocate participates with a high queue
+priority so that command-line allocations will take precedence over
+automatic test runs.  (mg-allocate without -U bypasses the planner and
+can be used to `grab' resources which happen currently to be free.)
+
+The distinction between `idle' and `allocatable' resources exists so
+that newly-freed resources are properly offered first to the tasks at
+the front of the queue.  ms-ownerdaemon sets all idle resources to
+allocatable at the start of each planning cycle.
+
+
+ms-ownerdaemon and `ownd' tasks
+-------------------------------
+
+ms-ownerdaemon helps with cleanup and does nothing else.  Test runs
+connect to it and obtain ephemeral `task' ids.  All of the processes
+which are part of the test run retain a descriptor for the socket
+connection to ms-ownerdaemon.  When the last holder of a copy of the
+socket connection fd dies, ms-ownerdaemon sees the connection
+close.  It then sets the task to `not live' in the database.
+
+This means that there is no need for any explicit cleanup: tasks
+which just crash have their resources freed automatically.
+
+If the ms-ownerdaemon fails and is restarted, the tasks which were
+clients of the previous ms-ownerdaemon cannot be automatically cleaned
+up.  The new ms-ownerdaemon will annotate them with `previous'.  The
+administrator can then clean them up manually, if she knows that all
+the corresponding actual processes are no longer running.
+
+
+Types of task
+-------------
+
+ * static tasks.  Usual for command-line use.  They are manually
+   created (with ./mg-hosts manual-task-create) and not normally ever
+   destroyed.
+
+ * `ownd' tasks.  These are used for production runs from cron and
+   some other mostly-automatic invocations of osstest (eg
+   mg-execute-flight).  They are automatically created and destroyed -
+   see above.
+
+ * magic task numbers with special meanings:
+
+     magic/allocatable
+
+        The resource is free and a process which is participating in
+        the planning process may allocate it to itself by updating
+        the `owntaskid' in the resources table to refer to its own
+        task.
+
+     magic/idle
+
+        The resource is free but has perhaps only recently become so.
+        It can be allocated outside the planning process, but processes
+        participating in planning should regard the resource as
+        unavailable.
+
+     magic/shared
+
+        The resource has been divided into shares.  It is unavailable
+        in its own right without being unshared first.  The individual
+        shares have their own owners.
+
+     magic/preparing
+
+        Applies only to shares of a divided resource.  The share is
+        unavailable because the process handling the division is still
+        putting the resource into the proper state implied by the
+        sharing information (see below).
+
+
+Sharing
+-------
+
+Hosts can be shared between multiple clients.  The first client to
+decide to set up a host for sharing:
+
+ - `Divides' the resource in the database
+    * allocates the host to the taskid `shared' and creates a set
+      of new rows in the resources table to represent the shares
+      (the number of shares is fixed at this point)
+    * initially, sets all but one of those shares to be owned by
+      magic/preparing
+    * sets the remaining share to be owned by itself
+ - Performs whatever actions are necessary to get the host into
+   a suitable state for it and others to use it (eg, installing
+   the OS)
+ - Sets the remaining shares to `idle' so that others can allocate
+   them
+
+(During planning - ie, for resources not yet available immediately -
+the intent to do this can be part of the plan so that other tasks can
+see and take account of it.  The time necessary for preparing the host
+is not currently modelled during planning.)
+
+Likewise a process which finds a shared resource completely idle can
+unshare it.  That is:
+    * Check that all the shares are allocatable
+    * Delete all the rows representing the shares
+    * Claim ownership of the main resource by changing the owntaskid
+      from `shared' to the process's own task.
+
+Shared resources also have a `wear' counter, which is there to arrange
+that shared systems get regrooved occasionally even if nothing decides
+to unshare them.
+
+
+
+DETAILED PROTOCOL NOTES
+=======================
 
 ms-queuedaemon commands
 
-- 
1.7.10.4
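Illustrative note, not part of the patch: the planning cycle described
in the "Planning" section can be sketched as a toy model.  This is only
a sketch under stated assumptions, not how ms-queuedaemon or
ts-hosts-allocate-Executive are implemented.  The names Client,
plan_cycle, owners and busy_until are hypothetical, the database is
replaced by in-memory dicts, and each client wants exactly one named
resource.

```python
from dataclasses import dataclass


@dataclass
class Client:
    """Hypothetical stand-in for one participant in the planning process."""
    name: str
    priority: int   # lower goes first; usually the client's starting time_t
    wants: str      # name of the single resource this client wants
    duration: int   # declared expected duration of use, in seconds


def plan_cycle(clients, owners, busy_until, now=0):
    """Build a plan from scratch: {client name: (resource, planned start)}.

    owners maps resource -> 'allocatable', 'idle' or an owning task name.
    busy_until maps an allocated resource -> expected end of that use.
    Resources that are allocatable right now are allocated immediately.
    """
    # Idle resources become allocatable at the start of each cycle.
    for res in list(owners):
        if owners[res] == "idle":
            owners[res] = "allocatable"

    # Start from the existing allocations and their declared durations.
    free_at = {
        res: now if owner == "allocatable" else busy_until.get(res, now)
        for res, owner in owners.items()
    }

    plan = {}
    # Clients fill in the plan one by one, in priority order.
    for c in sorted(clients, key=lambda c: c.priority):
        start = free_at[c.wants]
        if start <= now and owners[c.wants] == "allocatable":
            owners[c.wants] = c.name   # allocate there and then
        plan[c.name] = (c.wants, start)
        free_at[c.wants] = start + c.duration
    return plan
```

In this model a client that finds its resource allocatable takes it
during its own planning slot, and later clients see the resource as
busy until the declared duration has elapsed.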

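Illustrative note, not part of the patch: the divide / prepare /
unshare steps from the "Sharing" section, as a minimal state-transition
sketch.  The function names and the two in-memory dicts (owners for
main resources, shares for the per-share owners) are assumptions made
for illustration; the real system keeps this state as rows in the
resources table, and the wear counter is not modelled here.

```python
def divide(owners, shares, host, nshares, task):
    """Divide `host` into `nshares` shares on behalf of `task`."""
    owners[host] = "shared"
    # The dividing task owns one share; the rest are not usable yet.
    shares[host] = [task] + ["preparing"] * (nshares - 1)


def finish_preparing(shares, host):
    """Once the host is set up, release the remaining shares as idle."""
    shares[host] = ["idle" if o == "preparing" else o for o in shares[host]]


def unshare(owners, shares, host, task):
    """Reclaim the whole host if every share is allocatable.

    Returns True on success, False if the host cannot be unshared.
    """
    if owners.get(host) != "shared":
        return False
    if any(o != "allocatable" for o in shares[host]):
        return False
    del shares[host]        # delete the rows representing the shares
    owners[host] = task     # claim the main resource
    return True
```

Unsharing is refused here while any share is still owned, idle or
preparing, which mirrors the check that all the shares are allocatable.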