
[Xen-devel] [PATCH v2] docs/designs: PV stats interface



This patch introduces a design proposal for an interface to be used by
guest PV drivers or agents to convey statistics to a monitoring agent
running in a toolstack domain.

Signed-off-by: Paul Durrant <paul.durrant@xxxxxxxxxx>
---
Cc: Jan Beulich <jbeulich@xxxxxxxx>
Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

v2:
- Protocol changes made during PoC implementation
- Made event-channel optional to allow for 'free running' sets (suggested
  by Andrew).
- Addressed some comments from Jan
---
 docs/designs/pv_stats.markdown | 134 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 134 insertions(+)
 create mode 100755 docs/designs/pv_stats.markdown

diff --git a/docs/designs/pv_stats.markdown b/docs/designs/pv_stats.markdown
new file mode 100755
index 0000000..b7961cb
--- /dev/null
+++ b/docs/designs/pv_stats.markdown
@@ -0,0 +1,134 @@
+# Statistics Interface for PV Drivers/Agents
+
+## Background
+
+It is common for guest PV drivers or agents to communicate statistics to an
+agent running in a toolstack domain so that these can be displayed via a UI,
+or even influence guest placement etc. The mechanism for conveying these
+statistics is currently ad-hoc, undocumented, and usually based on xenstore.
+
+Whilst xenstore does indeed provide a convenient mechanism, the lack of
+documentation and standardisation of the protocols used creates
+compatibility issues for PV drivers or agents not tied to a specific
+product or environment. Also, the guest xenstore quota and the single
+instance of xenstored can easily become scalability issues.
+
+## Proposal
+
+The proposed interface is intended to be used only for the purposes of
+conveying statistics from guest PV drivers or agents to an agent or agents
+running in a toolstack domain. It is not intended for bulk data transfer,
+nor as another means of control of the PV drivers or agents by the
+toolstack domain. It is also assumed that software running in the toolstack
+domain will ensure that only a single instance of an agent will attempt to
+monitor a set of statistics; the protocol is not intended to handle
+multiple monitoring agents.
+
+PV drivers or agents typically publish multiple related sets of statistics.
+For example, a PV network frontend may publish statistics relating to
+received traffic and transmitted traffic. These sets are likely to be
+updated asynchronously from each other and therefore it makes sense that
+they can be separated such that a monitoring agent can refresh its view of
+them asynchronously. It is therefore proposed that a two-level hierarchy of
+xenstore keys is used to advertise sets of guest statistics.
+
+The toolstack will create a writeable top-level `stats` key in the guest
+space. Under this, each guest statistics *provider* creates a key with its name,
+e.g. `vif0`. This key acts as a parent to keys that then name each
+set of statistics that it provides, e.g. `tx`.
+
+If an agent running in the toolstack domain wishes to monitor a set of
+statistics, it must write its domid under that set's key e.g.:
+
+    monitor-id = 0
+
+The provider should set up a watch so that it is notified when this key
+is written. It then responds by writing keys containing grant references of
+pages containing the names, types and values of the statistics in that set,
+and an optional event channel to be used for signalling e.g.:
+
+    name-type-ref0 = 10
+    name-type-ref1 = 11
+    val-ref0 = 12
+    val-ref1 = 13
+    event-channel = 10
+
+Finally the provider should write:
+
+    ready = 1
+
+The provider must always write the `ready` key after all the other
+keys have been written such that a monitoring agent can know when it is
+safe to sample the other keys.
+
+There are separate references for pages containing the names and types of
+the statistics and the values of those statistics since it is required that
+the names and types do not change and hence a monitoring agent need only
+sample them once and can do so as soon as the `name-type-ref` keys
+are valid. The format of each (*name, type*) tuple is as follows:
+
+    struct stats_name_type {
+       uint8_t type;
+       char name[63];
+    };
+
+and the possible types are:
+
+    #define STATS_TYPE_INVALID 0
+    #define STATS_TYPE_S64     1
+    #define STATS_TYPE_U64     2
+    #define STATS_TYPE_DOUBLE  3
+    #define STATS_TYPE_ASCII   4
+
+The `name` must be a NUL terminated ASCII string containing only
+alphanumeric characters, printable non-alphanumeric characters or a space
+character, i.e. the C expression:
+
+    (isalnum(name[i]) || ispunct(name[i]) || name[i] == ' ')
+
+must be true for each value of `i` until `name[i] == '\0'` (assuming the 'C'
+locale).
+
+When iterating through the `stats_name_type` structures a monitoring agent
+can determine that it has finished when it either encounters a `type` value
+of `STATS_TYPE_INVALID`, or it has iterated through all 64 structures in the
+granted page and there are no further `name-type-ref` keys.
+
+A monitoring agent can find the value of a statistic by noting the
+`name-type-ref` index and the offset into the page where the
+`stats_name_type` was found and then looking at the same offset in the
+corresponding `val-ref` page. Values are therefore also 64 octets
+in length and contain:
+
+* `STATS_TYPE_S64` : A signed 64-bit integer in little endian form in
+                    octets 0..7
+* `STATS_TYPE_U64` : An unsigned 64-bit integer in little endian form in
+                    octets 0..7
+* `STATS_TYPE_DOUBLE` : A double precision floating point value in little
+                       endian form in
+                       octets 0..7
+* `STATS_TYPE_ASCII` : A NUL terminated ASCII string meeting the same
+                      criteria as `name`
+
+If the statistics provider wrote an `event-channel` key then it should
+not update any of the statistics in a set until the `event-channel` is
+signalled indicating that a monitoring agent wishes to sample them.
+When such a signal is received, all statistics in the set should be
+updated in a consistent manner and a signal sent back to the monitoring
+agent via the `event-channel` to say that the update is complete. The
+monitoring agent can then sample the whole set.
+
+If the statistics provider did not write an `event-channel` key then the
+statistics are considered *free running*, i.e. there is no expectation of
+consistency between the values and a monitoring agent is at liberty to
+sample an arbitrary subset at any time.
+
+When a provider wishes to withdraw a set of statistics, e.g. when it is
+shutting down, it notifies a monitoring agent by removing the `ready` key
+from xenstore. Thus a monitoring agent must maintain a watch on that key
+and respond in a timely manner by closing any event channel and releasing
+all grant references (which it may have mapped).
+Once the agent has done this, it must remove its `monitor-id` key so that
+the provider can close any event channel, revoke all grants and remove the
+remaining keys corresponding to the set. When the provider is no longer
+advertising any sets it can then remove its top-level key.
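
The lookup described in the design above (find a `stats_name_type` in a `name-type-ref` page, then read the same offset in the corresponding `val-ref` page) could be sketched in C roughly as follows. This is only an illustration, not part of the proposal: the helper names (`stats_string_valid`, `stats_read_le64`, `stats_find`, `stats_value_u64`) and the `STATS_PAGE_SIZE`, `STATS_SLOT_SIZE` and `STATS_PER_PAGE` constants are hypothetical, and the 4096-octet page size is an assumption inferred from the 64 structures per granted page. Grant mapping, xenstore watches and event-channel signalling are deliberately left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STATS_TYPE_INVALID 0
#define STATS_TYPE_S64     1
#define STATS_TYPE_U64     2
#define STATS_TYPE_DOUBLE  3
#define STATS_TYPE_ASCII   4

/* Assumed sizes (hypothetical names): 4096 / 64 = 64 slots per page. */
#define STATS_PAGE_SIZE 4096
#define STATS_SLOT_SIZE 64
#define STATS_PER_PAGE  (STATS_PAGE_SIZE / STATS_SLOT_SIZE)

struct stats_name_type {
    uint8_t type;
    char name[63];
};

static_assert(sizeof(struct stats_name_type) == STATS_SLOT_SIZE,
              "each (name, type) tuple must occupy one 64-octet slot");

/*
 * Return 1 if s is NUL terminated within len octets and every character
 * before the terminator is alphanumeric, punctuation or a space
 * (assuming the 'C' locale); return 0 otherwise.
 */
int stats_string_valid(const char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];

        if (c == '\0')
            return 1;
        if (!(isalnum(c) || ispunct(c) || c == ' '))
            return 0;
    }
    return 0; /* no terminator found */
}

/* Decode a little endian 64-bit value from octets 0..7 of p. */
uint64_t stats_read_le64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/*
 * Search one mapped name-type page for a statistic called name.
 * Returns the slot index, or -1 if it is not in this page.
 */
int stats_find(const void *name_page, const char *name)
{
    const struct stats_name_type *nt = name_page;

    for (int i = 0; i < STATS_PER_PAGE; i++) {
        if (nt[i].type == STATS_TYPE_INVALID)
            break; /* end of the list */
        if (!stats_string_valid(nt[i].name, sizeof(nt[i].name)))
            continue; /* skip malformed entries */
        if (strcmp(nt[i].name, name) == 0)
            return i;
    }
    return -1;
}

/* Read a STATS_TYPE_U64 value from the same slot of the value page. */
uint64_t stats_value_u64(const void *val_page, int slot)
{
    const uint8_t *p = val_page;

    return stats_read_le64(p + (size_t)slot * STATS_SLOT_SIZE);
}
```

A monitoring agent would call `stats_find()` on each mapped `name-type-ref` page in index order and, on a hit, pass the returned slot to `stats_value_u64()` with the `val-ref` page of the same index. The values are decoded octet by octet so that the sketch does not depend on the agent's own endianness.
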
-- 
2.5.3



 

