[Xen-devel] [PATCH v2] docs/designs: PV stats interface
This patch introduces a design proposal for an interface to be used by
guest PV drivers or agents to convey statistics to a monitoring agent
running in a toolstack domain.

Signed-off-by: Paul Durrant <paul.durrant@xxxxxxxxxx>
---
Cc: Jan Beulich <jbeulich@xxxxxxxx>
Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

v2:
 - Protocol changes made during PoC implementation
 - Made event-channel optional to allow for 'free running' sets
   (suggested by Andrew).
 - Addressed some comments from Jan
---
 docs/designs/pv_stats.markdown | 134 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 134 insertions(+)
 create mode 100755 docs/designs/pv_stats.markdown

diff --git a/docs/designs/pv_stats.markdown b/docs/designs/pv_stats.markdown
new file mode 100755
index 0000000..b7961cb
--- /dev/null
+++ b/docs/designs/pv_stats.markdown
@@ -0,0 +1,134 @@
+# Statistics Interface for PV Drivers/Agents
+
+## Background
+
+It is common for guest PV drivers or agents to communicate statistics to an
+agent running in a toolstack domain so that these can be displayed via a UI,
+or even influence guest placement etc. The mechanism for conveying these
+statistics is currently ad-hoc, undocumented, and usually based on xenstore.
+
+Whilst xenstore does indeed provide a convenient mechanism, the lack of
+documentation and standardisation of the protocols used creates
+compatibility issues for PV drivers or agents not tied to a specific
+product or environment. Also, the guest xenstore quota and the single
+instance of xenstored can easily become scalability issues.
+
+## Proposal
+
+The proposed interface is intended to be used only for the purposes of
+conveying statistics from guest PV drivers or agents to an agent or agents
+running in a toolstack domain. It is not intended for bulk data transfer,
+nor as another means of control of the PV drivers or agents by the
+toolstack domain. It is also assumed that software running in the toolstack
+domain will ensure that only a single instance of an agent will attempt to
+monitor a set of statistics; the protocol is not intended to handle
+multiple monitoring agents.
+
+PV drivers or agents typically publish multiple related sets of statistics.
+For example, a PV network frontend may publish statistics relating to
+received traffic and transmitted traffic. These sets are likely to be
+updated asynchronously from each other and therefore it makes sense that
+they can be separated such that a monitoring agent can refresh its view of
+them asynchronously. It is therefore proposed that a two-level hierarchy of
+xenstore keys is used to advertise sets of guest statistics.
+
+The toolstack will create a writeable top-level `stats` key in the guest
+space. Under this each guest statistics *provider* creates a key with its name,
+e.g. `vif0`. This key acts as a parent to keys that then name each
+set of statistics that it provides, e.g. `tx`.
+
+If an agent running in the toolstack domain wishes to monitor the set of
+statistics, it must write its domid under the key e.g.:
+
+    monitor-id = 0
+
+The provider should set up a watch so that it is notified when this key
+is written. It then responds by writing keys containing grant references
+of pages containing the names and values of the statistics in that set, and
+an optional event channel to be used for signalling e.g.:
+
+    name-type-ref0 = 10
+    name-type-ref1 = 11
+    val-ref0 = 12
+    val-ref1 = 13
+    event-channel = 10
+
+Finally the provider should write:
+
+    ready = 1
+
+The provider must always write the `ready` key after all the other
+keys have been written such that a monitoring agent can know when it is
+safe to sample the other keys.
+
+There are separate references for pages containing the names and types of
+the statistics and the values of those statistics since it is required that
+the names and types do not change and hence a monitoring agent need only
+sample them once and can do so as soon as the `name-type-ref` keys
+are valid. The format of each (*name, type*) tuple is as follows:
+
+    struct stats_name_type {
+        uint8_t type;
+        char name[63];
+    };
+
+and the possible types are:
+
+    #define STATS_TYPE_INVALID 0
+    #define STATS_TYPE_S64 1
+    #define STATS_TYPE_U64 2
+    #define STATS_TYPE_DOUBLE 3
+    #define STATS_TYPE_ASCII 4
+
+The `name` must be a NUL terminated ASCII string containing only
+alphanumeric characters, printable non-alphanumeric characters or a space
+character, i.e. the C expression:
+
+    (isalnum(name[i]) || ispunct(name[i]) || name[i] == ' ')
+
+must be true for each value of `i` until `name[i] == '\0'` (assuming the 'C'
+locale).
+
+When iterating through the `stats_name_type` structures a monitoring agent
+can determine that it has finished when it either encounters a `type` value
+of `STATS_TYPE_INVALID`, or it has iterated through all 64 structures in the
+granted page and there are no further `name-type-ref` keys.
+
+A monitoring agent can find the value of a statistic by noting the
+`name-type-ref` index and the offset into the page where the
+`stats_name_type` was found and then looking at the same offset in the
+corresponding `val-ref` page. Values are therefore also 64 octets
+in length and contain:
+
+* `STATS_TYPE_S64` : A signed 64-bit integer in little endian form in
+  octets 0..7
+* `STATS_TYPE_U64` : An unsigned 64-bit integer in little endian form in
+  octets 0..7
+* `STATS_TYPE_DOUBLE` : A double precision floating point value in little
+  endian form in
+  octets 0..7
+* `STATS_TYPE_ASCII` : A NUL terminated ASCII string meeting the same
+  criteria as `name`
+
+If the statistics provider wrote an `event-channel` key then it should
+not update any of the statistics in a set until the `event-channel` is
+signalled indicating that a monitoring agent wishes to sample them.
+When such a signal is received, all statistics in the set should be
+updated in a consistent manner and a signal sent back to the monitoring
+agent via the `event-channel` to say that the update is complete. The
+monitoring agent can then sample the whole set.
+
+If the statistics provider did not write an `event-channel` key then the
+statistics are considered *free running* i.e. there is no expectation of
+consistency between the values and a monitoring agent is at liberty to
+sample an arbitrary subset at any time.
+
+When a provider wishes to withdraw a set of statistics, e.g. when it is
+shutting down, it notifies a monitoring agent by removing the `ready` key
+from xenstore. Thus a monitoring agent must maintain a watch on that key
+and respond in a timely manner by closing any event channel and releasing
+all grant references (which it may have mapped).
+Once the agent has done this, it must remove its `monitor-id` key so that
+the provider can close any event channel, revoke all grants and remove the
+remaining keys corresponding to the set. When the provider is no longer
+advertising any sets it can then remove its top-level key.
-- 
2.1.4

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
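
For illustration only, and not part of the patch: a minimal sketch in C of how a monitoring agent might walk one name/type page and decode entries from the matching `val-ref` page, following the layout described in the design text. The helper names (`stats_name_valid`, `stats_count`, `stats_value`, `stats_read_le64`, `stats_s64`, `stats_double`) and the `STATS_PAGE_SIZE` constant are hypothetical. The sketch assumes 4 KiB granted pages that are already mapped, and an IEEE 754 binary64 `double` whose byte order matches the host's integer byte order. Xenstore watches, grant mapping and event-channel signalling are deliberately left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STATS_TYPE_INVALID 0
#define STATS_TYPE_S64     1
#define STATS_TYPE_U64     2
#define STATS_TYPE_DOUBLE  3
#define STATS_TYPE_ASCII   4

#define STATS_PAGE_SIZE  4096 /* assumption: 4 KiB granted pages */
#define STATS_ENTRY_SIZE 64   /* one name/type tuple, one value slot */
#define STATS_PER_PAGE   (STATS_PAGE_SIZE / STATS_ENTRY_SIZE)

struct stats_name_type {
    uint8_t type;
    char name[63];
};

static_assert(sizeof(struct stats_name_type) == STATS_ENTRY_SIZE,
              "name/type tuple must be 64 octets");

/*
 * Apply the character test from the design text to each character up to
 * the NUL. Returns 0 if a character fails or no NUL is found within the
 * 63 bytes. An empty name is accepted here; the text does not say
 * whether it should be.
 */
static int stats_name_valid(const char name[63])
{
    size_t i;

    for (i = 0; i < 63; i++) {
        unsigned char c = (unsigned char)name[i];

        if (c == '\0')
            return 1;
        if (!(isalnum(c) || ispunct(c) || c == ' '))
            return 0;
    }
    return 0;
}

/* Number of entries in one page: stop at STATS_TYPE_INVALID or at 64. */
static size_t stats_count(const struct stats_name_type *names)
{
    size_t n;

    for (n = 0; n < STATS_PER_PAGE; n++)
        if (names[n].type == STATS_TYPE_INVALID)
            break;
    return n;
}

/* A value sits at the same offset in the val-ref page as its tuple. */
static const uint8_t *stats_value(const uint8_t *val_page, size_t idx)
{
    return val_page + idx * STATS_ENTRY_SIZE;
}

/* Read octets 0..7 as a little-endian integer, whatever the host order. */
static uint64_t stats_read_le64(const uint8_t *p)
{
    uint64_t v = 0;
    int i;

    for (i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

static int64_t stats_s64(const uint8_t *p)
{
    uint64_t u = stats_read_le64(p);
    int64_t s;

    memcpy(&s, &u, sizeof(s)); /* reinterpret the two's complement bits */
    return s;
}

static double stats_double(const uint8_t *p)
{
    uint64_t u = stats_read_le64(p);
    double d;

    memcpy(&d, &u, sizeof(d)); /* assumes IEEE 754 binary64 */
    return d;
}
```

An agent using something like this would call `stats_count()` once per `name-type-ref` page, validate each name with `stats_name_valid()`, and then on every sample use the entry index with `stats_value()` and the decoder selected by `type`. Decoding byte by byte keeps the result independent of the monitoring host's endianness.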