[Xen-devel] [PATCH 2/3] public/io/netif.h: document control ring and toeplitz hashing



This patch documents a new shared ring between frontend and backend,
carrying variable-length messages, that can be used to pass bulk
out-of-band data such as that required to implement toeplitz hashing in
the backend under the control of the frontend.

The patch then goes on to document the messages passed over the control
ring that can be used to configure toeplitz hashing.

Signed-off-by: Paul Durrant <paul.durrant@xxxxxxxxxx>
Cc: Ian Campbell <ian.campbell@xxxxxxxxxx>
Cc: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
Cc: Jan Beulich <jbeulich@xxxxxxxx>
Cc: Keir Fraser <keir@xxxxxxx>
Cc: Tim Deegan <tim@xxxxxxx>
---
 xen/include/public/io/netif.h | 320 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 320 insertions(+)

diff --git a/xen/include/public/io/netif.h b/xen/include/public/io/netif.h
index 1790ea0..612dbd0 100644
--- a/xen/include/public/io/netif.h
+++ b/xen/include/public/io/netif.h
@@ -151,6 +151,326 @@
  */
 
 /*
+ * Control ring:
+ *
+ * Some features, such as toeplitz hashing (detailed below), require a
+ * significant amount of out-of-band data to be passed from frontend to
+ * backend. Use of xenstore is not suitable for large quantities of data
+ * because of quota limitations and so a dedicated 'control ring' is used.
+ * The ability of the backend to use a control ring is advertised by
+ * setting:
+ *
+ * /local/domain/X/backend/<domid>/<vif>/feature-control-ring = "1"
+ *
+ * The frontend provides a control ring to the backend by setting:
+ *
+ * /local/domain/<domid>/device/vif/<vif>/ctrl-ring-ref = <gref>
+ * /local/domain/<domid>/device/vif/<vif>/event-channel-ctrl = <port>
+ *
+ * where <gref> is the grant reference of the shared page used to
+ * implement the control ring and <port> is an event channel to be used
+ * as a mailbox interrupt. Both keys must be set before the frontend
+ * moves into the connected state.
+ *
+ * The layout of the shared page is as follows:
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |        req_cons       |        req_prod       |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |        rsp_cons       |        rsp_prod       |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                                               |
+ * +                                               +
+ * |                      req[1024]                |
+ *                         .
+ *                         .
+ * |                                               |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                                               |
+ * +                                               +
+ * |                      rsp[1024]                |
+ *                         .
+ *                         .
+ * |                                               |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ *
+ * This provides a 1024 byte request buffer, a 1024 byte response
+ * buffer and producer/consumer counts for both. The frontend and
+ * backend communicate using message structures prefaced with the
+ * following header:
+ *
+ * netif_ctrl_msg_hdr_t:
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * | type                  | size                  |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ *
+ * The type is one of the NETIF_CTRL_MSG_* values defined below and the
+ * size field specifies how many octets of payload follow the header
+ * (hence size may be 0 for messages not requiring a payload).
+ *
+ * The frontend makes a request by writing a message into the req buffer
+ * (at req_prod modulo 1024, taking care to wrap correctly), incrementing
+ * req_prod by the number of octets written and then sending a mailbox
+ * event to the backend.
+ * The message length may exceed the available space in the buffer
+ * (which can be calculated as req_cons + NETIF_CTRL_RING_SIZE - req_prod)
+ * in which case, as much data should be written as is possible and
+ * req_prod should be incremented by the number of octets written. A
+ * mailbox interrupt should then be sent to the backend to start message
+ * processing and the frontend should not write any more message data
+ * into the req buffer until the backend sends a mailbox interrupt
+ * to the frontend.
+ *
+ * The backend receives a request (when triggered to do so by a mailbox
+ * event) by reading as many octets as it can (which can be calculated
+ * as req_prod - req_cons) from the req buffer (from offset req_cons
+ * modulo 1024, taking care to wrap correctly) into a private buffer and
+ * then incrementing req_cons by the number of octets read.
+ * If a complete header (8 octets) has been read then the backend can
+ * determine how many payload octets it should expect and whether they
+ * have all been read. If they have then the message can be processed.
+ * If they have not then a mailbox event should be sent to the frontend
+ * and backend processing should be suspended until the next mailbox
+ * event arrives.
+ *
+ * The backend sends responses to the frontend using the rsp buffer in
+ * much the same way that the frontend sends requests to the backend and
+ * the frontend processes the responses in much the same way that the
+ * backend processes requests.
+ * The protocol allows for a maximum of one outstanding request at any
+ * point in time. Hence the frontend should not send a new request until
+ * it has received a complete response for the previous request. Similarly
+ * the backend need only provide buffer space for the maximum size
+ * of request that it is prepared to handle (see specification of request
+ * types below).
+ */
+
+#define NETIF_CTRL_RING_SIZE 1024
+
+struct netif_ctrl_ring {
+       RING_IDX req_cons;
+       RING_IDX req_prod;
+       RING_IDX rsp_cons;
+       RING_IDX rsp_prod;
+       uint8_t req[NETIF_CTRL_RING_SIZE];
+       uint8_t rsp[NETIF_CTRL_RING_SIZE];
+};
+
+struct xen_netif_ctrl_msg_hdr {
+       uint32_t type;
+       uint32_t size;
+};
+
+#define NETIF_CTRL_MSG_ACK                  1
+#define NETIF_CTRL_MSG_GET_TOEPLITZ_FLAGS   2
+#define NETIF_CTRL_MSG_SET_TOEPLITZ_FLAGS   3
+#define NETIF_CTRL_MSG_SET_TOEPLITZ_KEY     4
+#define NETIF_CTRL_MSG_SET_TOEPLITZ_MAPPING 5
+
+/* Control messages: */
+
+/*
+ * NETIF_CTRL_MSG_ACK:
+ *
+ * This is the only valid type of message sent by the backend to the
+ * frontend. It carries a payload of the following format:
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                     status                    |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                                               |
+ * +                                               +
+ * |                      data[]                   |
+ *                         .
+ *                         .
+ * |                                               |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ *
+ * The status field is always present and the correct size of the data
+ * field is determined by the type of request message and the value of the
+ * status field.
+ * If the backend receives a request from a frontend that it does not
+ * implement then it should respond with an ack message containing no
+ * data and status set to NETIF_CTRL_STATUS_NOT_SUPPORTED.
+ */
+
+#define NETIF_CTRL_STATUS_SUCCESS           0
+#define NETIF_CTRL_STATUS_NOT_SUPPORTED     1
+#define NETIF_CTRL_STATUS_INVALID_PARAMETER 2
+#define NETIF_CTRL_STATUS_BUFFER_OVERFLOW   3
+
+/*
+ * NETIF_CTRL_MSG_GET_TOEPLITZ_FLAGS:
+ *
+ * This is sent by the frontend to query the types of toeplitz
+ * hash supported by the backend. It carries no payload.
+ *
+ * A successful ack message has the following format:
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |           NETIF_CTRL_STATUS_SUCCESS           |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                    flags                      |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ *
+ * where flags is a bitwise OR of NETIF_CTRL_TOEPLITZ_FLAG_* values
+ * defined below.
+ * An unsuccessful ack message carries no data, only a status value.
+ */
+
+/*
+ * For the purposes of the definitions below, 'Packet[]' is an array of
+ * octets containing an IP packet without options, 'Array[X..Y]' means a
+ * sub-array of 'Array' containing bytes X thru Y inclusive, and '+' is
+ * used to indicate concatenation of arrays.
+ */
+
+/*
+ * A hash calculated over an IP version 4 header as follows:
+ *
+ * Buffer[0..7] = Packet[12..15] + Packet[16..19]
+ * Result = ToeplitzHash(Buffer, 8)
+ */
+#define _NETIF_CTRL_TOEPLITZ_FLAG_IPV4     0
+#define NETIF_CTRL_TOEPLITZ_FLAG_IPV4      (1 << _NETIF_CTRL_TOEPLITZ_FLAG_IPV4)
+
+/*
+ * A hash calculated over an IP version 4 header and TCP header as
+ * follows:
+ *
+ * Buffer[0..11] = Packet[12..15] + Packet[16..19] +
+ *                 Packet[20..21] + Packet[22..23]
+ * Result = ToeplitzHash(Buffer, 12)
+ */
+#define _NETIF_CTRL_TOEPLITZ_FLAG_IPV4_TCP 1
+#define NETIF_CTRL_TOEPLITZ_FLAG_IPV4_TCP  (1 << _NETIF_CTRL_TOEPLITZ_FLAG_IPV4_TCP)
+
+/*
+ * A hash calculated over an IP version 6 header as follows:
+ *
+ * Buffer[0..31] = Packet[8..23] + Packet[24..39]
+ * Result = ToeplitzHash(Buffer, 32)
+ */
+#define _NETIF_CTRL_TOEPLITZ_FLAG_IPV6     2
+#define NETIF_CTRL_TOEPLITZ_FLAG_IPV6      (1 << _NETIF_CTRL_TOEPLITZ_FLAG_IPV6)
+
+/*
+ * A hash calculated over an IP version 6 header and TCP header as
+ * follows:
+ *
+ * Buffer[0..35] = Packet[8..23] + Packet[24..39] +
+ *                 Packet[40..41] + Packet[42..43]
+ * Result = ToeplitzHash(Buffer, 36)
+ */
+#define _NETIF_CTRL_TOEPLITZ_FLAG_IPV6_TCP 3
+#define NETIF_CTRL_TOEPLITZ_FLAG_IPV6_TCP  (1 << _NETIF_CTRL_TOEPLITZ_FLAG_IPV6_TCP)
+
+/*
+ * NETIF_CTRL_MSG_SET_TOEPLITZ_FLAGS:
+ *
+ * This is sent by the frontend to set the types of toeplitz hash that
+ * the backend should calculate. Note that the 'maximal' type of hash
+ * should always be chosen. For example, if the frontend sets both IPV4
+ * and IPV4_TCP hash types then the latter hash type should be calculated
+ * for any TCP packet and the former only calculated for non-TCP packets.
+ * The message carries a payload of the following format:
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                    flags                      |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ *
+ * where flags is a bitwise OR of NETIF_CTRL_TOEPLITZ_FLAG_* values
+ * defined above.
+ *
+ * NOTE: Setting flags to 0 disables toeplitz hashing and the backend
+ *       is free to choose how it steers packets to queues (which is the
+ *       default state).
+ *
+ * A successful or unsuccessful ack message carries no data, only a
+ * status value.
+ */
+
+/*
+ * NETIF_CTRL_MSG_SET_TOEPLITZ_KEY:
+ *
+ * This is sent by the frontend to set the key that the backend should
+ * use to calculate the toeplitz hash. The toeplitz algorithm is
+ * illustrated by the following pseudo-code:
+ *
+ * (Buffer[] and Key[] are treated as shift-registers where the MSB of
+ * Buffer/Key[0] is considered 'left-most' and the LSB of Buffer/Key[N-1]
+ * is the 'right-most').
+ *
+ * Value = 0
+ * For number of bits in Buffer[]
+ *    If (left-most bit of Buffer[] is 1)
+ *        Value ^= left-most 32 bits of Key[]
+ *    Key[] << 1
+ *    Buffer[] << 1
+ *
+ * Key[] is always 40 octets in length and so the message carries a
+ * payload of the following format:
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                                               |
+ * +                                               +
+ * |                                               |
+ * +                                               +
+ * |                   key[40]                     |
+ * +                                               +
+ * |                                               |
+ * +                                               +
+ * |                                               |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ *
+ * A successful or unsuccessful ack message carries no data, only a
+ * status value.
+ */
+
+/*
+ * NETIF_CTRL_MSG_SET_TOEPLITZ_MAPPING:
+ *
+ * This is sent by the frontend to set the mapping of toeplitz hash to
+ * queue number to be applied by the backend.
+ * The message carries a payload of the following format:
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                    queue[0]                   |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                    queue[1]                   |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                    queue[2]                   |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                    queue[3]                   |
+ *                         .
+ *                         .
+ * |                    queue[N-1]                 |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ *
+ * N can be calculated from the payload length and only power-of-2
+ * values are valid.
+ *
+ * NOTE: Before a specific mapping is set using this request, the backend
+ *       should map all toeplitz hash values to queue 0 (which is the only
+ *       queue guaranteed to exist in all cases).
+ *
+ * A successful or unsuccessful ack message carries no data, only a
+ * status value. If the value of N is not a power of 2 or any of the
+ * queue values exceeds the number of queues in operation then status
+ * should be set to NETIF_CTRL_STATUS_INVALID_PARAMETER. If N is larger
+ * than the backend's maximal size of mapping table then status should
+ * be set to NETIF_CTRL_STATUS_BUFFER_OVERFLOW.
+ */
+
+/*
  * Guest transmit
  * ==============
  *
-- 
2.1.4
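
As an illustration of the ring procedure documented in the patch above,
the following is a minimal sketch (not part of the patch) of how a
producer might copy message octets into one of the 1024 octet buffers
and how a consumer might copy them out again, wrapping at the end of the
buffer. The names ctrl_ring_write, ctrl_ring_read, ring_idx_t and
CTRL_RING_SIZE are hypothetical stand-ins. A real implementation would
operate on struct netif_ctrl_ring in the granted page, validate the
shared indices instead of asserting on them, add memory barriers around
the index updates and send the mailbox event afterwards.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CTRL_RING_SIZE 1024u   /* assumed to mirror NETIF_CTRL_RING_SIZE */

typedef uint32_t ring_idx_t;   /* stand-in for Xen's RING_IDX */

/*
 * Producer side: copy up to len octets from src into buf, starting at
 * *prod modulo the ring size. Returns the number of octets written,
 * which is less than len if the buffer fills up.
 */
static uint32_t ctrl_ring_write(uint8_t *buf, ring_idx_t *prod,
                                ring_idx_t cons, const uint8_t *src,
                                uint32_t len)
{
    uint32_t space, off, first;

    assert(*prod - cons <= CTRL_RING_SIZE);
    space = cons + CTRL_RING_SIZE - *prod;   /* free octets */
    if (len > space)
        len = space;

    off = *prod % CTRL_RING_SIZE;
    first = CTRL_RING_SIZE - off;            /* octets before the wrap */
    if (first > len)
        first = len;
    memcpy(buf + off, src, first);
    memcpy(buf, src + first, len - first);   /* wrapped remainder */

    /* A write barrier belongs here in real shared-memory code. */
    *prod += len;
    return len;
}

/*
 * Consumer side: copy up to len octets out of buf into dst, starting at
 * *cons modulo the ring size. Returns the number of octets read.
 */
static uint32_t ctrl_ring_read(const uint8_t *buf, ring_idx_t prod,
                               ring_idx_t *cons, uint8_t *dst,
                               uint32_t len)
{
    uint32_t avail, off, first;

    assert(prod - *cons <= CTRL_RING_SIZE);
    avail = prod - *cons;                    /* unread octets */
    if (len > avail)
        len = avail;

    off = *cons % CTRL_RING_SIZE;
    first = CTRL_RING_SIZE - off;
    if (first > len)
        first = len;
    memcpy(dst, buf + off, first);
    memcpy(dst + first, buf, len - first);

    *cons += len;
    return len;
}
```

A frontend would call ctrl_ring_write() with the req buffer and req_prod.
If it returns less than the message length, the frontend would send the
mailbox event and wait for the backend's event before writing the
remainder.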

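The toeplitz pseudo-code in the patch above can be sketched in C as
follows. This is an illustrative reading of that pseudo-code and not
code from the patch: rather than shifting the whole 40 octet key, it
keeps only the left-most 32 bits of the key in a sliding window and
shifts in one key bit per input bit. The names toeplitz_hash and
TOEPLITZ_KEY_SIZE are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOEPLITZ_KEY_SIZE 40   /* Key[] is always 40 octets */

/*
 * Hash len octets of buf with the given key. len must leave room for a
 * 32-bit window past the last input bit, so len <= 36, which covers the
 * largest (IPv6 + TCP) input described in the patch.
 */
static uint32_t toeplitz_hash(const uint8_t key[TOEPLITZ_KEY_SIZE],
                              const uint8_t *buf, size_t len)
{
    uint32_t window, value = 0;
    size_t next = 4;           /* next key octet to shift into window */
    size_t i;
    int bit;

    assert(len + 4 <= TOEPLITZ_KEY_SIZE);

    /* Left-most 32 bits of Key[] */
    window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
             ((uint32_t)key[2] << 8) | (uint32_t)key[3];

    for (i = 0; i < len; i++, next++) {
        for (bit = 7; bit >= 0; bit--) {
            /* If the left-most remaining bit of Buffer[] is 1 ... */
            if (buf[i] & (1u << bit))
                value ^= window;
            /* Key[] << 1: drop the top bit, shift in the next key bit */
            window = (window << 1) | ((key[next] >> bit) & 1u);
        }
    }

    return value;
}
```

For the IPV4 hash type the input would be the 8 octets of source and
destination address. Worked by hand with the sample key from Microsoft's
published RSS verification data and the addresses 66.9.149.187 (source)
and 161.142.100.80 (destination), this sketch gives 0x323e8fc2.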

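The checks described for NETIF_CTRL_MSG_SET_TOEPLITZ_MAPPING in the
patch above might look like the sketch below on the backend side. The
status values are copied from the patch. Everything else is an
assumption made for illustration: the function names, the
MAX_MAPPING_ENTRIES limit, the order in which the checks are applied,
treating N = 0 as invalid, the table having already been decoded from
the payload into an array of uint32_t, and selecting the queue by
masking the hash with N - 1 (the patch only says that N must be a power
of 2, it does not define the lookup).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as defined by NETIF_CTRL_STATUS_* in the patch */
#define CTRL_STATUS_SUCCESS           0
#define CTRL_STATUS_INVALID_PARAMETER 2
#define CTRL_STATUS_BUFFER_OVERFLOW   3

#define MAX_MAPPING_ENTRIES 128   /* hypothetical backend limit */

/* Return the status the backend would place in its ack message. */
static int validate_mapping(const uint32_t *queue, size_t n,
                            uint32_t num_queues)
{
    size_t i;

    /* N must be a power of 2 (zero is assumed to be invalid) */
    if (n == 0 || (n & (n - 1)) != 0)
        return CTRL_STATUS_INVALID_PARAMETER;

    /* N must fit in the backend's mapping table */
    if (n > MAX_MAPPING_ENTRIES)
        return CTRL_STATUS_BUFFER_OVERFLOW;

    /* Every entry must name a queue that is in operation */
    for (i = 0; i < n; i++)
        if (queue[i] >= num_queues)
            return CTRL_STATUS_INVALID_PARAMETER;

    return CTRL_STATUS_SUCCESS;
}

/* Assumed lookup: index the table with the low bits of the hash. */
static uint32_t select_queue(const uint32_t *queue, size_t n,
                             uint32_t hash)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    return queue[hash & (n - 1)];
}
```

Before any mapping has been accepted, the backend would steer every hash
value to queue 0, which corresponds to a one-entry table containing 0.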