Xen project Mailing List

Re: [Xen-devel] [PATCH v4 2/3] public/io/netif.h: document control ring and toeplitz hashing

To: Paul Durrant <paul.durrant@xxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>

From: Ian Campbell <ian.campbell@xxxxxxxxxx>

Date: Fri, 8 Jan 2016 15:53:46 +0000

Cc: Keir Fraser <keir@xxxxxxx>, Ian Jackson <ian.jackson@xxxxxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Tim Deegan <tim@xxxxxxx>

Delivery-date: Fri, 08 Jan 2016 15:54:14 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Thu, 2016-01-07 at 13:05 +0000, Paul Durrant wrote: >Â > diff --git a/xen/include/public/io/netif.h > b/xen/include/public/io/netif.h > index 1790ea0..06e0b61 100644 > --- a/xen/include/public/io/netif.h > +++ b/xen/include/public/io/netif.h > @@ -151,6 +151,270 @@ > Â */ > Â > Â/* > + * Control ring > + * ============ > + * > + * Some features, such as toeplitz hashing (detailed below), require a > + * significant amount of out-of-band data to be passed from frontend to > + * backend. Use of xenstore is not suitable for large quantities of data > + * because of quota limitations and so a dedicated 'control ring' is used. > + * The ability of the backend to use a control ring is advertised by > + * setting: > + * > + * /local/domain/X/backend///feature-control-ring = "1" > + * > + * The frontend provides a control ring to the backend by setting: > + * > + * /local/domain//device/vif//ctrl-ring-ref = > + * /local/domain//device/vif//event-channel-ctrl = > + * > + * where is the grant reference of the shared page used to > + * implement the control ring and is an event channel to be used > + * as a mailbox interrupt, before the frontend moves into the connected > + * state. I read this as saying that the mailbox interrupt can be used until the frontend moves into the connected state, which I realise isn't what you meant but having so much stuff between "by setting" and "before" makes it easy to read wrongly. How about "... as a mailbox interrupt. These keys must be written before moving into the connected state." ? > + * The control ring uses a fixed request/response message size and is > + * balanced (i.e. one request to one response), so operationally it is much > + * the same as a tramsmit or receive ring. "transmit". Is it intended to support out of order responses? > + */ > + > +/* > + * Toeplitz hash types > + * =================== > + * > + * For the purposes of the definitions below, 'Packet[]' is an array of > + * octets containing an IP packet without options, 'Array[X..Y]' means a > + * sub-array of 'Array' containing bytes X thru Y inclusive, and '+' is > + * used to indicate concatenation of arrays. > + */ > + > +/* > + * A hash calculated over an IP version 4 header as follows: > + * > + * Buffer[0..8] = Packet[12..15] + Packet[16..19] Is there a reason to write it like this rather than: + * Buffer[0..8] = Packet[12..19] ? Maybe something which is obvious if you know about Toeplitz in more than just a passing fashion? > +/* > + * Control requests (netif_ctrl_request_t) > + * ======================================= > + * > + * All requests have the following format: > + * > + *ÂÂÂÂ0ÂÂÂÂÂ1ÂÂÂÂÂ2ÂÂÂÂÂ3ÂÂÂÂÂ4ÂÂÂÂÂ5ÂÂÂÂÂ6ÂÂÂÂÂ7ÂÂoctet > + * +-----+-----+-----+-----+-----+-----+-----+-----+ > + * |ÂÂÂÂidÂÂÂÂÂ|ÂÂÂtypeÂÂÂÂ|ÂÂÂÂÂÂÂÂÂdata[0]ÂÂÂÂÂÂÂ| > + * +-----+-----+-----+-----+-----+-----+-----+-----+ > + * |ÂÂÂÂÂÂÂÂÂdata[1]ÂÂÂÂÂÂÂ| > + * +-----+-----+-----+-----+ These are packed in the ring, i.e. there are no implicit padding bytes, right? I know the Toeplitz stuff doesn't need it, but we should consider designing in a scheme to handle control commands which need >8 octets of data, or else we risk ending up with something as confusing as the data path... > + * > + * id: the request identifier, echoed in response. > + * type: the type of request (see below) > + * data[]: any data associated with the request (determined by type) > + */ > + > +struct netif_ctrl_request { > +ÂÂÂÂuint16_t id; > +ÂÂÂÂuint16_t type; > + > +#define NETIF_CTRL_TYPE_INVALIDÂÂÂÂÂÂÂÂÂÂÂÂÂÂ0 > +#define NETIF_CTRL_TYPE_GET_TOEPLITZ_FLAGSÂÂÂ1 > +#define NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGSÂÂÂ2 > +#define NETIF_CTRL_TYPE_SET_TOEPLITZ_KEYÂÂÂÂÂ3 > +#define NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING 4 > + > +ÂÂÂÂuint32_t data[2]; > +}; > +typedef struct netif_ctrl_request netif_ctrl_request_t; > + > +/* > + * type = NETIF_CTRL_TYPE_GET_TOEPLITZ_FLAGS: > + * > + * This is sent by the frontend to query the types of toeplitz > + * hash supported by the backend. No data is required and to the "and to the" has an extra "to" in it I think? > + * data[] field is set to 0. "fields are" ? > + * > + * type = NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS: > + * > + * This is sent by the frontend to set the types of toeplitz hash that > + * the backend should calculate. Note that the 'maximal' type of hash > + * should always be chosen. For example, if the frontend sets both IPV4 > + * and IPV4_TCP hash types then the latter hash type should be calculated > + * for any TCP packet and the former only calculated for non-TCP packets. > + * The data[0] field is a bitwise OR of NETIF_CTRL_TOEPLITZ_FLAG_* values > + * defined above. The data[1] field is set to 0. > + * > + * NOTE: Setting data[0] to 0 disables toeplitz hashing and the backend > + *ÂÂÂÂÂÂÂis free to choose how it steers packets to queues (which is the > + *ÂÂÂÂÂÂÂdefault state). > + * > + * type = NETIF_CTRL_TYPE_SET_TOEPLITZ_KEY: Can this be sent before FLAGS? Is it permissible to send this for a hash scheme not included in flags? > + * > + * This is sent by the frontend to set the key of toeplitz hash that > + * the backend should calculate. The toeplitz algorithm is illustrated > + * by the following pseudo-code: > + * > + * (Buffer[] and Key[] are treated as shift-registers where the MSB of > + * Buffer/Key[0] is considered 'left-most' and the LSB of Buffer/Key[N- > 1] > + * is the 'right-most'). > + * > + * Value = 0 > + * For number of bits in Buffer[] > + *ÂÂÂÂIf (left-most bit of Buffer[] is 1) > + *ÂÂÂÂÂÂÂÂValue ^= left-most 32 bits of Key[] > + *ÂÂÂÂKey[] << 1 > + *ÂÂÂÂBuffer[] << 1 > + * > + * The data[0] field is set to the size of key in octets. The data[1] > + * field is set to a grant reference of a page containing the key. I suppose it is implicit that the key starts at offset 0 in the page? > The > + * reference must remain valid until the corresponding > + * netif_ctrl_response_t has been processed. May the ref be r/o? If so, should that be encouraged? > + * > + * type = NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING: > + * > + * This is sent by the frontend to set the mapping of toeplitz hash to > + * queue number to be applied by the backend. > + * > + * The data[0] field is set to the order of the mapping. The data[1] field > + * is set to a grant reference of a page containing the mapping. The > + * reference must remain valid until the corresponding > + * netif_ctrl_response_t has been processed. > + * > + * The format of the mapping is: > + * > + *ÂÂÂÂ0ÂÂÂÂÂ1ÂÂÂÂÂ2ÂÂÂÂÂ3ÂÂÂÂÂ4ÂÂÂÂÂ5ÂÂÂÂÂ6ÂÂÂÂÂ7ÂÂoctet > + * +-----+-----+-----+-----+-----+-----+-----+-----+ > + * |ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂqueue[0]ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ| > + * +-----+-----+-----+-----+-----+-----+-----+-----+ > + * |ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂqueue[1]ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ| > + * +-----+-----+-----+-----+-----+-----+-----+-----+ > + * |ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂqueue[2]ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ| > + * +-----+-----+-----+-----+-----+-----+-----+-----+ > + * |ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂqueue[3]ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ| > + *ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ. > + *ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ. > + * |ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂqueue[N-1]ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ| > + * +-----+-----+-----+-----+-----+-----+-----+-----+ > + * > + * where each queue value is less than "multi-queue-num-queues" (see above) > + * and N is 1 << data[0]. > + * > + * NOTE: Before a specific mapping is set using this request, the backend > + *ÂÂÂÂÂÂÂshould map all toeplitz hash values to queue 0 (which is the only > + *ÂÂÂÂÂÂÂqueue guaranteed to exist in all cases). i.e. between receiving a NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS with a non-zero set of flags andÂNETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING it should do this? If it has not seen a NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS, or it has seen one with flags = 0 then as before the b/e is free to steer as it likes? Is it valid to send this mapping command first before the flags? > + */ > + > +/* > + * Control responses (netif_ctrl_response_t) > + * ========================================= > + * > + * All responses have the following format: > + * > + *ÂÂÂÂ0ÂÂÂÂÂ1ÂÂÂÂÂ2ÂÂÂÂÂ3ÂÂÂÂÂ4ÂÂÂÂÂ5ÂÂÂÂÂ6ÂÂÂÂÂ7ÂÂoctet > + * +-----+-----+-----+-----+-----+-----+-----+-----+ > + * |ÂÂÂÂidÂÂÂÂÂ|ÂÂÂpadÂÂÂÂÂ|ÂÂÂÂÂÂÂÂÂstatusÂÂÂÂÂÂÂÂ| > + * +-----+-----+-----+-----+-----+-----+-----+-----+ > + * |ÂÂÂÂÂÂÂÂÂdataÂÂÂÂÂÂÂÂÂÂ| > + * +-----+-----+-----+-----+ > + * > + * id: the corresponding request identifier > + * pad: set to 0 > + * status: the status of request processing > + * data: any data associated with the response (determined by type and > + *ÂÂÂÂÂÂÂstatus) type needs to be remembered via the id? Why not echo it in the pad field for convenience? > + */ > + > +struct netif_ctrl_response { > +ÂÂÂÂuint16_t id; > +ÂÂÂÂuint16_t pad; > +ÂÂÂÂuint32_t status; > + > +#define NETIF_CTRL_STATUS_SUCCESSÂÂÂÂÂÂÂÂÂÂÂ0 > +#define NETIF_CTRL_STATUS_NOT_SUPPORTEDÂÂÂÂÂ1 > +#define NETIF_CTRL_STATUS_INVALID_PARAMETER 2 > +#define NETIF_CTRL_STATUS_BUFFER_OVERFLOWÂÂÂ3 Is the intention that these will be the union of all possible failure modes, rather than making provision for type-specific failures? These are all pretty generic though, and I suppose we could always figure that out when some command has a very obscure failure case. > + > +ÂÂÂÂuint32_t data; > +}; > +typedef struct netif_ctrl_response netif_ctrl_response_t; > + > +/* > + * type = > + * > + * The default response for any unrecognised request has the status field > + * set to NETIF_CTRL_STATUS_NOT_SUPPORTED and the data field set to 0. > + * > + * type = NETIF_CTRL_MSG_GET_TOEPLITZ_FLAGS: > + * > + * Since the request carries no data there is no reason for processing to > + * fail, hence the status field is set to NETIF_CTRL_STATUS_SUCCESS and the > + * data field is a bitwise OR of NETIF_CTRL_TOEPLITZ_FLAG_* values (defined > + * above) indicating which hash types are supported by the backend. > + * If no hashing is supported then the data field should be set to 0. I wonder if it would be clearer to define the generic request and response structures up front followed by the specific commands and their request+response formats at the same time, rather than splitting the specific command requests and responses into different places? Like: Command:ÂNETIF_CTRL_MSG_GET_TOEPLITZ_FLAGS Description: Blah blah Request data format: Words wordsÂ Valid response status codes: Â - One Â - Another Responds data format: Words (i.e. more like API than protocol documentation in the form of independent looking things going back and forth) > + * > + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_FLAGS: > + * > + * If the data[0] field in the request is invalid (i.e. contains unsupported > + * hash types) then the status field is set to > + * NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the requset shouldÂ "request" > succeed > + * and hence the status field is set to NETIF_CTRL_STATUS_SUCCESS. > + * The data field should be set to 0. > + * > + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_KEY: > + * > + * If the data[0] field in the request is an invalid key length (too big) Can it not be too small as well? What is the behaviour if it is? > + * then the status field is set to NETIF_CTRL_STATUS_BUFFER_OVERFLOW, If the > + * data[1] field is an invalid grant reference then the status field is set > + * to NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the request should > + * succeed and hence the status field is set to NETIF_CTRL_STATUS_SUCCESS. > + * The data field should be set to 0. > + * > + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_MAPPING: > + * > + * If the data[0] field in the request is an invalid mapping order (too big) or too small? > + * then the status field is set to NETIF_CTRL_STATUS_BUFFER_OVERFLOW, If the > + * data[1] field is an invalid grant reference then the status field is set > + * to NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the requset should "request" Ian. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.