Re: [Xen-devel] [DRAFT 1] XenSock protocol design document
On Mon, 11 Jul 2016, Joao Martins wrote:
> On 07/08/2016 12:23 PM, Stefano Stabellini wrote:
> > Hi all,
>
> Hey!
>
> [...]
>
> > ## Design
> >
> > ### Xenstore
> >
> > The frontend and the backend connect to each other exchanging
> > information via xenstore. The toolstack creates front and back nodes
> > with state XenbusStateInitialising. There can only be one XenSock
> > frontend per domain.
> >
> > #### Frontend XenBus Nodes
> >
> > port
> >      Values:         <uint32_t>
> >
> >      The identifier of the Xen event channel used to signal activity
> >      in the ring buffer.
> >
> > ring-ref
> >      Values:         <uint32_t>
> >
> >      The Xen grant reference granting permission for the backend to
> >      map the sole page in a single page sized ring buffer.
>
> Would it make sense to export minimum, default and maximum size of the
> socket over xenstore entries? It normally follows a convention depending
> on the type of socket (and OS) you have, or else through settable socket
> options.

It makes sense, Juergen suggested something similar. I am thinking of
passing the maximum order of the data ring.

> > ### Commands Ring
> >
> > The shared ring is used by the frontend to forward socket API calls to
> > the backend. I'll refer to this ring as **commands ring** to
> > distinguish it from other rings which will be created later in the
> > lifecycle of the protocol (data rings). The ring format is defined
> > using the familiar `DEFINE_RING_TYPES` macro
> > (`xen/include/public/io/ring.h`). Frontend requests are allocated on
> > the ring using the `RING_GET_REQUEST` macro.
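As an aside, to make the xenstore handshake above concrete: the frontend
nodes might end up looking something like the sketch below. The device path
and the numeric values are made up for illustration; only the node names
(port, ring-ref) and the XenbusStateInitialising state come from the draft.

```
# Illustration only: the device path and the values are hypothetical,
# the draft does not specify them.
/local/domain/1/device/xensock/0/state    = "1"    # XenbusStateInitialising
/local/domain/1/device/xensock/0/port     = "7"    # event channel id
/local/domain/1/device/xensock/0/ring-ref = "512"  # grant ref of ring page
```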
> >
> > The format is defined as follows:
> >
> >     #define XENSOCK_DATARING_ORDER 6
> >     #define XENSOCK_DATARING_PAGES (1 << XENSOCK_DATARING_ORDER)
> >     #define XENSOCK_DATARING_SIZE  (XENSOCK_DATARING_PAGES << PAGE_SHIFT)
> >
> >     #define XENSOCK_CONNECT  0
> >     #define XENSOCK_RELEASE  3
> >     #define XENSOCK_BIND     4
> >     #define XENSOCK_LISTEN   5
> >     #define XENSOCK_ACCEPT   6
> >     #define XENSOCK_POLL     7
> >
> >     struct xen_xensock_request {
> >         uint32_t id;     /* private to guest, echoed in response */
> >         uint32_t cmd;    /* command to execute */
> >         uint64_t sockid; /* id of the socket */
> >         union {
> >             struct xen_xensock_connect {
> >                 uint8_t addr[28];
> >                 uint32_t len;
> >                 uint32_t flags;
> >                 grant_ref_t ref[XENSOCK_DATARING_PAGES];
> >                 uint32_t evtchn;
> >             } connect;
> >             struct xen_xensock_bind {
> >                 uint8_t addr[28]; /* ipv6 ready */
> >                 uint32_t len;
> >             } bind;
> >             struct xen_xensock_accept {
> >                 uint64_t sockid;
> >                 grant_ref_t ref[XENSOCK_DATARING_PAGES];
> >                 uint32_t evtchn;
> >             } accept;
> >         } u;
> >     };
> >
> > The first three fields are common for every command. Their binary
> > layout is:
> >
> >     0       4       8       12      16
> >     +-------+-------+-------+-------+
> >     |  id   |  cmd  |    sockid     |
> >     +-------+-------+-------+-------+
> >
> > - **id** is generated by the frontend and identifies one specific
> >   request
> > - **cmd** is the command requested by the frontend:
> >
> >     - `XENSOCK_CONNECT`:  0
> >     - `XENSOCK_RELEASE`:  3
> >     - `XENSOCK_BIND`:     4
> >     - `XENSOCK_LISTEN`:   5
> >     - `XENSOCK_ACCEPT`:   6
> >     - `XENSOCK_POLL`:     7
> >
> > - **sockid** is generated by the frontend and identifies the socket to
> >   connect, bind, etc. A new sockid is required on `XENSOCK_CONNECT`
> >   and `XENSOCK_BIND` commands. A new sockid is also required on
> >   `XENSOCK_ACCEPT`, for the new socket.
>
> Interesting - have you considered making setsockopt and getsockopt part
> of this? There are some common options (as in POSIX defined) and then
> some more exotic flavors, Linux or FreeBSD specific.
> Say SO_REUSEPORT, used on nginx, which is good for load balancing across
> a set of workers, or Linux SO_BUSY_POLL for low latency sockets. Though
> I am not sure how sensible it is to start exposing all of these socket
> options - perhaps limit to a specific subset? Or maybe it doesn't make
> sense for your case - see the further suggestion regarding the data ring
> part.

I have considered it, but I thought that they might be better suited for a
v2 version of the spec. This protocol needs to be extensible, and adding
two new operations such as setsockopt and getsockopt should be the
simplest thing to do. Old backends should return ENOTSUPP. I'll mention
this explicitly in the next draft.

> > All three fields are echoed back by the backend.
> >
> > As for the other Xen ring based protocols, after writing a request to
> > the ring, the frontend calls `RING_PUSH_REQUESTS_AND_CHECK_NOTIFY` and
> > issues an event channel notification when a notification is required.
> >
> > Backend responses are allocated on the ring using the
> > `RING_GET_RESPONSE` macro. The format is the following:
> >
> >     struct xen_xensock_response {
> >         uint32_t id;
> >         uint32_t cmd;
> >         uint64_t sockid;
> >         int32_t ret;
> >     };
> >
> >     0       4       8       12      16      20
> >     +-------+-------+-------+-------+-------+
> >     |  id   |  cmd  |    sockid     |  ret  |
> >     +-------+-------+-------+-------+-------+
> >
> > - **id**: echoed back from request
> > - **cmd**: echoed back from request
> > - **sockid**: echoed back from request
> > - **ret**: return value, identifies success or failure
>
> Are these fields taken from a specific OS (I assumed Linux)? Probably
> the id, cmd and ret fields could be smaller overall - or maybe not - in
> which case it could be useful to specify in the spec whether it follows
> a specific OS.

I'll do that.

> [...]
> > The design is flexible and can support different ring sizes (at
> > compile time). The following description is based on order 6 rings,
> > chosen because they provide excellent performance.
> >
> > - **in** is an array of 65536 bytes, used as circular buffer. It
> >   contains data read from the socket. The producer is the backend, the
> >   consumer is the frontend.
> > - **out** is an array of 131072 bytes, used as circular buffer. It
> >   contains data to be written to the socket. The producer is the
> >   frontend, the consumer is the backend.
>
> Could this size be a tunable, intercepting RCVBUF and SNDBUF sockopt
> adjustments (these two are POSIX defined), of course under the
> assumption that in this proposal you want to replicate the local and
> remote socket? In other words, to dynamically allocate how much the
> socket will use for sending/receiving, which would turn into the amount
> of grants in use? Even doing it with xenstore entries in the backend is
> better - even though the user may want to adjust the send/receive buffer
> for whatever the application needs. Ideally this would be dynamic per
> socket, instead of compile-time defined - and would allow more sockets
> on the same VM without overshooting the grant table limits.

I am working on changing the spec to make the size of the data ring
configurable per socket. Each socket will be able to have a ring of a
different size (I am adding a per-socket ring_order parameter). Hooking it
all up with RCVBUF and SNDBUF should be possible, but I'll leave it for
the future.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel