
Re: [Xen-devel] Inter-domain Communication using Virtual Sockets (high-level design)



On 11/06/13 19:07, David Vrabel wrote:
> All,
>
> This is a high-level design document for an inter-domain communication
> system under the virtual sockets API (AF_VSOCK) recently added to Linux.
>
> Two low-level transports are discussed: a shared ring based one
> requiring no additional hypervisor support and v4v.
>
> The PDF (including the diagrams) is available here:
>
> http://xenbits.xen.org/people/dvrabel/inter-domain-comms-C.pdf
>
> % Inter-domain Communication using Virtual Sockets
> % David Vrabel <<david.vrabel@xxxxxxxxxx>

Mismatched angles.

> % Draft C
>
> Introduction
> ============
>
> Revision History
> ----------------
>
> --------------------------------------------------------------------
> Version  Date         Changes
> -------  -----------  ----------------------------------------------
> Draft C  11 Jun 2013  Minor clarifications.
>
> Draft B  10 Jun 2013  Added a section on the low-level shared ring
>                       transport.
>
>                       Added a section on using v4v as the low-level
>                       transport.
>
> Draft A  28 May 2013  Initial draft.
> --------------------------------------------------------------------
>
> Purpose
> -------
>
> In the Windsor architecture for XenServer, dom0 is disaggregated into
> several _service domains_.  Examples of service domains include
> network and storage driver domains, and qemu (stub) domains.
>
> To allow the toolstack to manage service domains there needs to be a
> communication mechanism between the toolstack running in one domain and
> all the service domains.
>
> The principal focus of this new transport is control-plane traffic
> (low latency and low data rates) but consideration is given to future
> uses requiring higher data rates.
>
> Linux 3.9 supports virtual sockets, a new type of socket (the new
> AF_VSOCK address family) for inter-domain communication.  This was
> originally implemented for VMware's VMCI transport but has hooks for
> other transports.  This will be used to provide the interface to
> applications.
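>
> As an illustration of the application interface, a minimal stream
> client might look like the sketch below (assuming the AF_VSOCK and
> sockaddr_vm definitions are available from `<linux/vm_sockets.h>`;
> the CID and port values are examples only):
>
> ```c
> #include <unistd.h>
> #include <sys/socket.h>
> #include <linux/vm_sockets.h>  /* struct sockaddr_vm */
>
> /* Connect to a service in another domain; returns an fd or -1. */
> int vsock_connect(unsigned int cid, unsigned int port)
> {
>     struct sockaddr_vm addr = {
>         .svm_family = AF_VSOCK,
>         .svm_cid    = cid,   /* peer's domain ID */
>         .svm_port   = port,  /* service within that domain */
>     };
>     int fd = socket(AF_VSOCK, SOCK_STREAM, 0);
>
>     if (fd < 0)
>         return -1;
>     if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
>         close(fd);
>         return -1;
>     }
>     return fd;
> }
> ```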
>
>
> System Overview
> ---------------
>
> ![\label{fig_overview}System Overview](overview.pdf)
>
>
> Design Map
> ----------
>
> The Linux kernel requires a Xen-specific virtual socket transport and
> front and back drivers.
>
> The connection manager is a new user space daemon running in the
> backend domain.
>
> Toolstacks will require changes to allow them to set the policy used
> by the connection manager.  The design of these changes is outside
> the scope of this document.
>
> Definitions and Acronyms
> ------------------------
>
> _AF\_VSOCK_
>   ~ The address family for virtual sockets.
>
> _CID (Context ID)_
>
>   ~ The domain ID portion of the AF_VSOCK address format.
>
> _Port_
>
>   ~ The part of the AF_VSOCK address format identifying a specific
>     service.  Similar to the port number used in a TCP connection.
>
> _Virtual Socket_
>
>   ~ A socket using the AF_VSOCK protocol.
>
> References
> ----------
>
> [Windsor Architecture slides from XenSummit
> 2012](http://www.slideshare.net/xen_com_mgr/windsor-domain-0-disaggregation-for-xenserver-and-xcp)
>
>
> Design Considerations
> =====================
>
> Assumptions
> -----------
>
> * There exists a low-level peer-to-peer, datagram based transport
>   mechanism using shared rings (as in libvchan).
>
> Constraints
> -----------
>
> * The AF_VSOCK address format is limited to a 32-bit CID and a 32-bit
>   port number.  This is sufficient as Xen only has 16-bit domain IDs.
>
> Risks and Volatile Areas
> ------------------------
>
> * The transport may be used between untrusted peers.  A domain may be
>   subject to malicious activity or denial of service attacks.
>
> Architecture
> ============
>
> Overview
> --------
>
> ![\label{fig_architecture}Architecture Overview](architecture.pdf)
>
> Linux's virtual sockets are used as the interface to applications.
> Virtual sockets were introduced in Linux 3.9 and provide a
> hypervisor-independent[^1] interface to user space applications for
> inter-domain communication.
>
> [^1]: The API and address format are hypervisor-independent but the
> address values are not.
>
> An internal API is provided to implement a low-level virtual socket
> transport.  This will be implemented within a pair of front and back
> drivers.  The use of the standard front/back driver method allows the
> toolstack to handle suspend, resume and migration in a similar way to
> the existing drivers.
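>
> A sketch of how a Xen transport would plug into the vsock core is
> below.  The `xen_vsock_*` callbacks are hypothetical and the
> operation set is abridged; the full set of operations is defined by
> the vsock core:
>
> ```c
> #include <linux/module.h>
> #include "af_vsock.h"  /* vsock core transport API (net/vmw_vsock) */
>
> /* Abridged: the vsock core defines many more operations. */
> static struct vsock_transport xen_vsock_transport = {
>     .connect          = xen_vsock_connect,
>     .shutdown         = xen_vsock_shutdown,
>     .stream_enqueue   = xen_vsock_stream_enqueue,
>     .stream_dequeue   = xen_vsock_stream_dequeue,
>     .stream_has_data  = xen_vsock_stream_has_data,
>     .stream_has_space = xen_vsock_stream_has_space,
>     .get_local_cid    = xen_vsock_get_local_cid,  /* local domid */
>     /* ... */
> };
>
> static int __init xen_vsock_init(void)
> {
>     return vsock_core_init(&xen_vsock_transport);
> }
> module_init(xen_vsock_init);
> ```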
>
> The front/back pair provides a point-to-point link between the two
> domains.  This is used to communicate between applications in those
> domains and between the frontend domain and the _connection manager_
> running in the backend.
>
> The connection manager allows domUs to request direct connections to
> peer domains.  Without the connection manager, peers have no mechanism
> to exchange the information necessary for setting up the direct
> connections. The toolstack sets the policy in the connection manager
> to allow connection requests.  The default policy is to deny
> connection requests.
>
>
> High Level Design
> =================
>
> Virtual Sockets
> ---------------
>
> The AF_VSOCK socket address family in the Linux kernel has a two-part
> address format: a uint32_t _context ID_ (_CID_) identifying the domain
> and a uint32_t port for the specific service in that domain.
>
> The CID shall be the domain ID, and some CIDs have a specific meaning.
>
> CID                     Purpose
> -------------------     -------
> 0x7FF0 (DOMID_SELF)     The local domain.
> 0x7FF1                  The backend domain (where the connection manager is).

0x7FF1 is DOMID_IO, which has a separate definition as far as Xen is
concerned.

Is it not possible for this information to be in xenstore?

>
> Some port numbers are reserved.
>
> Port    Purpose
> ----    -------
> 0       Reserved
> 1       Connection Manager
> 2-1023  Reserved for well-known services (such as a service discovery service).

If you are making use of DOMID_SELF, probably also make use of
DOMID_FIRST_RESERVED, which has the same numeric value.

>
> Front / Back Drivers
> --------------------
>
> Using a front or back driver to provide the virtual socket transport
> allows the toolstack to make the inter-domain communication facility
> available only to selected domains.
>
> The "standard" xenbus connection state machine shall be used. See
> figures \ref{fig_front-sm} and \ref{fig_back-sm} on pages
> \pageref{fig_front-sm} and \pageref{fig_back-sm}.
>
> ![\label{fig_front-sm}Frontend Connection State Machine](front-sm.pdf)
>
> ![\label{fig_back-sm}Backend Connection State Machine](back-sm.pdf)
>
>
> Connection Manager
> ------------------
>
> The connection manager has two main purposes.
>
> 1. Checking that two domains are permitted to connect.
>
> 2. Providing a mechanism for two domains to exchange the grant
>    references and event channels needed for them to set up a shared
>    ring transport.
>
> Domains communicate with the connection manager over the front-back
> transport link.  The connection manager must be in the same domain as
> the virtual socket backend driver.
>
> The connection manager opens a virtual socket and listens on a
> well-defined port (port 1).
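>
> For illustration, the connection manager's listening socket could be
> created as follows (a sketch; error reporting abridged):
>
> ```c
> #include <sys/socket.h>
> #include <linux/vm_sockets.h>
>
> #define CONN_MGR_PORT 1  /* reserved port from the table above */
>
> /* Create the connection manager's listening socket. */
> int conn_mgr_listen(void)
> {
>     struct sockaddr_vm addr = {
>         .svm_family = AF_VSOCK,
>         .svm_cid    = VMADDR_CID_ANY,  /* bind to the local CID */
>         .svm_port   = CONN_MGR_PORT,
>     };
>     int fd = socket(AF_VSOCK, SOCK_STREAM, 0);
>
>     if (fd < 0)
>         return -1;
>     if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
>         listen(fd, SOMAXCONN) < 0)
>         return -1;
>     return fd;
> }
> ```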
>
> The following messages are defined.
>
> Message          Purpose
> -------          -------
> CONNECT_req      Request connection to another peer.
> CONNECT_rsp      Response to a connection request.
> CONNECT_ind      Indicate that a peer is trying to connect.
> CONNECT_ack      Acknowledge a connection request.
>
> ![\label{fig_conn-msc}Connect Message Sequence Chart](conn.pdf)
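>
> The wire format of these messages is not yet specified.  Purely as an
> illustration, a CONNECT_req carrying the initiator's ring details (see
> the peer domains section below) might look like:
>
> ```c
> #include <stdint.h>
>
> /* Illustrative only: field names and layout are hypothetical. */
> struct connect_req {
>     uint32_t msg;        /* CONNECT_req */
>     uint32_t peer_cid;   /* domain to connect to */
>     uint32_t peer_port;  /* service port on that domain */
>     uint32_t tx_gref;    /* grant ref of the from-initiator ring */
>     uint32_t tx_evtchn;  /* event channel for that ring */
> };
> ```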
>
> Before forwarding a connection request to a peer, the connection
> manager checks that the connection is permitted.  The toolstack sets
> these permissions.
>
> Disconnecting transport links to an uncooperative (or dead) domain is
> required.  Therefore there are no messages for disconnecting transport
> links (as these may be ignored or delayed).  Instead, a transport link is
> disconnected by tearing down the local end. The peer will notice the
> remote end going away and then teardown its end.
>
> Low-level transport
> ===================
>
> [ The exact details are yet to be determined but this section should
>   provide a reasonable summary of the mechanisms used. ]
>
> Frontend and backend domains
> ----------------------------
>
> As is typical for frontend and backend drivers, the frontend will
> grant copy-only access to two rings -- one for from-front messages and
> one for to-front messages.  Each ring shall have an event channel for
> notifying when requests and responses are placed on the ring.
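>
> A sketch of the frontend setup using the standard single-page xenbus
> helpers is below.  The xenstore key name and the `vsockfront_info`
> structure are illustrative; note also that `xenbus_grant_ring()`
> establishes an ordinary mapping grant, so a copy-only restriction
> would need additional support:
>
> ```c
> #include <xen/xenbus.h>
>
> /* Grant the from-front ring and allocate its event channel; the
>  * to-front ring is set up in the same way. */
> static int vsockfront_setup_tx(struct xenbus_device *dev,
>                                struct vsockfront_info *info)
> {
>     int err;
>
>     err = xenbus_grant_ring(dev, virt_to_mfn(info->tx_ring));
>     if (err < 0)
>         return err;
>     info->tx_ring_ref = err;
>
>     err = xenbus_alloc_evtchn(dev, &info->tx_evtchn);
>     if (err)
>         return err;
>
>     /* Advertise to the backend ("tx-ring-ref" is illustrative). */
>     return xenbus_printf(XBT_NIL, dev->nodename,
>                          "tx-ring-ref", "%u", info->tx_ring_ref);
> }
> ```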

The term "grant copy-only" is very confusing to read in context. 
However I cant offhand think of a better way of describing it.

~Andrew

>
> Peer domains
> ------------
>
> The initiator grants copy-only access to a from-initiator (transmit)
> ring and provides an event channel for notifications for this ring.
> This information is included in the CONNECT_req and CONNECT_ind
> messages.
>
> The responder grants copy-only access to a from-responder (transmit)
> ring and provides an event channel for notifications for this ring.
> The information is included in the CONNECT_ack and CONNECT_rsp
> messages.
>
> After the initial connection, the two domains operate as identical
> peers.  Disconnection is signalled by a domain ungranting its transmit
> ring, notifying the peer via the associated event channel.  The event
> channel is then unbound.
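>
> For illustration, tearing down the local end might look like this
> (the `vsock_peer_info` structure is hypothetical):
>
> ```c
> #include <xen/grant_table.h>
> #include <xen/events.h>
>
> /* Revoke the transmit ring, kick the peer so it notices the
>  * disconnect, then unbind the event channel. */
> static void vsock_peer_disconnect(struct vsock_peer_info *info)
> {
>     gnttab_end_foreign_access(info->tx_ring_ref, 0 /* read/write */,
>                               (unsigned long)info->tx_ring);
>     notify_remote_via_irq(info->irq);
>     unbind_from_irqhandler(info->irq, info);
> }
> ```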
>
> Appendix
> ========
>
> V4V
> ---
>
> An alternative low-level transport (V4V) has been proposed.  The
> hypervisor copies messages from the source domain into a destination
> ring provided by the destination domain.
>
> Because peers are untrusted, each receiver must have a per-peer
> receive ring; this prevents one peer from mounting a denial-of-service
> attack on the processing of messages from other peers.  A listening
> service does not know in advance which peers may connect so it cannot
> create these rings in advance.
>
> The connection manager service running in a trusted domain (as in the
> shared ring transport described above) may be used.  The CONNECT_ind
> message is used to trigger the creation of a receive ring for that
> specific sender.
>
> A peer must be able to find the connection manager service both at
> start of day and if the connection manager service is restarted in a
> new domain.  This can be done in two possible ways:
>
> 1. Watch a Xenstore key which contains the connection manager service
>    domain ID (see the sketch below).
>
> 2. Use a frontend/backend driver pair.
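>
> A sketch of option 1 follows; the key path is hypothetical:
>
> ```c
> #include <xen/xenbus.h>
>
> static void conn_mgr_changed(struct xenbus_watch *watch,
>                              const char **vec, unsigned int len)
> {
>     /* Re-read the key and reconnect to the new service domain. */
> }
>
> static struct xenbus_watch conn_mgr_watch = {
>     .node     = "/vsock/connection-manager",  /* hypothetical key */
>     .callback = conn_mgr_changed,
> };
>
> static int __init conn_mgr_watch_init(void)
> {
>     return register_xenbus_watch(&conn_mgr_watch);
> }
> ```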
>
> ### Advantages
>
> * Does not use grant table resources.  If shared rings are used then a
>   busy guest with hundreds of peers will require more grant table
>   entries than the current default.
>
> ### Disadvantages
>
> * Any changes or extensions to the protocol or ring format would
>   require a hypervisor change.  This is more difficult than making
>   changes to guests.
>
> * The connection-less, "shared-bus" model of v4v is unsuitable for
>   untrusted peers.  This requires layering a connection model on top,
>   and much of the simplicity of the v4v ABI is lost.
>
> * The mechanism for handling full destination rings will not scale up
>   on busy domains.  The event channel only indicates that some ring
>   may have space -- it does not identify which ring has space.
>