
Re: [Xen-devel] [DRAFT 1] XenSock protocol design document



On Mon, 11 Jul 2016, Paul Durrant wrote:
> > -----Original Message-----
> [snip]
> > 
> > # XenSocks Protocol v1
> > 
> > ## Rationale
> > 
> > XenSocks is a paravirtualized protocol for the POSIX socket API.
> > 
> > The purpose of XenSocks is to allow the implementation of a specific set
> > of POSIX calls to be done in a domain other than your own. It allows
> > connect, accept, bind, release, listen, poll, recvmsg and sendmsg to be
> > implemented in another domain.
> 
> Does the other domain have privilege over the domain issuing the POSIX calls?

I don't have a strong opinion on this. In my scenario the backend is in
fact always dom0, but so far nothing in the protocol would prevent
XenSock from being used with driver domains AFAICT. Maybe writing down
that the backend needs to be privileged would allow us to take some
shortcuts in the future, but as there are none at the moment, I don't
think we should make this a requirement. What do you think?


> [snip]
> > #### State Machine
> > 
> >     **Front**                             **Back**
> >     XenbusStateInitialising               XenbusStateInitialising
> >     - Query virtual device                - Query backend device
> >       properties.                           identification data.
> >     - Setup OS device instance.                          |
> >     - Allocate and initialize the                        |
> >       request ring.                                      V
> >     - Publish transport parameters                XenbusStateInitWait
> >       that will be in effect during
> >       this connection.
> >                  |
> >                  |
> >                  V
> >        XenbusStateInitialised
> > 
> >                                           - Query frontend transport
> >                                             parameters.
> >                                           - Connect to the request ring and
> >                                             event channel.
> >                                                          |
> >                                                          |
> >                                                          V
> >                                                  XenbusStateConnected
> > 
> >      - Query backend device properties.
> >      - Finalize OS virtual device
> >        instance.
> >                  |
> >                  |
> >                  V
> >         XenbusStateConnected
> > 
> > Once frontend and backend are connected, they have a shared page, which
> > is used to exchange messages over a ring, and an event channel,
> > which is used to send notifications.
> > 
> 
> What about XenbusStateClosing and XenbusStateClosed? We're missing half the 
> state model here. Specifically how do individual connections get terminated 
> if either end moves to closing? Does either end have to wait for the other?

I admit I "took inspiration" from xen/include/public/io/blkif.h, which
is also missing the closing steps. I'll try to add them. (If you know of
any existing descriptions of a XenBus closing protocol please let me
know.)


> > 
> > ### Commands Ring
> > 
> > The shared ring is used by the frontend to forward socket API calls to the
> > backend. I'll refer to this ring as **commands ring** to distinguish it from
> > other rings which will be created later in the lifecycle of the protocol
> > (data rings). The ring format is defined using the familiar
> > `DEFINE_RING_TYPES` macro (`xen/include/public/io/ring.h`). Frontend
> > requests are allocated on the ring using the `RING_GET_REQUEST` macro.
> > 
> > The format is defined as follows:
> > 
> >     #define XENSOCK_DATARING_ORDER 6
> >     #define XENSOCK_DATARING_PAGES (1 << XENSOCK_DATARING_ORDER)
> >     #define XENSOCK_DATARING_SIZE (XENSOCK_DATARING_PAGES << PAGE_SHIFT)
> > 
> 
> Why a fixed size? Also, I assume DATARING should be CMDRING or somesuch here. 
> Plus a fixed size of *six* pages seems like a lot.

This is going to be changed and significantly improved following
Juergen's suggestion.
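
For reference, the arithmetic behind the quoted constants works out as
below. This is only a sketch: it assumes 4 KiB pages (a PAGE_SHIFT of
12), which the draft does not state, and `xensock_dataring_bytes` is a
helper name made up for illustration.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages. The draft does not fix PAGE_SHIFT. */
#ifndef PAGE_SHIFT
#define PAGE_SHIFT 12
#endif

/* Constants as quoted from the draft. */
#define XENSOCK_DATARING_ORDER 6
#define XENSOCK_DATARING_PAGES (1 << XENSOCK_DATARING_ORDER)
#define XENSOCK_DATARING_SIZE  (XENSOCK_DATARING_PAGES << PAGE_SHIFT)

/* Hypothetical helper: ring size in bytes for a given page order. */
static size_t xensock_dataring_bytes(unsigned int order)
{
    return ((size_t)1 << order) << PAGE_SHIFT;
}
```

Under that assumption an order 6 ring is 64 pages, i.e. 256 KiB per
ring, which is worth keeping in mind when deciding whether the size
should stay fixed.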

 
> > Return value:
> > 
> >   - 0 on success
> >   - less than 0 on failure, see the error codes of the socket system call
> > 
> 
> The socket system call on which OS?

I'll add more info on this. I'll try to stick to POSIX as much as I can,
defining explicitly anything which is not specified by it (such as error
numbers).
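
To illustrate what "defining explicitly" could look like, here is a
rough sketch in which the backend translates its local errno values
into numbers fixed by the protocol before putting them in a response.
The `XENSOCK_E*` names, their numeric values, and the catch-all
behaviour are all assumptions made up for this example; none of them
is in the draft.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical wire-level error numbers; names and values are
 * illustrative only and not taken from the draft. */
#define XENSOCK_EACCES     13
#define XENSOCK_EINVAL     22
#define XENSOCK_EOPNOTSUPP 95
#define XENSOCK_EADDRINUSE 98

/* Translate a positive errno of the backend OS into the negative value
 * carried in a response. 0 means success and is passed through. */
static int32_t xensock_errno_to_wire(int host_errno)
{
    switch (host_errno) {
    case 0:          return 0;
    case EACCES:     return -XENSOCK_EACCES;
    case EINVAL:     return -XENSOCK_EINVAL;
    case EOPNOTSUPP: return -XENSOCK_EOPNOTSUPP;
    case EADDRINUSE: return -XENSOCK_EADDRINUSE;
    default:         return -XENSOCK_EINVAL; /* assumed catch-all */
    }
}
```

With something along these lines the frontend never has to know which
OS the backend runs, since only the protocol's own numbers cross the
ring.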


> > #### Bind
> > 
> > The **bind** operation assigns the address passed as parameter to the
> > socket. It corresponds to the bind system call.
> 
> Is a domain allowed to bind to a privileged port in the backend domain?

I would let the backend decide: the backend can return -EACCES if it
doesn't want to allow access to a given port.
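
A minimal sketch of such a backend-side check, assuming the backend
keeps a per-frontend policy and treats ports below 1024 as privileged.
The struct, the function name and the threshold are hypothetical and
not part of the draft.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-frontend policy; not part of the draft. */
struct xensock_bind_policy {
    bool allow_privileged_ports; /* may bind to ports below 1024 */
};

/* Decide whether a bind request may proceed.
 * port is in host byte order; port 0 asks for an ephemeral port and is
 * always allowed here. Returns 0 if allowed, -EACCES otherwise. */
static int xensock_check_bind(const struct xensock_bind_policy *policy,
                              uint16_t port)
{
    if (port != 0 && port < 1024 && !policy->allow_privileged_ports)
        return -EACCES;
    return 0;
}
```

The backend would run this before issuing the real bind call and place
the result in the response, so the protocol itself needs no notion of
privileged ports.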


> > **sockid** is freely chosen by the frontend and references this specific
> > socket from this point forward. **Bind**, **listen** and **accept** are the
> > three operations required to have fully working passive sockets and should
> > be issued in this order.
> > 
> > Fields:
> > 
> > - **cmd** value: 4
> > - additional fields:
> >   - **addr**: address to bind to, in struct sockaddr format
> >   - **len**: address length
> > 
> > Binary layout:
> > 
> >         16      20      24      28      32      36      40      44     48
> >         +-------+-------+-------+-------+-------+-------+-------+-------+
> >         |                            addr                       |  len  |
> >         +-------+-------+-------+-------+-------+-------+-------+-------+
> > 
> > Return value:
> > 
> >   - 0 on success
> >   - less than 0 on failure, see the error codes of the bind system call
> > 
> > 
> > #### Listen
> > 
> > The **listen** operation marks the socket as a passive socket. It
> > corresponds to the listen system call.
> 
> ...which also takes a 'backlog' parameter, which doesn't seem to be specified 
> here.

Fixed, thanks!


> >             XENSOCK_RING_IDX in_cons, in_prod;
> >             XENSOCK_RING_IDX out_cons, out_prod;
> >             int32_t in_error, out_error;
> >     };
> > 
> > The design is flexible and can support different ring sizes (at compile
> > time). The following description is based on order 6 rings, chosen because
> > they provide excellent performance.
> > 
> 
> What about datagram sockets? Raw sockets? Setting socket options? Etc.

All currently unimplemented. They are probably not going to be part of
the initial version of the protocol, but it would be nice if the
protocol were flexible enough to allow somebody in the future to jump in
and add them without too much trouble.
