
Re: [Xen-devel] Inter-domain Communication using Virtual Sockets (high-level design)

On 13/06/13 17:27, Tim Deegan wrote:
> Hi,
> At 19:07 +0100 on 11 Jun (1370977636), David Vrabel wrote:
>> This is a high-level design document for an inter-domain communication
>> system under the virtual sockets API (AF_VSOCK) recently added to Linux.
> This document covers a lot of ground (transport, namespace &c), and I'm
> not sure where the AF_VSOCK interface comes in that.  E.g., are
> communications with the 'connection manager' done by the application
> (like DNS lookups) or by the kernel (like routing)?

The doc doesn't really explain this.

The connection manager is a user space process that opens an AF_VSOCK
listening socket on port 1.  The vsock transport of the frontend
effectively connects to this port (but since it's in-kernel code it
doesn't use the socket API).
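
As a rough illustration of the user space side, here is a minimal sketch
of such a listener in Python.  It assumes a Linux build of Python that
exposes `socket.AF_VSOCK` and a kernel with a working vsock transport.
The names `CONNECTION_MANAGER_PORT`, `manager_bind_address` and
`open_manager_listener` are made up for this example and do not come from
any existing implementation.

```python
import socket

# Well-known port the connection manager listens on (from the design).
CONNECTION_MANAGER_PORT = 1

# Wildcard CID; fall back to the Linux value (-1 as a u32) where Python
# does not export the constant.
VMADDR_CID_ANY = getattr(socket, "VMADDR_CID_ANY", 0xFFFFFFFF)


def manager_bind_address():
    """Return the (cid, port) pair the manager would bind to."""
    return (VMADDR_CID_ANY, CONNECTION_MANAGER_PORT)


def open_manager_listener(backlog=16):
    """Open the listening socket.  Requires AF_VSOCK support at runtime."""
    if not hasattr(socket, "AF_VSOCK"):
        raise OSError("AF_VSOCK is not available on this platform")
    sock = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    try:
        sock.bind(manager_bind_address())
        sock.listen(backlog)
    except OSError:
        sock.close()
        raise
    return sock
```

The frontend's in-kernel transport would reach the same port without
going through this API, so only the manager side is sketched here.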

>> Design Map
>> ----------
>> The Linux kernel requires a Xen-specific virtual socket transport and
>> front and back drivers.
>> The connection manager is a new user space daemon running in the
>> backend domain.
> One in every domain that runs backends, or one for the whole system?

One per backend, but I would anticipate there being only one backend for
most hosts.

>> Linux's virtual sockets are used as the interface to applications.
>> Virtual sockets were introduced in Linux 3.9 and provide a hypervisor
>> independent[^1] interface to user space applications for inter-domain
>> communication.
>> [^1]: The API and address format are hypervisor independent but the
>> address values are not.
>> An internal API is provided to implement a low-level virtual socket
>> transport.  This will be implemented within a pair of front and back
>> drivers.  The use of the standard front/back driver method allows the
>> toolstack to handle the suspend, resume and migration in a similar way
>> to the existing drivers.
> What does that look like at the socket interface?  Would an AF_VSOCK
> socket transparently stay open across migrate but connect to a different
> backend?  Or would it be torn down and the application need to DTRT
> about re-connecting?

All connections are disconnected on migration.  The applications will
need to be able to handle this.

The initial use case for this (in XenServer) is for service domains
which would not be migrated anyway.

>> The front/back pair provides a point-to-point link between the two
>> domains.  This is used to communicate between applications on those
>> hosts and between the frontend domain and the _connection manager_
>> running on the backend.
>> The connection manager allows domUs to request direct connections to
>> peer domains.  Without the connection manager, peers have no mechanism
>> to exchange the information necessary for setting up the direct
>> connections.
> Sure they do -- they can use any existing shared namespace.  Xenstore
> is the obvious candidate, but there's always DNS, or twitter. :P

I meant we need to /define/ a mechanism.  Using twitter might be fun but
it does need to be something within the host ;).

>> The toolstack sets the policy in the connection manager
>> to allow connection requests.  The default policy is to deny
>> connection requests.
> Hmmm.  Since the underlying transports use their own ACLs (e.g. grant
> tables), the connection manager can't actually stop two domains from
> communicating.  You'd need to use XSM for that.

I think there are two security concerns here.

1. Preventing two co-operating domains from setting up a communication
channel.

2. Preventing a domain from connecting to vsock services listening in
another domain.

As you say, the connection manager does not address the first and XSM
would be needed.  This isn't something introduced by this design though.

For the second, I think the connection manager does work here and I
think it is useful to have this level of security without having a
requirement to use XSM.
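
To make the second case concrete, a default-deny policy table could look
something like the sketch below.  This is only an illustration: the
`ConnectionPolicy` class and the choice of (source domain, destination
domain, port) as the rule granularity are assumptions, not part of the
design document.

```python
class ConnectionPolicy:
    """Default-deny table of permitted (source, destination, port) triples."""

    def __init__(self):
        # Empty set: every connection request is denied until a rule is added.
        self._allowed = set()

    def allow(self, src_domid, dst_domid, port):
        """Rule installed by the toolstack."""
        self._allowed.add((src_domid, dst_domid, port))

    def is_permitted(self, src_domid, dst_domid, port):
        """Check a connection request against the installed rules."""
        return (src_domid, dst_domid, port) in self._allowed
```

Rules here are directional, so allowing domain 1 to reach a service in
domain 2 does not let domain 2 connect back to domain 1.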

>> High Level Design
>> =================
>> Virtual Sockets
>> ---------------
>> The AF_VSOCK socket address family in the Linux kernel has a two part
>> address format: a uint32_t _context ID_ (_CID_) identifying the domain
>> and a uint32_t port for the specific service in that domain.
>> The CID shall be the domain ID and some CIDs have a specific meaning.
>> CID                     Purpose
>> -------------------     -------
>> 0x7FF0 (DOMID_SELF)     The local domain.
>> 0x7FF1                  The backend domain (where the connection manager
>> is).
> OK, so there's only one connection manager.  And the connection manager
> has an address at the socket interface -- does that mean application
> code should connect to it and send it requests?  But the information in
> those requests is only useful to the code below the socket interface.

I think I addressed this above.
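
For reference, the two-part address quoted above can be modelled roughly
as follows.  This is a sketch only: `CID_BACKEND`, `VsockAddress`,
`make_address` and `connection_manager_address` are hypothetical names,
and the real `struct sockaddr_vm` layout is not reproduced here.

```python
from typing import NamedTuple

DOMID_SELF = 0x7FF0    # the local domain (from the CID table)
CID_BACKEND = 0x7FF1   # hypothetical name for the backend domain's CID
UINT32_MAX = 0xFFFFFFFF


class VsockAddress(NamedTuple):
    cid: int   # context ID: identifies the domain
    port: int  # identifies the service within that domain


def make_address(cid, port):
    """Build an address, checking that both parts fit in a uint32_t."""
    for name, value in (("cid", cid), ("port", port)):
        if not 0 <= value <= UINT32_MAX:
            raise ValueError(f"{name} does not fit in uint32_t")
    return VsockAddress(cid, port)


def connection_manager_address():
    """Address of the connection manager: backend CID, port 1."""
    return make_address(CID_BACKEND, 1)
```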

>> Connection Manager
>> ------------------
>> The connection manager has two main purposes.
>> 1. Checking that two domains are permitted to connect.
> As I said, I don't think that can work.
>> 2. Providing a mechanism for two domains to exchange the grant
>>    references and event channels needed for them to set up a shared
>>    ring transport.
> If they already want to talk to each other, they can communicate all
> that in a single grant ref (which is the same size as an AF_VSOCK port).

The shared rings are per-peer not per-listener.  If a peer becomes
compromised and starts trying a DoS attack (for example), the ring can
be shut down without impacting other guests.
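
The isolation property can be sketched with a table keyed by peer domain
ID.  The `PeerRings` class is hypothetical and the ring objects are
opaque placeholders; it only shows that tearing down one peer's ring
leaves the others untouched.

```python
class PeerRings:
    """One shared ring per peer domain, keyed by domain ID."""

    def __init__(self):
        self._rings = {}

    def add_peer(self, domid, ring):
        """Record the ring set up with a given peer."""
        self._rings[domid] = ring

    def shut_down(self, domid):
        """Tear down a single peer's ring; True if one existed."""
        return self._rings.pop(domid, None) is not None

    def peers(self):
        """Domain IDs that still have a ring, in ascending order."""
        return sorted(self._rings)
```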

> So I guess the purpose is multiplexing connection requests: some sort of
> listener in the 'backend' must already be talking to the manager (and
> because you need the manager to broker new connections, so must the
> frontend).
> Wait, is this connection manager just xenstore in a funny hat?  Or could
> it be implemented by adding a few new node/permission types to xenstore?

Er yes, I think this is just xenstore in a funny hat.  Reusing xenstore
would seem preferable to implementing a new daemon.

>> Domains communicate with the connection manager over the front-back
>> transport link.  The connection manager must be in the same domain as
>> the virtual socket backend driver.
>> The connection manager opens a virtual socket and listens on a well
>> defined port (port 1).
>> The following messages are defined.
>> Message          Purpose
>> -------          -------
>> CONNECT_req      Request connection to another peer.
>> CONNECT_rsp      Response to a connection request.
>> CONNECT_ind      Indicate that a peer is trying to connect.
>> CONNECT_ack      Acknowledge a connection request.
> Again, are these messages carried in a socket connection, or done under
> the hood on a non-socket channel?  Or some mix of the two?  I think I
> must be missing some key part of the picture. :)
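
One plausible reading of the four messages is sketched below.  The
ordering (request, indication, acknowledgement, response) and the party
names `initiator`, `manager` and `target` are assumptions made for
illustration; the quoted text defines only the message names and their
purposes.

```python
from enum import Enum


class Message(Enum):
    CONNECT_REQ = "Request connection to another peer."
    CONNECT_RSP = "Response to a connection request."
    CONNECT_IND = "Indicate that a peer is trying to connect."
    CONNECT_ACK = "Acknowledge a connection request."


def broker_connect(policy_allows):
    """Return the assumed (sender, receiver, message) steps for one request."""
    steps = [("initiator", "manager", Message.CONNECT_REQ)]
    if policy_allows:
        # Only a permitted request is forwarded to the target domain.
        steps.append(("manager", "target", Message.CONNECT_IND))
        steps.append(("target", "manager", Message.CONNECT_ACK))
    # The initiator always gets a response, whether success or refusal.
    steps.append(("manager", "initiator", Message.CONNECT_RSP))
    return steps
```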
>> V4V
>> ---
>> ### Advantages
>> * Does not use grant table resource.  If shared rings are used then a
>>   busy guest with hundreds of peers will require more grant table
>>   entries than the current default.
>> ### Disadvantages
>> * Any changes or extensions to the protocol or ring format would
>>   require a hypervisor change.  This is more difficult than making
>>   changes to guests.
> In practice, it's often easier to upgrade the hypervisor than the guest
> kernels, but I agree that it's bad to have mechanism in the hypervisor.

If this mechanism needs to be extended, the backend domain can be
restarted with a new kernel with minimal impact to already running guests.

>> * The connection-less, "shared-bus" model of v4v is unsuitable for
>>   untrusted peers.  This requires layering a connection model on top
>>   and much of the simplicity of the v4v ABI is lost.
> I think that if v4v can't manage a listen/connect model, then that's a
> bug in v4v rather than a design-level drawback.  My understanding was
> that the shared-receiver ring was intended to serve this purpose, and
> that v4vtables would be used to silence over-loud peers (much like the
> ACL you propose for the connection manager).  Ross?

The v4vtable rules can only be modified by a privileged domain.  Other
guests would need some way to request new rules or the ability to set
some per-receive ring rules.

>> * The mechanism for handling full destination rings will not scale up
>>   on busy domains.  The event channel only indicates that some ring
>>   may have space -- it does not identify which ring has space.
> That's a fair point, which you raised on the v4v thread, and one that I
> expect Ross to address.
> I'd be very interested to hear the v4v authors' opinions on this VSOCK
> draft, btw -- in particular if it (or something similar) can provide all
> v4v's features without new hypervisor code, I'd very much prefer it.
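
To illustrate the scaling concern about full destination rings: if the
event only says that some ring may have space, the sender has to probe
every ring it is blocked on.  The sketch below is a toy model (the
`rings_with_space` function and its dictionary input are made up for the
example) that counts those probes.

```python
def rings_with_space(free_slots):
    """Probe every ring after a notification.

    free_slots maps a ring identifier to its number of free slots.
    Returns (rings that can accept data, number of rings probed).
    """
    probes = 0
    ready = []
    for ring_id, free in free_slots.items():
        probes += 1  # every ring must be checked, ready or not
        if free > 0:
            ready.append(ring_id)
    return ready, probes
```

The probe count always equals the number of rings, so the work per
notification grows linearly with the number of peers.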

