
Re: [Xen-devel] V4V



On 05/24/2012 01:23 PM, Jean Guyader wrote:
> As I'm going through the code to clean up XenClient's inter-VM
> communication (V4V), I thought it would be a good idea to start a thread
> to talk about the fundamental differences between V4V and libvchan. I
> believe the two systems are not clones of each other and that they serve
> different purposes.
> 
> 
> Disclaimer: I'm not an expert in libvchan; most of the assertions I'm making
> about libvchan come from my reading of the code. If some of the facts
> are wrong, it's only due to my ignorance of the subject.
> 

I'll try to fill in some of these points with my understanding of libvchan;
I have correspondingly less knowledge of V4V, so I may be wrong in assumptions
there.

> 1. Why V4V?
> 
> Around the time we started XenClient (three years ago) we were looking for a
> lightweight inter-VM communication scheme. We started working on a system
> based on netchannel2, at the time called V2V (VM to VM). The system
> was very similar to what libvchan is today, and we started to hit some
> roadblocks:
> 
>     - The setup relied on a broker in dom0 to prepare the xenstore node
>       permissions when a guest wanted to create a new connection. The code
>       to do this setup was a single point of failure. If the
>       broker was down you couldn't create any more connections.

libvchan avoids this by allowing the application to determine the xenstore
path and adjust permissions itself; the path /local/domain/N/data is
suitable for a libvchan server in domain N to create the nodes in question.
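
For reference, the application-side setup looks roughly like this (a minimal
sketch based on my reading of libxenvchan.h; NULL-return checks are left to
the caller, and the peer domid / path arguments are placeholders):

    /* Sketch only: a libvchan server offers a channel under a xenstore path
     * it owns (e.g. somewhere below /local/domain/<self>/data); the library
     * writes the ring details there with permissions the peer can read, and
     * the client connects by reading the same path. */
    #include <libxenvchan.h>

    static struct libxenvchan *offer(int peer_domid, const char *xs_path)
    {
        /* Allocates the ring pages, grants them to peer_domid, and
         * publishes the grant refs/event channel under xs_path. */
        return libxenvchan_server_init(NULL, peer_domid, xs_path, 0, 0);
    }

    static struct libxenvchan *attach(int server_domid, const char *xs_path)
    {
        /* Reads what the server published and maps the granted pages. */
        return libxenvchan_client_init(NULL, server_domid, xs_path);
    }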

>     - Symmetric communications were a nightmare. Take the case where A is a
>       backend for B and B is a backend for A. If one of the domains crashed,
>       the other one couldn't be destroyed because it still had some pages
>       mapped from the dead domain. This specific issue is probably fixed today.

This is mostly taken care of by improvements in the hypervisor's handling of
grant mappings. If one domain holds grant mappings open, the domain whose
grants are held can't be fully destroyed, but if both domains are being
destroyed then cycles of grant mappings won't stop them from going away.
 
> Some of the downsides to using the shared memory grant method:
>     - This method imposes an implicit ordering on domain destruction.
>       When this ordering is not honored, the grantor domain cannot shut down
>       while the grantee still holds references. In the extreme case where
>       the grantee domain hangs or crashes without releasing its granted
>       pages, both domains can end up hung and unstoppable (the DEADBEEF
>       issue).

This is fixed on current hypervisors.

>     - You can't trust any ring structures because the entire set of pages
>       that are granted are available to be written by either guest.

This is not a problem: the rings are only used to communicate between the
guests, so the worst that a guest can do is corrupt the data that it sends or
cause spurious events. Note that libvchan does copy some important state out
of the shared page (ring sizes) once at startup because unexpected changes to
these values could cause problems.
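
The defensive pattern here is essentially the following (illustrative only;
the structure and field names are made up, not the actual libvchan layout):

    #include <stdint.h>

    /* Illustrative only: read the peer-writable ring geometry once at
     * startup, sanity-check it, and work from the local copy afterwards. */
    struct shared_hdr {
        uint32_t left_order;    /* log2 of ring sizes, writable by the peer */
        uint32_t right_order;
    };

    static int snapshot_sizes(volatile struct shared_hdr *hdr,
                              uint32_t *left, uint32_t *right)
    {
        uint32_t lo = hdr->left_order, ro = hdr->right_order;  /* read once */

        if (lo < 10 || lo > 20 || ro < 10 || ro > 20)
            return -1;               /* reject nonsense before using it */
        *left = 1u << lo;
        *right = 1u << ro;
        return 0;                    /* only these local copies are used later */
    }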

>     - The PV connect/disconnect state-machine is poorly implemented.
>       There's no trivial mechanism to synchronize disconnecting/reconnecting
>       and dom0 must also allow the two domains to see parts of xenstore
>       belonging to the other domain in the process.

No interaction from dom0 is required to allow two domUs to communicate using
xenstore (assuming the standard xenstored; more restrictive xenstored
daemons may add such restrictions, intended to be used in conjunction with XSM
policies preventing direct communication via event channels/grants). The
connection state machine is weak; an external mechanism (perhaps the standard
xenbus "state" entry) could be used to coordinate this better in the user of
libvchan.
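
A sketch of what such coordination could look like, using a plain xenstore
node next to the vchan nodes (the node path and state strings here are
invented for illustration, not an existing protocol):

    #include <stdlib.h>
    #include <string.h>
    #include <xenstore.h>

    /* Sketch only: each side writes its state to a node the other side
     * watches, e.g. "init" -> "connected" -> "closed". */
    static void announce(struct xs_handle *xs, const char *node,
                         const char *state)
    {
        xs_write(xs, XBT_NULL, node, state, strlen(state));
    }

    static void wait_for(struct xs_handle *xs, const char *node,
                         const char *wanted)
    {
        unsigned int num, len;

        xs_watch(xs, node, "state");
        for (;;) {
            char **ev = xs_read_watch(xs, &num);     /* blocks until fired */
            char *val = xs_read(xs, XBT_NULL, node, &len);
            int done = val && !strcmp(val, wanted);
            free(val);
            free(ev);
            if (done)
                break;
        }
        xs_unwatch(xs, node, "state");
    }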

>     - Using the grant-ref model and having to map grant pages on each
>       transfer causes updates to V->P memory mappings and thus leads to
>       TLB misses and flushes (TLB flushes being expensive operations).

This mapping only happens once at the open of the channel, so this cost becomes
unimportant for a long-running channel. The cost is far higher for HVM domains
that use PCI devices since the grant mapping causes an IOMMU flush.

> 
> After a lot of time spent trying to make the V2V solution work the way we
> wanted, we decided that we should look at a new design that wouldn't have
> the issues mentioned above. At this point we started to work on V4V (V2V
> version 2).
> 
> 2. What is V4V?
> 
> One of the fundamental problems with V2V was that it didn't implement a
> connection mechanism. If one end of the ring disappeared, you had to hope
> that you would receive the xenstore watch that would sort everything out.
> 
> V4V is an inter-domain communication mechanism that supports 1-to-many
> connections. All the communication from a domain (even dom0) to another
> domain goes through Xen, and Xen forwards the packets with memory copies.
> 
> Here are some of the reasons why we think V4V is a good solution for
> inter-domain communication.
> 
> Reasons why the V4V method is quite good even though it does memory copies:
>     - Memory transfer speeds through the FSB in modern chipsets are quite
>       fast. Speeds on the order of 10-12 Gb/s (over, say, 2 DRAM channels)
>       can be realized.
>     - Transfers on a single clock cycle using SSE(2)(3) instructions allow
>       moving up to 128 bits at a time.
>     - Locality of reference arguments with respect to processor caches
>       imply even more speed-up due to likely cache hits (this may in fact
>       make the most difference in mem copy speed).

As I understand it, the same copies happen in libvchan: one copy from the
source to a buffer (hypervisor buffer in V4V, shared pages in libvchan) and
one copy out of the buffer to the destination. Both of these copies should
be able to take advantage of processor caches assuming reasonable locality
of data use and send/recv calls.
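
In libvchan terms, those two copies are simply the write and the read (a
sketch; the two handles would of course live in different domains, they are
shown together in one function only for illustration):

    #include <string.h>
    #include <libxenvchan.h>

    /* Sketch: the two copies on the libvchan data path. */
    static int send_and_receive(struct libxenvchan *tx, struct libxenvchan *rx)
    {
        char out[] = "ping", in[sizeof(out)];

        if (libxenvchan_write(tx, out, sizeof(out)) != (int)sizeof(out))
            return -1;            /* copy #1: source buffer -> shared ring */
        if (libxenvchan_read(rx, in, sizeof(in)) != (int)sizeof(in))
            return -1;            /* copy #2: shared ring -> destination buffer */
        return memcmp(in, out, sizeof(out)) ? -1 : 0;
    }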

>     - V4V provides much better domain isolation since one domain's memory
>       is never seen by another and the hypervisor (a trusted component)
>       brokers all interactions. This also implies that the structure of
>       the ring can be trusted.

This is important for multicast, which libvchan does not support; it is not
as important for unicast.

>     - Use of V4V obviates the event channel depletion issue since
>       it doesn't consume individual channel bits when using VIRQs.

This is a significant advantage of V4V if channels are used in place of
local network communications as opposed to device drivers or other
OS-level components.

>     - The projected overhead of VMEXITs (which was originally cited as a
>       major limiting factor) did not manifest itself as an issue. In
>       fact, it can be seen that in the worst case V4V does not cause
>       many more VMEXITs than the shared memory grant method and in
>       general is at parity with the existing method.
>     - The implementation specifics of V4V make its use in both Windows
>       and Unix/Linux type OSes very simple and natural (ReadFile/WriteFile
>       and sockets respectively). In addition, V4V uses TCP/IP protocol
>       semantics which are widely understood, and it does not introduce an
>       entirely new protocol set that must be learned.
>     - V4V comes with a userspace library that can be used to interpose
>       the standard userspace socket layer. That means that *any* network
>       program can be "V4Ved" *without* being recompiled.
>       In fact we tried it on many programs such as ssh, midori,
>       dbus (TCP/IP), and X11.
>       This is possible because the underlying V4V protocol implements
>       these semantics and supports connections. Such a feature would be
>       really hard to implement on top of the current
>       libvchan implementation.
> 
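
On the socket interposition point above: that kind of transparent redirection
is normally done with an LD_PRELOAD shim, along these lines (a generic sketch
of the technique, not the actual V4V library):

    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <sys/socket.h>

    /* Generic LD_PRELOAD sketch: override connect() and forward to the real
     * implementation; a V4V-style shim would divert selected destinations
     * onto its own transport here instead. */
    int connect(int fd, const struct sockaddr *addr, socklen_t len)
    {
        static int (*real_connect)(int, const struct sockaddr *, socklen_t);

        if (!real_connect)
            real_connect = (int (*)(int, const struct sockaddr *, socklen_t))
                           dlsym(RTLD_NEXT, "connect");

        return real_connect(fd, addr, len);
    }

Built as a shared object and loaded with LD_PRELOAD, this wraps every
connect() the program makes without recompiling it, which is what makes the
"V4Ved without being recompiled" usage possible.
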
> 3. V4V compared to libvchan
> 
> I've done some benchmarks on V4V and libvchan and the results were
> pretty close between the two if you use the same buffer size in both
> cases.
> 

[followup from Stefano's replies]
I would not expect much difference even on a NUMA system, assuming each domU
is fully contained within a NUMA node: one of the two copies made by libvchan
will be confined to a single node, while the other copy will be cross-node.
With domUs not properly confined to nodes, the hypervisor might be able to do
better in cases where libvchan would have required two inter-node copies.

> 
> In conclusion, this is not an attempt to demonstrate that V4V is superior to
> libvchan. Rather it is an attempt to illustrate that they can coexist in the
> Xen ecosystem, helping to solve different sets of problems.
> 
> Thanks,
> Jean
> 

-- 
Daniel De Graaf
National Security Agency
