
[Xen-devel] Netchannel2

After many delays and false starts, here's a preliminary netchannel2
implementation.


These trees are up-to-date with the mainline trees of the same name as
of a couple of days ago.

Here are the main features of the current scheme:

-- Obviously, it implements a fully functional network interface.
   LRO, TSO, and checksum offload are all supported.  Hot-add and
   hot-remove work as expected.

-- The copy-to-receiver-buffers is now performed in the receiving
   domain, rather than in dom0.  This helps to prevent dom0 from
   becoming a bottleneck, and also has some cache locality advantages.

-- Inter-domain traffic can be configured to bypass dom0 completely.
   Once a bypass is established, the domains communicate on their own
   private ring, without indirecting via dom0.  This significantly
   increases inter-domain bandwidth, reduces latency, and reduces dom0
   load.

   (This is currently somewhat rough around the edges, and each bypass
   needs to be configured manually.  It'll (hopefully) eventually be
   automatic, but that hasn't been implemented yet.)

-- A new, and hopefully far more extensible, ring protocol, supporting
   variable size messages, multi-page rings, and out-of-order message
   return.  This is intended to make VMDQ support straightforward,
   although that hasn't been implemented yet.

-- Packet headers are sent in-line in the ring, rather than
   out-of-band in fragment descriptors.  Small packets (e.g. TCP ACKs)
   are sent entirely in-line.

-- There's an asymmetry limiter, intended to protect dom0 against
   denial of service attacks by malicious domUs.

-- Sub-page grant support.  The grant table interface is extended so a
   domain can grant another domain access to a range of bytes within a
   page, and Xen will then prevent the grantee domain accessing
   outside that range.  For obvious reasons, it isn't possible to map
   these grant references, and domains are expected to use the grant
   copy hypercalls instead.

-- Transitive grant support.  It's now possible for a domain to create
   a grant reference which indirects to another grant reference, so
   that any attempt to access the first grant reference will be
   redirected to the second one.  This is used to implement
   receiver-side copy on inter-domain traffic: rather than copying the
   packet in dom0, dom0 creates a transitive grant referencing the
   original transmit buffer, and passes that to the receiving domain.

   For implementation reasons, only a single level of transitive
   granting is supported, and transitive grants cannot be mapped
   (i.e. they can only be used in grant copy operations).  Multi-level
   transitive grants could be added pretty much as soon as anybody
   needs them, but mapping transitive grants would be more tricky.
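
To make the sub-page and transitive grant rules above a bit more
concrete, here is a toy model in C.  It is only a sketch: the struct
layout, the toy_ names, and the fixed-size tables are invented for
illustration and are not the Xen grant table ABI.  It shows a
grant-copy style check that follows at most one transitive hop and
refuses any access outside the granted byte range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE  4096u
#define TOY_NR_GRANTS  8u
#define TOY_NR_DOMAINS 3u

/* Hypothetical entry kinds; the real grant table format differs. */
enum toy_grant_kind { TOY_UNUSED = 0, TOY_FULL_PAGE, TOY_SUB_PAGE, TOY_TRANSITIVE };

struct toy_grant {
    enum toy_grant_kind kind;
    unsigned grantee;       /* domain allowed to use this entry */
    uint8_t *page;          /* backing page (full-page and sub-page entries) */
    uint32_t off, len;      /* permitted byte range within the page (sub-page) */
    unsigned trans_domain;  /* transitive: domain owning the target entry */
    unsigned trans_ref;     /* transitive: index of the target entry */
};

struct toy_domain { struct toy_grant grants[TOY_NR_GRANTS]; };

/* True if [off, off+len) lies inside [lo, lo+span), without overflowing. */
static bool range_ok(uint32_t off, uint32_t len, uint32_t lo, uint32_t span)
{
    return off >= lo && len <= span && off - lo <= span - len;
}

/* Copy len bytes at page offset off from grant (owner, ref) into dst on
 * behalf of domain caller.  Returns 0 on success, -1 if refused. */
int toy_grant_copy(const struct toy_domain *doms, unsigned owner, unsigned ref,
                   unsigned caller, uint32_t off, uint32_t len, void *dst)
{
    assert(doms != NULL && dst != NULL);
    if (owner >= TOY_NR_DOMAINS || ref >= TOY_NR_GRANTS)
        return -1;

    const struct toy_grant *g = &doms[owner].grants[ref];
    if (g->kind == TOY_UNUSED || g->grantee != caller)
        return -1;

    if (g->kind == TOY_TRANSITIVE) {
        if (g->trans_domain >= TOY_NR_DOMAINS || g->trans_ref >= TOY_NR_GRANTS)
            return -1;
        const struct toy_grant *t = &doms[g->trans_domain].grants[g->trans_ref];
        /* The target must have been granted to the intermediate domain,
         * and only a single level of indirection is allowed. */
        if (t->kind == TOY_UNUSED || t->kind == TOY_TRANSITIVE ||
            t->grantee != owner)
            return -1;
        g = t;
    }

    uint32_t lo = (g->kind == TOY_SUB_PAGE) ? g->off : 0;
    uint32_t span = (g->kind == TOY_SUB_PAGE) ? g->len : TOY_PAGE_SIZE;
    if (!range_ok(lo, span, 0, TOY_PAGE_SIZE) || !range_ok(off, len, lo, span))
        return -1;

    memcpy(dst, g->page + off, len);
    return 0;
}
```

In the inter-domain case, the transmitting domain would fill in a
sub-page entry for dom0, dom0 would add a transitive entry naming the
receiver as grantee, and the receiver's copy would succeed only inside
the transmitter's original byte range.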

It does still have a few rough edges:

-- Suspend/resume and migration don't work with dom0 bypass.

-- Ignoring the bypass support, performance isn't that much better
   than netchannel1 for many tests.  Dom0 CPU load is usually lower,
   so it should scale better when you have many NICs, but in terms of
   raw throughput there's not much in it either way.  Earlier versions
   were marginally ahead, but there seems to have been a bit of a
   regression while I was bringing it up to date with current

-- The hotplug scripts and tool integration aren't nearly as complete
   as their netchannel1 equivalents.  It's not clear to me how much of
   the netchannel1 stuff actually gets used, though, so I'm going to
   leave this as-is unless somebody complains.

-- The code quality needs some attention.  It's been hacked around by
   a number of people over the course of several months, and generally
   has a bit less conceptual integrity than I'd like in new code.

   (It's not horrific, by any means, but it is a bit harder to follow
   than the old netfront/netback drivers were.)

-- There's no unmodified-drivers support, so you won't be able to use
   it in HVM domains.  Adding support is unlikely to be terribly
   difficult, with the possible exception of the dom0 bypass
   functionality, but I've not looked at it at all yet.

If you want to try this out, you'll need to rebuild Xen, the dom0
kernel, and the domU kernels, in addition to building the module.
You'll also need to install xend and the userspace tools from the
netchannel2 xen-unstable repository.  To create an interface, either
use the ``xm network2-attach'' command or specify a vif2= list in your
xm config file.

The current implementation is broadly functional, in that it doesn't
have any known crippling bugs, but hasn't had a great deal of testing.
It should work, for the most part, but it certainly isn't ready for
production use.  If you find any problems, please report them.
Patches would be even better. :)

A couple of people have asked about using the basic ring protocol in
other PV device classes (e.g. pvSCSI, pvUSB).  I'll follow up in a
second with a summary of how all that works.
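
As a rough illustration of the variable-size messages described in the
feature list, here is a toy byte ring in C.  It is a sketch under
stated assumptions, not the netchannel2 wire format: the header fields,
the 8-byte padding, the 64-byte ring size, and all toy_ names are made
up for the example, and it handles neither multi-page rings nor
out-of-order message return.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_RING_BYTES 64u   /* power of two; purely illustrative */

/* Hypothetical message header; size counts header plus payload. */
struct toy_msg_hdr { uint8_t type; uint8_t flags; uint16_t size; };

struct toy_ring {
    uint8_t data[TOY_RING_BYTES];
    uint32_t prod, cons;     /* free-running byte counters */
};

/* Copy n bytes into the ring starting at counter at, wrapping as needed. */
static void ring_write(struct toy_ring *r, uint32_t at, const void *src, uint32_t n)
{
    const uint8_t *s = src;
    for (uint32_t i = 0; i < n; i++)
        r->data[(at + i) & (TOY_RING_BYTES - 1)] = s[i];
}

/* Copy n bytes out of the ring starting at counter at, wrapping as needed. */
static void ring_read(const struct toy_ring *r, uint32_t at, void *dst, uint32_t n)
{
    uint8_t *d = dst;
    for (uint32_t i = 0; i < n; i++)
        d[i] = r->data[(at + i) & (TOY_RING_BYTES - 1)];
}

/* Queue one message with its payload in-line.  Returns 0, or -1 if full. */
int toy_ring_send(struct toy_ring *r, uint8_t type, const void *payload, uint16_t len)
{
    assert(r != NULL && (payload != NULL || len == 0));
    uint32_t total = (uint32_t)sizeof(struct toy_msg_hdr) + len;
    uint32_t padded = (total + 7u) & ~7u;   /* keep messages 8-byte aligned */
    if (padded > TOY_RING_BYTES - (r->prod - r->cons))
        return -1;

    struct toy_msg_hdr h = { type, 0, (uint16_t)total };
    ring_write(r, r->prod, &h, (uint32_t)sizeof h);
    ring_write(r, r->prod + (uint32_t)sizeof h, payload, len);
    r->prod += padded;
    return 0;
}

/* Dequeue one message.  Returns the payload length, or -1 if the ring is
 * empty, the header is malformed, or the payload does not fit in buf. */
int toy_ring_recv(struct toy_ring *r, uint8_t *type, void *buf, uint16_t cap)
{
    if (r->prod == r->cons)
        return -1;

    struct toy_msg_hdr h;
    ring_read(r, r->cons, &h, (uint32_t)sizeof h);
    uint32_t padded = ((uint32_t)h.size + 7u) & ~7u;
    /* The peer is not trusted, so validate the size before using it. */
    if (h.size < sizeof h || padded > r->prod - r->cons)
        return -1;
    uint16_t len = (uint16_t)(h.size - sizeof h);
    if (len > cap)
        return -1;

    ring_read(r, r->cons + (uint32_t)sizeof h, buf, len);
    *type = h.type;
    r->cons += padded;
    return len;
}
```

With this layout a small packet such as a TCP ACK fits entirely in one
short message, while a larger message simply consumes more of the ring.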

