
[Xen-devel] Netchannel2 low-level ring protocol

A couple of people have asked about using the low-level netchannel2
ring protocol in other device classes.  For their benefit, here's a
quick summary of how it works.

The ring protocol itself is fairly straightforward.  A ring pair
consists of:

-- A power-of-two set of consumer pages.
-- A power-of-two set of producer pages.
-- A single control page, which contains the ring indexes and so on.
-- An event channel port which is used for notifications.

The consumer pages for one end of the ring will be the producer pages
for the other end.  All of the rings are allocated by the
frontend-like end of the ring, and will be mapped into the
backend-like end.  For the master ring, the grant references and event
channel references are communicated through xenstore in the usual way.
Bypass rings are then negotiated over the master ring.
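For concreteness, the control page boils down to something like the
following.  This is a hypothetical sketch: the field names and exact
layout are my assumptions for illustration, not the real rings.c
definitions.

```c
#include <stdint.h>

/* Hypothetical sketch of the shared control page.  Each endpoint sees
 * its own producer/consumer indexes plus the event pointers its peer
 * uses to request notifications.  Real netchannel2 layout may differ. */
struct nc2_control_page_sketch {
    /* Free-running byte counters; taken modulo the ring size when
     * indexing into the ring pages. */
    uint32_t prod;        /* bytes produced onto the producer ring */
    uint32_t cons;        /* bytes consumed from the consumer ring */
    /* Event pointers: moving the matching data pointer past one of
     * these obliges the mover to notify its peer. */
    uint32_t prod_event;
    uint32_t cons_event;
};
```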

A message transmitted on one of these rings consists of an 8-bit type,
8 bits of flags, and some bytes of payload.  The entire message must
be a multiple of eight bytes, and the payload should be padded to the
appropriate size.  It can wrap around the end of the ring, but a
single message cannot be larger than the size of the ring.
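A sketch of the header and the padding rule, assuming a 16-bit size
field follows the type and flags (the real header layout lives in the
netchannel2 headers and may well differ):

```c
#include <stdint.h>

/* Assumed message header layout -- illustrative only. */
struct nc2_msg_hdr_sketch {
    uint8_t  type;
    uint8_t  flags;
    uint16_t size;    /* total message size in bytes, header included */
};

/* Round a payload size up so that the whole message, header included,
 * is a multiple of eight bytes, as the protocol requires. */
static inline uint16_t nc2_pad_msg_size(uint16_t payload_bytes)
{
    uint16_t raw = (uint16_t)(sizeof(struct nc2_msg_hdr_sketch) +
                              payload_bytes);
    return (uint16_t)((raw + 7) & ~7u);
}
```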

Each ring is a two-pointer ring buffer, with two event pointers in the
same style as the existing netchannel1 ring pointers.  An endpoint is
expected to notify its peer over the event channel whenever it moves a
data pointer over the matching event pointer (so if you move the
producer pointer past the producer event pointer you need to notify
your peer).
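The "did we cross the event pointer?" test is the same wrap-safe
unsigned trick used by the netchannel1 RING_PUSH macros.  The helper
name here is mine, not from rings.c:

```c
#include <stdint.h>

/* Returns 1 if moving a data pointer from old_val to new_val carried
 * it past the event pointer, in which case the peer must be notified.
 * Unsigned subtraction makes the comparison safe across counter
 * wrap-around. */
static inline int nc2_crossed_event(uint32_t new_val, uint32_t old_val,
                                    uint32_t event)
{
    return (uint32_t)(new_val - event) < (uint32_t)(new_val - old_val);
}
```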

rings.c contains a bunch of functions for manipulating the rings.  The
most important, if you're looking to use this ring protocol, are:

Sending messages:

-- nc2_send_message() -- This copies a message to the ring, but
   doesn't update the producer pointer.  The message is therefore not
   available to the other end (so you can send a batch of messages and
   make them visible in one go, which makes scheduling a bit easier).

-- nc2_flush_ring() -- Update the producer pointer following a batch
   of messages.  All of the messages queued up with nc2_send_message()
   are made available to the other end.  Returns 1 if you need to
   notify the remote, and 0 otherwise.

-- nc2_can_send_payload_bytes() -- Check whether there's space on the
   producer ring to send a message of a given size.

-- nc2_reserve_payload_bytes() -- Reserve space on the ring for a
   given message.  Calling this is entirely optional, but it can
   sometimes be helpful to avoid getting an error at an inconvenient
   time.
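The copy-then-flush split can be simulated in a few lines.  This is a
single-file toy model of the send path, not the real rings.c code:
names, layout, and the lack of memory barriers are all simplifications.

```c
#include <stdint.h>
#include <string.h>

#define RING_SIZE 64   /* must be a power of two */

struct sim_ring {
    uint8_t  data[RING_SIZE];
    uint32_t prod;          /* published producer pointer */
    uint32_t prod_pvt;      /* private pointer: batched, unpublished */
    uint32_t cons;
    uint32_t prod_event;
};

/* Copy a (pre-padded) message into the ring at the private producer
 * pointer, handling wrap-around, without publishing it -- the analogue
 * of nc2_send_message().  Returns -1 if the ring is too full. */
static int sim_send_message(struct sim_ring *r, const void *msg,
                            uint32_t len)
{
    uint32_t avail = RING_SIZE - (r->prod_pvt - r->cons);
    uint32_t off, first;
    if (len > avail)
        return -1;
    off = r->prod_pvt & (RING_SIZE - 1);
    first = RING_SIZE - off;
    if (first > len)
        first = len;
    memcpy(r->data + off, msg, first);
    memcpy(r->data, (const uint8_t *)msg + first, len - first);
    r->prod_pvt += len;
    return 0;
}

/* Publish the whole batch at once, the analogue of nc2_flush_ring();
 * returns 1 if the producer pointer crossed the producer event
 * pointer, i.e. the peer must be notified.  (The real code needs a
 * write barrier between the copies and the pointer update.) */
static int sim_flush_ring(struct sim_ring *r)
{
    uint32_t old = r->prod;
    r->prod = r->prod_pvt;
    return (uint32_t)(r->prod - r->prod_event) <
           (uint32_t)(r->prod - old);
}
```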

Receiving messages is a little bit more low-level.  The user is
expected to implement their own message loop in a tasklet or some such
which copies incoming messages out of the ring and into local storage
before processing them (for netchannel2, this is done in nc2_poll()).
There are a couple of helpers for this:

-- nc2_copy_from_ring() -- Copy bytes from the ring into a local
   buffer.

-- nc2_final_check_for_messages() -- Check whether there are any
   unconsumed messages, and set a producer event so that we get
   notified when new ones are produced.

-- nc2_finish_messages() -- Tell the other end we've finished with the
   messages it sent, and update the consumer pointers.  Returns 1 if
   you need to notify the remote, or 0 otherwise.  This could be
   combined with nc2_final_check_for_messages(), but splitting them
   out made things a bit clearer.
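The shape of the consumer side can be modelled the same way.  Again a
toy simulation under assumed names, mirroring what
nc2_copy_from_ring() and nc2_final_check_for_messages() do rather than
reproducing them:

```c
#include <stdint.h>
#include <string.h>

#define RING_SIZE 64

struct sim_rx_ring {
    uint8_t  data[RING_SIZE];
    uint32_t prod;        /* advanced by the peer */
    uint32_t cons;        /* our consumer pointer */
    uint32_t prod_event;  /* peer notifies when prod crosses this */
};

/* Copy len bytes out of the consumer ring into a local buffer,
 * handling wrap-around, and advance the consumer pointer. */
static void sim_copy_from_ring(struct sim_rx_ring *r, void *buf,
                               uint32_t len)
{
    uint32_t off = r->cons & (RING_SIZE - 1);
    uint32_t first = RING_SIZE - off;
    if (first > len)
        first = len;
    memcpy(buf, r->data + off, first);
    memcpy((uint8_t *)buf + first, r->data, len - first);
    r->cons += len;
}

/* After draining, re-arm the producer event so the peer notifies us
 * about the next message, then re-check for messages that raced in --
 * the race nc2_final_check_for_messages() exists to close.  Returns 1
 * if the message loop should run again. */
static int sim_final_check(struct sim_rx_ring *r)
{
    r->prod_event = r->cons + 1;
    return r->prod != r->cons;
}
```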

The main downside of this scheme, compared to netchannel1-style rings,
is that it's hard to guarantee that there's space on the ring for
response messages.  This means that, when a domain receives a message,
it may need to buffer up the acknowledgement (assuming that the
message requires an ACK).  That in turn means that the number of
outstanding messages of any given type needs to be bounded somehow (to
avoid potentially requiring an infinite amount of memory).  The
higher-level protocol is responsible for doing this.

(netchannel1 doesn't have this problem, because the space used for
sending the message can later be used to send its response.  This
implicitly limits the number of outstanding messages to the size of
the ring.)
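One simple way a higher-level protocol can impose such a bound is a
credit count: the sender takes a credit before transmitting and the
ACK returns it, so at most a fixed number of ACKs ever need buffering.
This is purely an illustration of the idea, not anything netchannel2
actually does:

```c
#include <stdint.h>

/* Illustrative per-message-type credit scheme for bounding the number
 * of outstanding messages (and hence buffered ACKs). */
#define MAX_OUTSTANDING 32

struct msg_credits {
    uint32_t in_flight;
};

/* Take a credit before sending; on failure the caller must queue the
 * message locally or back off. */
static int credits_try_take(struct msg_credits *c)
{
    if (c->in_flight >= MAX_OUTSTANDING)
        return 0;
    c->in_flight++;
    return 1;
}

/* Called when the corresponding ACK arrives. */
static void credits_return(struct msg_credits *c)
{
    c->in_flight--;
}
```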

You also have to be a little bit careful that the frontend can't
consume unbounded amounts of high-priority CPU in the backend (and
vice-versa, if you're allowing untrusted backends, e.g. for bypass
rings).  There are two components to this:

-- The interrupt needs to be disabled when you're not expecting it to
   fire, so that the peer can't just sit and spin calling
   notify_remote_via_irq() and make you spin in an interrupt handler.

-- There needs to be some kind of rate-limiting on actual requests.
   If the protocol *only* exposes real hardware, it's likely that
   physical limitations will be sufficient.  If it occasionally
   processes a request without going to the hardware (e.g. interdomain
   traffic, error cases) then you need to be a little bit careful.

These problems aren't really specific to the ring protocol used (it's
a general weakness of putting backends in domains rather than in the
hypervisor), but they're still worth bearing in mind.

