
[Xen-devel] Netchannel2 low-level ring protocol

A couple of people have asked about using the low-level netchannel2
ring protocol in other device classes.  For their benefit, here's a
quick summary of how it works.

The ring protocol itself is fairly straightforward.  A ring pair
consists of:

-- A power-of-two set of consumer pages.
-- A power-of-two set of producer pages.
-- A single control page, which contains the ring indexes and so on.
-- An event channel port which is used for notifications.

The consumer pages for one end of the ring will be the producer pages
for the other end.  All of the rings are allocated by the
frontend-like end of the ring, and will be mapped into the
backend-like end.  For the master ring, the grant references and event
channel references are communicated through xenstore in the usual way.
Bypass rings are then negotiated over the master ring.
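For concreteness, the control page boils down to something like the
following.  This is a hypothetical sketch: the field names and exact
layout are my assumptions for illustration, not the real rings.c
definitions.

```c
#include <stdint.h>

/* Hypothetical sketch of the shared control page.  Each endpoint sees
 * its own producer/consumer indexes plus the event pointers its peer
 * uses to request notifications.  Real netchannel2 layout may differ. */
struct nc2_control_page_sketch {
    /* Free-running byte counters; taken modulo the ring size when
     * indexing into the ring pages. */
    uint32_t prod;        /* bytes produced onto the producer ring */
    uint32_t cons;        /* bytes consumed from the consumer ring */
    /* Event pointers: moving the matching data pointer past one of
     * these obliges the mover to notify its peer. */
    uint32_t prod_event;
    uint32_t cons_event;
};
```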

A message transmitted on one of these rings consists of an 8-bit type,
8 bits of flags, and some bytes of payload.  The entire message must
be a multiple of eight bytes, and the payload should be padded to the
appropriate size.  It can wrap around the end of the ring, but a
single message cannot be larger than the size of the ring.
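A sketch of the header and the padding rule, assuming a 16-bit size
field follows the type and flags (the real header layout lives in the
netchannel2 headers and may well differ):

```c
#include <stdint.h>

/* Assumed message header layout -- illustrative only. */
struct nc2_msg_hdr_sketch {
    uint8_t  type;
    uint8_t  flags;
    uint16_t size;    /* total message size in bytes, header included */
};

/* Round a payload size up so that the whole message, header included,
 * is a multiple of eight bytes, as the protocol requires. */
static inline uint16_t nc2_pad_msg_size(uint16_t payload_bytes)
{
    uint16_t raw = (uint16_t)(sizeof(struct nc2_msg_hdr_sketch) +
                              payload_bytes);
    return (uint16_t)((raw + 7) & ~7u);
}
```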

Each ring is a two-pointer ring buffer, with two event pointers in the
same style as the existing netchannel1 ring pointers.  An endpoint is
expected to notify its peer over the event channel whenever it moves a
data pointer over the matching event pointer (so if you move the
producer pointer past the producer event pointer you need to notify
your peer).
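The "did we cross the event pointer?" test is the same wrap-safe
unsigned trick used by the netchannel1 RING_PUSH macros.  The helper
name here is mine, not from rings.c:

```c
#include <stdint.h>

/* Returns 1 if moving a data pointer from old_val to new_val carried
 * it past the event pointer, in which case the peer must be notified.
 * Unsigned subtraction makes the comparison safe across counter
 * wrap-around. */
static inline int nc2_crossed_event(uint32_t new_val, uint32_t old_val,
                                    uint32_t event)
{
    return (uint32_t)(new_val - event) < (uint32_t)(new_val - old_val);
}
```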

rings.c contains a bunch of functions for manipulating the rings.  The
most important, if you're looking to use this ring protocol, are:

Sending messages:

-- nc2_send_message() -- This copies a message to the ring, but
   doesn't update the producer pointer.  The message is therefore not
   available to the other end (so you can send a batch of messages and
   make them visible in one go, which makes scheduling a bit easier).

-- nc2_flush_ring() -- Update the producer pointer following a batch
   of messages.  All of the messages queued up with nc2_send_message()
   are made available to the other end.  Returns 1 if you need to
   notify the remote, and 0 otherwise.

-- nc2_can_send_payload_bytes() -- Check whether there's space on the
   producer ring to send a message of a given size.

-- nc2_reserve_payload_bytes() -- Reserve space on the ring for a
   given message.  Calling this is entirely optional, but it can
   sometimes be helpful to avoid getting an error at an inconvenient
   time.
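The copy-then-flush split can be simulated in a few lines.  This is a
single-file toy model of the send path, not the real rings.c code:
names, layout, and the lack of memory barriers are all simplifications.

```c
#include <stdint.h>
#include <string.h>

#define RING_SIZE 64   /* must be a power of two */

struct sim_ring {
    uint8_t  data[RING_SIZE];
    uint32_t prod;          /* published producer pointer */
    uint32_t prod_pvt;      /* private pointer: batched, unpublished */
    uint32_t cons;
    uint32_t prod_event;
};

/* Copy a (pre-padded) message into the ring at the private producer
 * pointer, handling wrap-around, without publishing it -- the analogue
 * of nc2_send_message().  Returns -1 if the ring is too full. */
static int sim_send_message(struct sim_ring *r, const void *msg,
                            uint32_t len)
{
    uint32_t avail = RING_SIZE - (r->prod_pvt - r->cons);
    uint32_t off, first;
    if (len > avail)
        return -1;
    off = r->prod_pvt & (RING_SIZE - 1);
    first = RING_SIZE - off;
    if (first > len)
        first = len;
    memcpy(r->data + off, msg, first);
    memcpy(r->data, (const uint8_t *)msg + first, len - first);
    r->prod_pvt += len;
    return 0;
}

/* Publish the whole batch at once, the analogue of nc2_flush_ring();
 * returns 1 if the producer pointer crossed the producer event
 * pointer, i.e. the peer must be notified.  (The real code needs a
 * write barrier between the copies and the pointer update.) */
static int sim_flush_ring(struct sim_ring *r)
{
    uint32_t old = r->prod;
    r->prod = r->prod_pvt;
    return (uint32_t)(r->prod - r->prod_event) <
           (uint32_t)(r->prod - old);
}
```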

Receiving messages is a little bit more low-level.  The user is
expected to implement their own message loop in a tasklet or some such
which copies incoming messages out of the ring and into local storage
before processing them (for netchannel2, this is done in nc2_poll()).
There are a couple of helpers for this:

-- nc2_copy_from_ring() -- Copy bytes from the ring into a local
   buffer.

-- nc2_final_check_for_messages() -- Check whether there are any
   unconsumed messages, and set a producer event so that we get
   notified when new ones are produced.

-- nc2_finish_messages() -- Tell the other end we've finished with the
   messages it sent, and update the consumer pointers.  Returns 1 if
   you need to notify the remote, or 0 otherwise.  This could be
   combined with nc2_final_check_for_messages(), but splitting them
   out made things a bit clearer.
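The shape of the consumer side can be modelled the same way.  Again a
toy simulation under assumed names, mirroring what
nc2_copy_from_ring() and nc2_final_check_for_messages() do rather than
reproducing them:

```c
#include <stdint.h>
#include <string.h>

#define RING_SIZE 64

struct sim_rx_ring {
    uint8_t  data[RING_SIZE];
    uint32_t prod;        /* advanced by the peer */
    uint32_t cons;        /* our consumer pointer */
    uint32_t prod_event;  /* peer notifies when prod crosses this */
};

/* Copy len bytes out of the consumer ring into a local buffer,
 * handling wrap-around, and advance the consumer pointer. */
static void sim_copy_from_ring(struct sim_rx_ring *r, void *buf,
                               uint32_t len)
{
    uint32_t off = r->cons & (RING_SIZE - 1);
    uint32_t first = RING_SIZE - off;
    if (first > len)
        first = len;
    memcpy(buf, r->data + off, first);
    memcpy((uint8_t *)buf + first, r->data, len - first);
    r->cons += len;
}

/* After draining, re-arm the producer event so the peer notifies us
 * about the next message, then re-check for messages that raced in --
 * the race nc2_final_check_for_messages() exists to close.  Returns 1
 * if the message loop should run again. */
static int sim_final_check(struct sim_rx_ring *r)
{
    r->prod_event = r->cons + 1;
    return r->prod != r->cons;
}
```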

The main downside of this scheme, compared to netchannel1-style rings,
is that it's hard to guarantee that there's space on the ring for
response messages.  This means that, when a domain receives a message,
it may need to buffer up the acknowledgement (assuming that the
message requires an ACK).  That in turn means that the number of
outstanding messages of any given type needs to be bounded somehow (to
avoid potentially requiring an infinite amount of memory).  The
higher-level protocol is responsible for doing this.

(netchannel1 doesn't have this problem, because the space used for
sending the message can later be used to send its response.  This
implicitly limits the number of outstanding messages to the size of
the ring.)
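One simple way a higher-level protocol can impose such a bound is a
credit count: the sender takes a credit before transmitting and the
ACK returns it, so at most a fixed number of ACKs ever need buffering.
This is purely an illustration of the idea, not anything netchannel2
actually does:

```c
#include <stdint.h>

/* Illustrative per-message-type credit scheme for bounding the number
 * of outstanding messages (and hence buffered ACKs). */
#define MAX_OUTSTANDING 32

struct msg_credits {
    uint32_t in_flight;
};

/* Take a credit before sending; on failure the caller must queue the
 * message locally or back off. */
static int credits_try_take(struct msg_credits *c)
{
    if (c->in_flight >= MAX_OUTSTANDING)
        return 0;
    c->in_flight++;
    return 1;
}

/* Called when the corresponding ACK arrives. */
static void credits_return(struct msg_credits *c)
{
    c->in_flight--;
}
```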

You also have to be a little bit careful that the frontend can't
consume unbounded amounts of high-priority CPU in the backend (and
vice-versa, if you're allowing untrusted backends, e.g. for bypass
rings).  There are two components to this:

-- The interrupt needs to be disabled when you're not expecting it to
   fire, so that the peer can't just sit and spin calling
   notify_remote_via_irq() and make you spin in an interrupt handler.

-- There needs to be some kind of rate-limiting on actual requests.
   If the protocol *only* exposes real hardware, it's likely that
   physical limitations will be sufficient.  If it occasionally
   processes a request without going to the hardware (e.g. interdomain
   traffic, error cases) then you need to be a little bit careful.

These problems aren't really specific to the ring protocol used (it's
a general weakness of putting backends in domains rather than in the
hypervisor), but they're still worth bearing in mind.

