
Re: Receiving Network Packets with Mirage/kFreeBSD



On 21 Aug 2012, at 16:27, PALI Gabor Janos <pgj@xxxxxxx> wrote:

> On Tue, Aug 21, 2012 at 03:21:19AM +0100, Anil Madhavapeddy wrote:
>> How about adding new CAML_BA_MANAGED flags specifically for mbufs and disk
>> buffers, so that we can increment the reference count and avoid copying
>> it?
> 
> I have managed to implement this.  I introduced a new management type for
> Bigarrays, CAML_BA_MBUF.  (It is not published on GitHub yet, though.)
> 
> enum caml_ba_managed {
>  CAML_BA_EXTERNAL = 0,        /* Data is not allocated by Caml */
>  CAML_BA_MANAGED = 0x200,     /* Data is allocated by Caml */
>  CAML_BA_MAPPED_FILE = 0x400, /* Data is a memory mapped file */
> #ifdef _KERNEL
>  CAML_BA_MBUF = 0x800,        /* Data is a FreeBSD mbuf(9) */
>  CAML_BA_MANAGED_MASK = 0xE00 /* Mask for "managed" bits in flags field */
> #else
>  CAML_BA_MANAGED_MASK = 0x600 /* Mask for "managed" bits in flags field */
> #endif
> };
> 
> This is then employed when an mbuf(9) is intercepted, so I could avoid the
> m_copydata() call entirely.  Yay!

Superb! Do you have any insight into what the equivalent disk API is?
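
Just to check I follow the interception step: I imagine the wrapping looks
roughly like the sketch below.  The function name and the finalizer behaviour
are my guesses rather than your actual patch; the only thing taken from your
mail is the CAML_BA_MBUF flag itself.

#include <sys/param.h>
#include <sys/mbuf.h>
#include <caml/mlvalues.h>
#include <caml/bigarray.h>

/* Hypothetical helper: expose an mbuf's payload to OCaml as a 1-D char
 * Bigarray without calling m_copydata().  CAML_BA_MBUF is assumed to make
 * the Bigarray finalizer release the mbuf (e.g. with m_free()) once the
 * OCaml value is collected. */
static value
wrap_mbuf(struct mbuf *m)
{
    intnat dim[1];

    dim[0] = m->m_len;
    return (caml_ba_alloc(CAML_BA_UINT8 | CAML_BA_C_LAYOUT | CAML_BA_MBUF,
        1, mtod(m, void *), dim));
}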

>> I think it's still useful to keep the existing contigmalloc code for other
>> future uses that need page-aligned memory, but eliminating the mbuf data
>> copy is very important for performance.
> 
> All right.  So now each buffer is embedded in a Bigarray, that is, each page
> corresponds to a single mbuf(9).  As a side effect, when a packet is received
> in multiple fragments, it gets mapped to multiple Io_pages.  Those fragments
> are part of a single object, but this relation is not expressed anywhere,
> since the meta information stored in the mbuf(9) (the m_next pointer) does
> not appear on the page.  (It could, but that would feel like overkill to me.)
> 
> Instead, perhaps it should be expressed by grouping the Io_pages, i.e. by
> putting the ones that belong to the same packet into a common list.  The
> lambda function in Netif.listen could then receive the list of Io_pages that
> make up one packet.  That said, it turns out that in practice only
> single-mbuf(9) packets arrive from the network card, so it is not a big
> problem at the moment; I just wanted to mention it.

Interesting problem. In Xen, all the fragments come in as completely separate
pages with no connection beyond the Netif interface signalling that they are
continued fragments from a previous entry in the buffer.

How about allocating them all as separate Io_pages and incrementing the mbuf
ref count several times?  Although, as you say, if our hardware doesn't
generate such chains at the moment, it's good enough to mark a TODO and
revisit this in the future.  The listen interface requires them to be
coalesced at the moment, I think, as our cstruct extension assumes a
contiguous buffer.
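
If we do come back to it, I picture the chain walk looking something like the
sketch below (purely a guess on my part, not actual code: it assumes the
CAML_BA_MBUF finalizer releases each fragment individually with m_free()
rather than calling m_freem() on the whole chain).

#include <sys/param.h>
#include <sys/mbuf.h>
#include <caml/mlvalues.h>
#include <caml/memory.h>
#include <caml/alloc.h>
#include <caml/bigarray.h>

/* Hypothetical: map an mbuf(9) chain to an OCaml list of Bigarrays, one
 * value per fragment, preserving m_next order. */
static value
wrap_mbuf_chain(struct mbuf *m)
{
    CAMLparam0();
    CAMLlocal4(list, cell, page, prev);
    struct mbuf *frag;
    intnat dim[1];

    list = Val_emptylist;
    prev = Val_emptylist;
    for (frag = m; frag != NULL; frag = frag->m_next) {
        if (frag->m_len == 0)
            continue;                  /* skip empty fragments */
        dim[0] = frag->m_len;
        page = caml_ba_alloc(CAML_BA_UINT8 | CAML_BA_C_LAYOUT | CAML_BA_MBUF,
            1, mtod(frag, void *), dim);
        cell = caml_alloc(2, 0);       /* cons cell */
        Store_field(cell, 0, page);
        Store_field(cell, 1, Val_emptylist);
        if (prev == Val_emptylist)
            list = cell;               /* first fragment becomes the head */
        else
            Store_field(prev, 1, cell);
        prev = cell;
    }
    CAMLreturn(list);
}

The Netif.listen callback would then take the resulting list of Io_pages for
one packet, as you suggest.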

We could extend cstruct to detect the size of the fragments and index into the
correct Bigarray automatically.  What's the minimum fragment size in FreeBSD
these days?

> Furthermore, an interesting implementation detail is how the captured buffers
> are passed to the Mirage side in this case.  Currently I store all the native
> buffers coming from the card (via ether_input()) in a mutex-protected linked
> list in C, which is then converted into an OCaml Io_page.t list via a C
> callback function.  Note that the Bigarrays are wrapped around the buffers
> only at this point, as the list on the C side is drained.
> 
> So, technically, packet processing becomes:
> 
> (* get_mbufs : id -> Io_page.t list *)
> 
> let rx_poll ifc fn =
>   let mbufs = get_mbufs ifc.id in
>   Lwt_list.iter_s fn mbufs

Sounds sensible!  Haris here at SRI is working on a simulator backend that will
exercise the *other* end of the stack (the TCP logic in particular), so this
is coming together quite nicely.
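
Incidentally, just so I'm sure I understand the drain step you describe above:
I imagine the stub behind get_mbufs looking roughly like the sketch below.
All of the names, the STAILQ, the locking discipline and doing the OCaml
allocation under the mutex are guesses on my part rather than your code, so
treat it as a sketch only.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/lock.h>
#include <sys/malloc.h>
#include <sys/mbuf.h>
#include <sys/mutex.h>
#include <sys/queue.h>
#include <caml/mlvalues.h>
#include <caml/memory.h>
#include <caml/alloc.h>
#include <caml/bigarray.h>

/* Hypothetical receive queue: the ether_input() hook appends mbufs here and
 * the OCaml-facing stub drains them into an Io_page.t list. */
struct rx_entry {
    struct mbuf            *re_m;
    STAILQ_ENTRY(rx_entry)  re_link;
};
static STAILQ_HEAD(, rx_entry) rx_queue = STAILQ_HEAD_INITIALIZER(rx_queue);
static struct mtx rx_mtx;
MTX_SYSINIT(rx_mtx, &rx_mtx, "mirage rx queue", MTX_DEF);

/* Called from the ether_input() hook: park the mbuf for the OCaml side. */
static void
rx_enqueue(struct mbuf *m)
{
    struct rx_entry *e;

    e = malloc(sizeof(*e), M_TEMP, M_NOWAIT);
    if (e == NULL) {
        m_freem(m);            /* no memory: drop the packet */
        return;
    }
    e->re_m = m;
    mtx_lock(&rx_mtx);
    STAILQ_INSERT_TAIL(&rx_queue, e, re_link);
    mtx_unlock(&rx_mtx);
}

/* Stub behind get_mbufs: empty the C-side list, wrapping each buffer in a
 * Bigarray only at this point.  The result comes out in reverse arrival
 * order, which the OCaml side could List.rev (or the in-order construction
 * from the previous sketch could be used instead). */
CAMLprim value
caml_get_mbufs(value v_id)
{
    CAMLparam1(v_id);          /* v_id would select the interface; ignored */
    CAMLlocal3(list, cell, page);
    struct rx_entry *e;
    intnat dim[1];

    list = Val_emptylist;
    mtx_lock(&rx_mtx);
    while ((e = STAILQ_FIRST(&rx_queue)) != NULL) {
        STAILQ_REMOVE_HEAD(&rx_queue, re_link);
        dim[0] = e->re_m->m_len;
        page = caml_ba_alloc(CAML_BA_UINT8 | CAML_BA_C_LAYOUT | CAML_BA_MBUF,
            1, mtod(e->re_m, void *), dim);
        cell = caml_alloc(2, 0);
        Store_field(cell, 0, page);
        Store_field(cell, 1, list);
        list = cell;
        free(e, M_TEMP);
    }
    mtx_unlock(&rx_mtx);
    CAMLreturn(list);
}

If allocating on the OCaml heap while holding the mutex turns out to be a
problem, the entries could be moved onto a private list under the lock and
wrapped afterwards.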

-anil
