[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[MirageOS-devel] test, quality, tcpip

Hi folks,

Mort's writeup of notes from a conversation on testing raises some questions on how to actually go about testing `tcpip`. I'm starting a new thread to avoid potentially hijacking a more general conversation on testing with this rather more specific proposal, but most of what I'm discussing is quite related to problems and solutions enumerated in "Notes on Testing" from a few hours ago, and some item are reflected in ARs there. I thought I'd take a run at writing down what I've gotten out of corridor and backchannel conversations with many other Mirage-seekers around restructuring the lump of code we currently keep in the `tcpip` repository (probably the most-used set of `impl`s in Mirage).

TL;DR - We should have each protocol in `tcpip` expose serializing and deserializing APIs for external use, in addition to including its own test suite and having clearer documentation of expectations around data ingress and egress (as included in Mort's notes). I further propose that it may be useful to split each protocol into its own package for use in situations other than "a full ethernet/ip/tcp/udp/dhcp stack assembled by the mirage frontend tool".

We should also 1) figure out which of the error exposing/handling proposals won and 2) start using whichever one it is.

The long version:

What I've read, heard, and thought some reasonable goals are:
 * make it easier to test the code we have that does stuff with packets (and if we do a good job, we might find a more generally useful model)
 * write and automatically perform some tests on our packet-handling code
 * make it clearer when errors occur at low levels so higher-level applications can deal with them appropriately
 * make API contracts between layers clearer (what blocks?)
 * make it easier to reuse code across projects when that code needs to know stuff about packets

Code reuse might be a surprising thing to see here mixed in with quality and testing. Here's some stuff that's in at least two places I know of for "a network stack", i.e. /ethernet/arp-and-ipv4,6/icmp/udp/tcp:

* wire format definitions (i.e. Cstructs)
 - Wire_structs (from `tcpip`), the module for serializing and deserializing wire formats of static size via the autogenerated Cstruct interface, doesn't seem to be explicitly intended to be exposed in `tcpip`.
 - ocaml-packet (all in `Packet.ml`)
 - ocaml-pcap (each format in its own .ml)
* serializing/deserializing code
 - all over `tcpip`, usually inaccessible or side-effecting (notable exception: dhcp)
 - ocaml-packet . notably ocaml-packet currently depends on `tcpip` for `tcpip_checksum.ml` and required externs
 - ocaml-pcap 0.4.0 also has serializing and deserializing code per protocol, bundled with wire formats. ocaml-pcap doesn't currently duplicate many application-layer protocols, although a more featureful `ocaml-pcap` will likely want to either do that or make common reference to parsing code.
 - mirage-nat and mirage-mimic, because I needed more than Wire_structs but less than a full listener

Here's what *isn't* all over the place:
* state machines/side-effecting operations/protocol implementations
 - `tcpip` is the only library I know of that implements anything beyond a parser/printer for a protocol below the application layer, although I'm certainly one of the less knowledgeable people about non-Mirage OCaml codebases on the list
* automated test code for any of the aforementioned
 * ocaml-packet checks that deserializing a type and then serializing the result gets you the original type
 * `tcpip`: now with tests (TM) ...but it could use some more ;)
 * application-layer code is doing a better job, but that doesn't get it much if the code beneath is unreliable

Things that might help with both reuse and accessibility for testing:
* pull Wire_structs into a separate library and make its wider usability clear and stable, where appropriate
 - currently some things are treated as constant-size cstructs that shouldn't be, e.g. ipv4 packets which might have options and therefore be larger than Wire_structs expects; exposing these widely without richer options for serializing and deserializing might be counterproductive. Libraries that duplicate Wire_structs seem to all share this problem, so a common solution would likely be useful.

* pull per-protocol serializing and deserializing code into its own modules
 - improve and formalize error communication for invalid data, and express these in mirage-types module types
 - write tests for serializing and deserializing modules

* possibly pull the serializing/deserializing modules into their own package (packages?) on which `tcpip` depends (or maybe a separate library within `tcpip`?), for common use outside of `tcpip` itself

* attempt to improve and write tests on the code in `tcpip` that *isn't* digging around in or making packets, which should be logic for acting on write and read requests (at a low level)
 - improve and formalize (create?) error communication for bad state transitions, syntactically OK but semantically bogus packets, etc
 - possibly it makes sense to break this package up entirely into separate protocol implementations and rely on the `mirage` frontend tool (or user request, outside of the Mirage ecosystem) to manage dependencies between them - see below

Most of the above is focused on splitting up the code in `tcpip`; there are probably other ways to do it, and likely some of them make more sense than what I'm proposing. It seems to me that there are (at least) two reasonable ways to split up some of the code in `tcpip`:
* horizontally layerwise, i.e. all code to do with ARP is in one package (serializing/deserializing, state machine/side effects), same for ipv4, ipv6, etc
* vertically concernwise, i.e. all serializing/deserializing code is in the same place, all action-taking code (probably with some more sensible subdivision) is in the same place, etc
* some other way I haven't thought of that makes way more sense (your proposal here!)

A horizontal division is the only thing that really potentially gives any benefit for users that I can see - linking in only the layers we need makes for smaller binaries (at least until we have dead code elimination). It also seems to have less potential for external API breakage causing other repositories to become unbuildable (thinking of the case where e.g. I want to support a new DHCP option, which might require me to both expose a new variant type for DHCP.Option.t and write some handling code in what's currently dhcpv4_client.ml); in a regime where the serializing/deserializing code and the actual implementations which receive and send data are in separate packages, they need separate, coordinated releases. As a bonus, we already have module types in `mirage-types` for defining reasonable interfaces between these layers.

Thanks for your consideration.

MirageOS-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.