[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [MirageOS-devel] Error handling in Mirage



Hi,


On Thu, Jul 10, 2014 at 11:47 AM, Anil Madhavapeddy <anil@xxxxxxxxxx> wrote:
On 10 Jul 2014, at 10:01, Thomas Leonard <talex5@xxxxxxxxx> wrote:
>
> I found it awkward too. What's the rationale behind Mirage's current
> error handling?

The basic problem with using exceptions is if they are constructed in
a functor, they have a different representation across applications of
that functor. ÂConsider:

exception Catch_me of string

module A(S:sig val name : string end) = struct
 exception Catch_me of string
 let boom () = raise (Catch_me S.name)
end

module X = A(struct let name = "X" end)
module Y = A(struct let name = "Y" end)

let _ =
 try Printexc.print X.boom () with _ -> ();
 try Printexc.print Y.boom () with _ -> ();
 try X.boom () with |X.Catch_me x -> print_endline "X.Catch_me";
 try X.boom () with |Catch_me x -> print_endline "Catch_me"

Running this gives us:

$ ocaml boom.ml
Uncaught exception: A(S).Catch_me("X")
Uncaught exception: A(S).Catch_me("Y")
X.Catch_me
Exception: A(S).Catch_me "X".

The first two statements show the exception printer representing the
exception as A(S), which is unhelpful when examining backtraces. This
can be fixed with by registering a custom exception printer.

The next two statements illustrate how you could go wrong if there
are many similarly named exceptions, since there is no exhaustivity
checking in the exception type. ÂTo write a generic function that
catches any Catch_me, I believe you need a `try ... with _ -> ()`

> Using polymorphic variants can be useful as it forces the user to
> handle each possible error case. But that only works if you keep false
> positives low.

The reasoning behind explicit polymorphic variants here was to permit
catching one class of error across all the device types:

module A(S:sig val name : string end) = struct
 let boom () = `Catch_me S.name
end

module X = A(struct let name = "X" end)
module Y = A(struct let name = "Y" end)

let catcher rfn fn a = fn a |> function `Catch_me r -> rfn r

let _ =
 X.boom () |> function `Catch_me r -> print_endline r;
 Y.boom () |> function `Catch_me r -> print_endline r;
 catcher print_endline X.boom ();
 catcher print_endline Y.boom ()

Here, the catcher function can be used across both functor
applications (or, in Mirage terms, device drivers).

However, writing more generic catcher functions that want
to eliminate just one possibility from the variants and
pass the others through is difficult, and leads to gems
such as this from the OCaml-TLS Mirage code:

type ('a, 'e, 'c) m =
 ([< `Ok of 'a | `Error of 'e | `Eof ] as 'c) Lwt.t

let (>>==) (a : ('a, 'e, _) m) (f : 'a -> ('b, 'e, _) m) : ('b, 'e, _) m =
 a >>= function
 | `Ok x        Â-> f x
 | `Error _ | `Eof as e -> return e


> An `Unknown can only be logged or displayed, so it would be best to
> handle it once in a catch block at the top level, not for every read.
> Making the payload be of type exn rather than string would allow code
> to handle individual cases if needed (e.g. a remote block device might
> return Network_error, indicating that the client should reconnect and
> try again).

> `Unimplemented and `Disconnected presumably indicate bugs. For most
> programs, it would only make sense to report them at the top level.
>
> `Is_read_only is irrelevant for read operations.
>
> I think it would be simpler if exceptions were used and the type
> signature for read was just:
>
> Âval read: t -> int64 -> page_aligned_buffer list -> unit io
>
> For write, handling `Is_read_only might make sense, but it still seems
> better to me to use an exception. After all, if the device is
> read-only then you can't get it into a bad state, nor perform any
> recovery on it.

>
>
> In summary, I propose:
>
> - We should separate errors the caller should usually check for vs
> bugs/internal errors.
>
> - Probably exceptions should be thrown in the second case. OCaml
> programs already have to handle exceptions anyway. If not, `Unknown
> should at least take an exn payload, not a string (print handlers
> should be registered for any exceptions, of course).
>
> - For cases the caller should check, each function should declare its
> own set of errors. DEVICE.read should not return `Is_read_only,
> FS.format should not return `File_already_exists, etc.


Yes, so by the above guidelines, we might turn this:

type error = [
Â| `Unknown of string (** an undiagnosed error *)
Â| `Unimplemented   (** operation not yet implemented in the code *)
Â| `Is_read_only   Â(** you cannot write to a read/only instance *)
Â| `Disconnected   Â(** the device has been previously disconnected *)
]

val read: t -> int64 -> page_aligned_buffer list -> [ `Error of error | `Ok of unit ] io
val write: t -> int64 -> page_aligned_buffer list -> [ `Error of error | `Ok of unit ] io

into:

exception Unknown of string with sexp
exception Unimplemented of string with sexp
exception Disconnected with sexp

type read_error Â= [ `Out_of_bounds ] with sexp
type write_error = [ `Read_only_device | read_error ] with sexp

val read : t -> int64 -> page_aligned_buffer list -> [ `Error of read_error | `Ok of unit ] io
val write: t -> int64 -> page_aligned_buffer list -> [ `Error of write_error | `Ok of unit ] io

This is also easier to contract, since a read-only device would simply skip
the definition of `write_error` and `write`. ÂWe may need to do some
more exception registration to tell us which module the _expression_ came
from (need to test that).

This sounds sensible to me.

When debugging one of these fatal errors (e.g. 'Disconnected' which means: someone called 'disconnect' and then tried 'read' or 'write' afterwards) it would be useful to have more context. For example it would probably be good to add the device id to the exception so we know which disk it relates to. This could help us infer that the buggy code is in the FS driver running on disk 2 and not the irmin backend on disk 1. There might be other useful context which is more implementation-specific. We could either just log this to the console or we could encode it up as an Sexp.t and attach that to the exception too?

Since our top-level function is in the auto generated code:

let () =

 OS.Main.run (join [t1 ()])


For every exception we declare in our interface, should we pre-create a default exception handler which pretty-prints it? This could also explain to the unlucky user that this is a bug and should be reported on the issue tracker?

BTW I've now forked the V1 interfaces into a V2, so feel free to propose concrete changes as pull requests!

Cheers,
Dave

>
> Thoughts? Apologies if this has all been discussed to death before.

Nope! If in doubt, raise an exception thread like this one :-)

> I've did find https://lists.cam.ac.uk/pipermail/cl-mirage/2012-March/msg00003.html,
> which notes that Async doesn't cope with exceptions. Presumably that
> could be wrapped away, though. It also suggests that there is some
> problem with Lwt and backtraces, but I'm not sure what the issue is
> there. These days, OCaml seems to keep track of when exceptions are
> reraised and indicates this in the backtrace.

Async's model is broadly along the same lines that you propose. ÂIt defines
a process tree of monitors that catch uncaught exceptions and define local
behaviour for those. ÂIn the Mirage world, our current flat device hierarchy
would turn into a DSL for generating these monitor trees in the autogenerated
code.

The issue with backtraces is that if you drop into the cooperative thread
scheduler between (>>=) binds, the sequence of operations that led to the
function call are thrown away and the backtraces look like ("select -> function")
instead of ("blocking1 -> select -> blocking2 -> select -> function").

Lwt has a syntax extension that registers the current backtrace with
the backtrace system. ÂI'm not sure that this is a big problem though,
as we tend not to rely on backtraces for anything important (instead, the
value that's thrown is more useful).

Thoughts from others welcome -- I like Thomas' proposal and think that it
finds the sweet spot between forcing fully generic handling of polymorphic
variants and throwing exceptions everywhere.

-anil
_______________________________________________
MirageOS-devel mailing list
MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel



--
Dave Scott
_______________________________________________
MirageOS-devel mailing list
MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.