
Re: [MirageOS-devel] Error handling in Mirage - request for comments!



On 4 February 2015 at 13:01, Leo White <lpw25@xxxxxxxxx> wrote:
> Hi all,
>
> As this discussion keeps on running, I thought I would add a couple of
> thoughts:
>
> - The problem with using exceptions for returning errors is that it
>   becomes very difficult to distinguish between a dynamic error that is
>   part of normal use (e.g. this network connection failed) from a
>   programming error (e.g. Not_found accidentally leaked from a use of
>   List.find, an assert false triggering). This makes it hard to produce
>   the correct behaviour in all cases: there are applications for which
>   it would be better to kill everything on detecting a programmer error
>   rather than risk continuing in an unstable state.

I don't think exception vs error code is a reliable way to divide
these cases up. Having your FS return `Block_error indicates a serious
problem that might require terminating the unikernel, whereas getting
a Division_by_zero exception from your HTTP handler is likely fairly
harmless. Whether an exception/error is serious depends more on the
importance of the thing that raised it.

Consider the case of a filesystem that reads a corrupted disk and
throws an exception (e.g. an assert fails). This is probably the most
extreme case where you'd want to abort. Should it terminate the
unikernel?

It depends what the disk is being used for. If it's the main hard-disk
then possibly. If it's some removable media the user has just inserted
then certainly not.

A good principle here is that a broken component should only be able
to harm itself. If a filesystem fails to handle a corrupted disk
correctly then it may further corrupt that disk, but it should not
abort the unikernel (and thus possibly corrupt other disks in the
middle of being written).

In this case, we can imagine a fail-safe FS functor that wraps all the
calls in the FS API so that if any one of them throws an unexpected
exception then it unplugs the underlying block device. No need to kill
everything.
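To make the fail-safe FS idea concrete, here is a minimal sketch. It assumes simplified, synchronous module types: `BLOCK` with a hypothetical `disconnect` ("unplug") operation and `FS` with a hypothetical `device` accessor. These are not Mirage's real signatures, which use Lwt and have many more operations. `Fail_safe`, `Ram_block` and `Buggy_fs` are illustrative names.

```ocaml
(* Hypothetical, synchronous stand-ins for Mirage's BLOCK and FS module types.
   The real signatures use Lwt and have many more operations. *)
module type BLOCK = sig
  type t
  val disconnect : t -> unit  (* assumed "unplug" operation *)
end

module type FS = sig
  type t
  type block
  val device : t -> block  (* assumed accessor for the underlying device *)
  val read : t -> string -> (string, [ `Not_found | `Disconnected ]) result
end

(* Wrap every FS call: an unexpected exception unplugs this FS's own device
   and is reported as `Disconnected, instead of escaping and taking down the
   whole unikernel. Note that [with _] also catches Out_of_memory and
   Stack_overflow; a real version would need to decide how to treat those. *)
module Fail_safe (B : BLOCK) (F : FS with type block = B.t) = struct
  include F
  let read t path =
    try F.read t path
    with _ ->
      B.disconnect (F.device t);
      Error `Disconnected
end

(* Toy in-memory device and a deliberately buggy FS, for illustration only. *)
module Ram_block = struct
  type t = { mutable connected : bool; data : (string, string) Hashtbl.t }
  let disconnect t = t.connected <- false
end

module Buggy_fs = struct
  type t = Ram_block.t
  type block = Ram_block.t
  let device t = t
  let read (t : t) path =
    if not t.Ram_block.connected then Error `Disconnected
    else if path = "corrupt" then assert false (* simulated FS bug *)
    else
      match Hashtbl.find_opt t.Ram_block.data path with
      | Some v -> Ok v
      | None -> Error `Not_found
end

module Safe_fs = Fail_safe (Ram_block) (Buggy_fs)
```

After the simulated bug fires, only that one device is unplugged. Later reads on it return `Disconnected`, while any other FS instance carries on unaffected.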

> - The problem with using Lwt.t as your error monad is that it becomes
>   difficult to distinguish between synchronous things that may return
>   errors, asynchronous things that may return errors, and asynchronous
>   things that should not return errors. It also seems tied up with the
>   exception mechanism, which leads to the same problem as my previous
>   point.
>
> Personally, I would probably suggest that all Mirage modules/module
> types include in their signatures:
>
>   type error
>
>   val pp_error : formatter -> error -> string
>
> For cases where an error can reasonably be matched on and handled
> specially, this should be exposed in the signature:
>
>   type error = private [> `Foo of foo | `Bar of bar]

Aha! I knew you'd know a trick to make this work!
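For readers unfamiliar with the trick, here is a minimal sketch of a private, open polymorphic-variant error type. The module names `STORE`/`Store`, the `lookup` function and the error cases are hypothetical. The sketch shows `pp_error` with the conventional `Format.formatter -> error -> unit` type. Callers can match on the exposed cases but must keep a catch-all, because the implementation is free to have extra cases.

```ocaml
(* Signature in the style suggested above: [error] is private and open, so
   callers may match on the listed cases but must keep a catch-all. *)
module type STORE = sig
  type error = private [> `Not_found of string | `Timeout ]
  val pp_error : Format.formatter -> error -> unit
  val lookup : string -> (int, error) result
end

(* The implementation can carry extra, unexposed cases such as `Internal. *)
module Store : STORE = struct
  type error = [ `Not_found of string | `Timeout | `Internal of int ]
  let pp_error ppf = function
    | `Not_found k -> Format.fprintf ppf "key %S not found" k
    | `Timeout -> Format.pp_print_string ppf "timeout"
    | `Internal code -> Format.fprintf ppf "internal error %d" code
  let lookup = function
    | "answer" -> Ok 42
    | "slow" -> Error `Timeout
    | "broken" -> Error (`Internal 7)
    | k -> Error (`Not_found k)
end

(* A caller handles the exposed cases specially and prints everything else. *)
let describe key =
  match Store.lookup key with
  | Ok v -> Printf.sprintf "found %d" v
  | Error (`Not_found _) -> "missing"
  | Error `Timeout -> "retry later"
  | Error e -> Format.asprintf "other: %a" Store.pp_error e
```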

However, different functions return different sets of errors. For
example, BLOCK.read shouldn't return `Is_read_only. Can we handle
that?
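One possible answer is to give each operation its own closed error set and let polymorphic-variant subtyping do the widening. The sketch below uses hypothetical names (`read_error`, `write_error`, `Mem_block`, `copy`, `` `Bad_sector ``). It is not the actual Mirage BLOCK signature, and it is synchronous to avoid a dependency on Lwt.

```ocaml
(* Hypothetical per-operation error sets; the names are illustrative and not
   Mirage's actual BLOCK signature. *)
type read_error = [ `Disconnected | `Bad_sector of int ]
type write_error = [ read_error | `Is_read_only ]

module type BLOCK = sig
  type t
  val create : read_only:bool -> t
  val read : t -> int -> (string, read_error) result
  val write : t -> int -> string -> (unit, write_error) result
end

module Mem_block : BLOCK = struct
  type t = { sectors : (int, string) Hashtbl.t; read_only : bool }
  let create ~read_only = { sectors = Hashtbl.create 16; read_only }
  let read t n =
    match Hashtbl.find_opt t.sectors n with
    | Some data -> Ok data
    | None -> Error (`Bad_sector n)
  let write t n data =
    if t.read_only then Error `Is_read_only
    else (Hashtbl.replace t.sectors n data; Ok ())
end

(* A pattern on [read]'s result cannot mention `Is_read_only: the compiler
   rejects it because read_error is closed. A caller that reads and then
   writes widens the read error with a coercion so both fit in write_error. *)
let copy src dst n : (unit, write_error) result =
  match Mem_block.read src n with
  | Error e -> Error (e :> write_error)
  | Ok data -> Mem_block.write dst n data
```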

>   val pp_error : formatter -> error -> string

I've added some error_message (error -> string) functions, but this
might be better. I haven't used formatters much, so don't have an
opinion here.

But I think we still need an additional "exn_of_error" here because
whether something is a "dynamic" (expected) error or a bug changes as
the error is propagated.

For example, an XML parser probably regards a malformed document as a
dynamic error (`Malformed of malformed), which its caller may want to
handle. But if the caller is trying to load its configuration file
from a crunch FS, then malformed XML is a programming bug and should
be thrown as an exception.
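A minimal sketch of that idea follows. A toy `key=value` parser stands in for the XML parser and returns `` `Malformed `` as an ordinary error. A hypothetical `ok_or_bug` helper, in the spirit of the proposed "exn_of_error", promotes the error to an exception once a caller decides it can only mean a bug. `Bug`, `ok_or_bug`, `load_builtin_config` and `handle_upload` are illustrative names, not an existing API.

```ocaml
type malformed = [ `Malformed of string ]

let pp_malformed ppf (`Malformed line) =
  Format.fprintf ppf "malformed input: %s" line

(* Toy stand-in for the XML parser: accepts non-empty lines of the form key=value. *)
let parse s : ((string * string) list, [> malformed ]) result =
  let lines = List.filter (fun l -> l <> "") (String.split_on_char '\n' s) in
  let rec go acc = function
    | [] -> Ok (List.rev acc)
    | l :: rest ->
      match String.index_opt l '=' with
      | None -> Error (`Malformed l)
      | Some i ->
        let k = String.sub l 0 i in
        let v = String.sub l (i + 1) (String.length l - i - 1) in
        go ((k, v) :: acc) rest
  in
  go [] lines

(* Hypothetical exception meaning "this error is a bug at this level". *)
exception Bug of string

(* Hypothetical helper: turn an expected error into an exception. *)
let ok_or_bug pp = function
  | Ok x -> x
  | Error e -> raise (Bug (Format.asprintf "%a" pp e))

(* Config baked into the image (e.g. via crunch): malformed means a bug. *)
let load_builtin_config text = ok_or_bug pp_malformed (parse text)

(* User-supplied document: malformed input is an expected, dynamic error. *)
let handle_upload text =
  match parse text with
  | Ok kvs -> Printf.sprintf "accepted %d entries" (List.length kvs)
  | Error (`Malformed line) -> Printf.sprintf "rejected: %S" line
```

The parser itself never decides which category an error belongs to. Each caller makes that decision at the point where it knows what the input is.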

> Some nice combinators should be provided for using ('a, 'b) Result.t and
> ('a, 'b) Result.t Lwt.t, and for lifting an ('a, Foo.error) Result.t into
> an ('a, Bar.error) Result.t.
>
> Exceptions that escape their intended scope should always be treated as
> a programming error. The various "finally" functions for resources
> should catch and reraise them, so that they can reach the outermost
> scope which knows how best to deal with a programming error in the
> particular application. The same goes for binding on an Lwt thread which
> raised an exception or failed: it should cause an exception to be raised
> to reach the outermost level.

Yes, I think it does this.
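A rough sketch of the synchronous half of Leo's suggestions, using only the OCaml standard library (4.08 or later). `Result.bind` and `Result.map_error` provide the result combinators and the Foo-to-Bar lifting. `Fun.protect` runs a `finally` action and then re-raises any exception so that it keeps travelling outwards. The `Foo`/`Bar` modules and their error cases are hypothetical. The `Lwt.t` variants are omitted to keep the example dependency-free.

```ocaml
(* Hypothetical lower layer with its own error type. *)
module Foo = struct
  type error = [ `Timeout | `Refused ]
  let fetch host : (string, error) result =
    match host with
    | "down" -> Error `Refused
    | "slow" -> Error `Timeout
    | "empty" -> Ok ""
    | h -> Ok ("data from " ^ h)
end

(* Hypothetical upper layer that wraps Foo's errors in its own type. *)
module Bar = struct
  type error = [ `Foo of Foo.error | `Empty ]
  (* Lift ('a, Foo.error) result into ('a, [> `Foo of Foo.error]) result. *)
  let lift r = Result.map_error (fun e -> `Foo e) r
  let ( let* ) = Result.bind
  let fetch_nonempty host : (string, error) result =
    let* body = lift (Foo.fetch host) in
    if body = "" then Error `Empty else Ok body
end

(* Release a resource on every path, but let unexpected exceptions
   continue to the outermost handler: Fun.protect re-raises them. *)
let released = ref 0
let with_resource f = Fun.protect ~finally:(fun () -> incr released) f
```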

> The aim here is still to take the Erlang-style kill the component that
> failed and try again approach, but to ensure that there are two distinct
> return paths for regular errors and programming errors. The secondary
> aim is to have module signatures which give a clear indication of
> intended use.
>
> Regards,
>
> Leo
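The "two distinct return paths" idea can be sketched as a tiny supervisor. Regular errors come back through `Error` and are handed to the caller unchanged. An escaped exception is treated as a programming error: the component is restarted, up to a limit. `supervise`, `outcome` and `max_restarts` are hypothetical names. This is a synchronous illustration, not an Erlang or Mirage API.

```ocaml
(* Hypothetical result of supervising a component. *)
type ('a, 'e) outcome =
  | Done of 'a        (* component succeeded *)
  | Failed of 'e      (* expected error, returned through the result path *)
  | Gave_up of exn    (* programming errors kept recurring *)

(* Run [start]; restart it on an escaped exception, at most [max_restarts] times. *)
let supervise ~max_restarts (start : unit -> ('a, 'e) result) =
  let rec attempt n =
    match start () with
    | Ok x -> Done x
    | Error e -> Failed e
    | exception exn ->
      if n >= max_restarts then Gave_up exn else attempt (n + 1)
  in
  attempt 0
```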



-- 
Dr Thomas Leonard        http://0install.net/
GPG: 9242 9807 C985 3C07 44A6  8B9A AE07 8280 59A5 3CC1
GPG: DA98 25AE CAD0 8975 7CDA  BD8E 0713 3F96 CA74 D8BA

_______________________________________________
MirageOS-devel mailing list
MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel
