[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [MirageOS-devel] Error handling in Mirage - request for comments!



On 30 January 2015 at 14:36, Daniel BÃnzli <daniel.buenzli@xxxxxxxxxxxx> wrote:
> "In theory, explicitly listing all possible errors and forcing callers to 
> consider them should result in more reliable code. In practice, it results in 
> callers ignoring all errors."
>
> This seems rather due to a lack of good, standarized, error handling 
> combinators in mirage itself.

Can you propose a solution? Perhaps add it to the list of options?

Every time I try, I find I end up duplicating Lwt's existing handling.

> I'm really not sure that moving to more exceptions is a good thing especially 
> since last time I had a look at it lwt's error handling and cancelation 
> model/algebra it felt quite broken to me.

What problems do you see in Lwt's error handling? I agree that Lwt
cancellation seems broken*, but I'm not sure that's relevant here.

> I would rather say that the system should not use exceptions at all except 
> for things that are not meant to be handled where the only action you take is 
> to basically to abort the unikernel.

I'd agree if that was "abort the current transaction (which may cover
the whole unikernel)". Realistically, there are always going to be
error conditions that result in exceptions that should not terminate
it (e.g. running out of memory serving requests should only abort some
requests, etc).

> In practice things like:

> "Assume all code can raise exceptions, and that this must not break 
> invariants or leak resources"
>
> end up being a huge pain to code with.

I probably shouldn't have listed this under "Proposal". I wasn't
intending to suggest any change here, but just to state the fact that
OCaml is a language with exceptions. It's interesting to see that two
people disagree with this already.

If I understand your position:

- Every exception raised MUST terminate the unikernel. This includes
out-of-memory, division-by-zero, int_of_string on an out-of-range int,
etc, in any code path. Aborting the operation (e.g. HTTP request) that
caused the problem, logging the exception and continuing is not an
option.

- It is therefore acceptable for a module to leak resources and/or
leave the system in an invalid state if it it receives an exception
from any code it calls.

While some people might want that, it seems unfair to force everyone
into this model. I don't think exception safety is much work in most
cases (I always try to do this in my own code, anyway).

> While it may make you feel the code is less involved in practice it is much 
> more harder to reason about and to code against it correctly compared to 
> using a well principled error monad. The latter will also be future proof to 
> any concurrency mechanism you may want to use in the future.

Isn't Lwt already an error monad? Can you define "well principled" here?

> Of course there may be a few cases were you want to be able to raise a 
> globally defined exception that is not meant to be handled like Out_of_memory 
> or in the block example you give. But I'm sure that you can count them on the 
> finger of one hand and predefine them somewhere in mirage.

Let's make this concrete. The ones currently in V1 that I'd use with
Lwt.return (rather than fail) would be:

NETWORK, ETHIF:
- `Disconnected

TCP:
- `Timeout, `Refused

FS:
- `Not_a_directory, `Is_a_directory, `Directory_not_empty,
`No_directory_entry, `File_already_exists, `No_space

KV_RO:
- Unknown_key

I would however replace all the network codes with a generic
(`Network_error of exn), where the exn might be e.g. a Refused
exception with more information about why it was refused. This makes
it easy for callers who don't care to handle them all at once (with
Lwt.fail), allows extra network errors to be added by implementations,
and allows attaching more details about the causes.

Likewise for the FS errors (`FS_user_error of exn).

The ones I'd use with Lwt.fail (i.e. turn into exceptions) would be:

FLOW: (error is abstract anyway)

BLOCK:
- `Unknown, `Unimplemented, `Is_read_only, `Disconnected

NETWORK, ETHIF, IP:
- `Unknown, `Unimplemented

UDP, TCP, STACKV4:
- `Unknown

FS:
- `Unknown_error, `Block_device

Since 'connect' is going away, those errors wouldn't be included
anyway, but I'd probably turn them into exceptions (Lwt.fail) too:

ENTROPY: `No_entropy_device
CONSOLE: `Invalid_console
FS: `Format_not_recognised

Do you disagree with these choices? In general, I want to use Lwt.fail
for anything the caller won't want to handle specially. I'm undecided
about BLOCK's `Is_read_only and `Disconnected. They would make sense
for removable media.

I just realised my text "Convert all modules to raise exceptions
rather than return error codes" sounds like I want to remove all error
codes. What I meant was, convert the all error codes we decide to
convert, as above.

Also, there are some exceptions I think should be error codes under
this scheme, such as ring's Shutdown (which always needs to be handled
because you need to resubmit to the new ring).

Regards,


* In particular, Lwt cancellation is simply ignored if the current
operation happens to be uncancellable. Also, allowing consumers to
cancel threads that other consumers may be waiting on seems wrong.


-- 
Dr Thomas Leonard        http://0install.net/
GPG: 9242 9807 C985 3C07 44A6  8B9A AE07 8280 59A5 3CC1
GPG: DA98 25AE CAD0 8975 7CDA  BD8E 0713 3F96 CA74 D8BA

_______________________________________________
MirageOS-devel mailing list
MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.