[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [MirageOS-devel] Irmin API evolution

On 10 August 2015 at 13:51, Thomas Gazagnaire <thomas@xxxxxxxxxxxxxx> wrote:
> Hi all,
> There are still parts in the Irmin API that I am not very happy about, so I 
> send an email to get feedback from all the early users to check if they share 
> my views.
> 1. The functor / first-class module interfaces: the first-class module is 
> simpler to grasp, it is harder (or sometimes impossible) to do more complex 
> tasks with them. A lot of people have reported to me that it's confusing and 
> that it's bad to duplicate the API. I agree and I propose to remove the 
> first-class module API all together and add one or two default higher-level 
> functors to simplify a little bit the current functor API.

Yes, I think would simplify things. I don't recall using any of the
first-class modules API in my own code.

> 2. The View API is mutable, although it would be better to make it 
> immutable[1]. Views keeps a list of operations which are been applied to it 
> (such as reads) so merging a view is similar to committing a transaction in 
> usual DB context. I think it is hard to keep the same behaviour on immutable 
> views without being too verbose or confusing. So I propose to rename View to 
> Transaction and have a proper immutable Staging area.

I'm not bothered about this either way. Views (staging areas) aren't
shared, so being mutable is much less of a problem here than for
branch stores.

Though perhaps we need a different concept here. In my own code, I
just want a way to generate a new tree from an existing one, without
committing lots of temporary intermediate trees to the blob store
while I'm doing so.

e.g. I want something like this:

let staging = Commit.checkout c1 in
Staging.update path1 value1;
Staging.update path2 value2;
let c2 = Commit.create ~parents:[c1] staging in

Merging this new commit onto a branch is a separate operation for me.
It's not clear to me that this needs to be stored as a list of

> 3. The terminology is slightly different that the usual Git one. I think it 
> is better to change it to match exactly to help people already knowing Git to 
> start with Irmin more easily (eg. Irmin's tag are Git's references, Irmin's 
> heads are Git's commits, etc)

Yes please! "tag" (mutable reference) is especially confusing because
it means the exact opposite in Git (immutable reference).

> 4. The Irmin API conflates the Git repository configuration and branch state 
> into an Irmin store handler. For some operations (as listing all the branches 
> in a repository) it doesn't make really sense. Maybe it's too confusing and 
> you don't want that. See Cuekeeper'API[2] which exposes a different API, 
> closer to what Git offers.

No surprise that I agree with this ;-)

Another reason to make an explicit Repository.t: Irmin currently makes
some "global" state when you apply the functors. For example, I have
to re-apply the Irmin.Basic functor to Irmin_mem.Make every time I
want a fresh in-memory store. Having a Repository.t lets you do the
set up operations for a repository once, not once per branch (too
often) or once per functor application (surprising hidden state).

> Also, Irmin offers 3 ways to create a new store handler: using the master 
> branch, using a given branch name, or using a commit id. Only using a commit 
> id is fully "concurrency-safe" unless you know exactly what you are doing -- 
> this is not very clear in the API and can be surprising to users. We should 
> change that.
> So in the very short-term (read this week or the next) I plan to release a 
> 0.10 API with at least:
> - deletion of the 1st class module API (address 1.)
> - some renaming of modules (at least Tag -> Reference, View -> Transaction, 
> type head -> type commit, etc...)

I think Irmin's "tag" corresponds to Git's "branch name". Reference is
any way to refer to a commit (by commit ID, by named branch, etc).

A couple more that would have saved me some confusion:

Node -> Tree
Slice -> Bundle
Task -> Message? Commit_metadata? Log_entry?

It's probably clearer to have separate types "Commit.t" and "Commit.id" too.

> - addition of an immutable staging area (address 3.)
> - and something (not clear yet) to address 4.
> I am very interested to get feedback on this, so please reply if you are or 
> were (or will be) using Irmin at one point ("please don't change the API 
> again" is also a good reply :p)
> Best,
> Thomas
> [1]: https://github.com/mirage/irmin/issues/109
> [2]: https://github.com/talex5/cuekeeper/blob/master/utils/git_storage_s.ml

A few extra thoughts:

First, Irmin seems to be trying to do two things:

1. Provide a Git-like API that isn't tied to ocaml-git, and that
provides higher-level functions such as "merge" and

2. Use Git to implement a mutable key-value store (by making a commit
each time you update something).

I feel that the key-value store is just one possible use of Irmin and
should be an optional component built on top of it. For CueKeeper at
least, I found it easier to think in terms of immutable trees and
commits rather than mutable key-value stores. CueKeeper's Irmin
wrapper is really more of an "unwrapper".

Secondly, I think the current API makes it too easy to introduce
races. The simplest way to use Irmin is roughly:

1. Create a store from a branch name.
2. Read and update the store.

This will appear to work during testing, but once you have multiple
threads/processes/tabs accessing the same store you start to get
strange behaviour. For example:

let store = Store.create ... in
let v1 = Store.read key1 + amount in
let v2 = Store.read key2 - amount in
Store.update key1 v1;
Store.update key2 v2;

You'd expect this to transfer "amount" from key2 to key1, but the
operations will be interleaved with other changes to the store.
Therefore, I'd like to force the user to decide what to do here. In
particular, the user should be forced to dereference the branch to get
a commit before they can read anything from it. Or, the user could
create a transaction on the branch. But accessing the branch directly
shouldn't be easy.

This would also remove some ugliness from the BC module type. e.g

val tag : t -> tag option Lwt.t
(** tag t is t's name. Return None if t is not persistent. *)

Here, the API tries to cover both commits and branches using an option
type. With separate types, you just have:

module Branch : sig
  val name : t -> branch_name Lwt.t


  val head : t -> head option Lwt.t

wouldn't need to be an option, because a commit always has an ID.

Anyway, thanks for looking into this! Hopefully there's something of
use in the above...

Dr Thomas Leonard        http://roscidus.com/blog/
GPG: DA98 25AE CAD0 8975 7CDA  BD8E 0713 3F96 CA74 D8BA

MirageOS-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.