Xen project Mailing List

Re: [MirageOS-devel] Irmin API evolution

To: Thomas Gazagnaire <thomas@xxxxxxxxxxxxxx>

From: Thomas Leonard <talex5@xxxxxxxxx>

Date: Mon, 10 Aug 2015 15:05:35 +0100

Cc: mirageos-devel <mirageos-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Mon, 10 Aug 2015 14:05:58 +0000

List-id: Developer list for MirageOS <mirageos-devel.lists.xenproject.org>

On 10 August 2015 at 13:51, Thomas Gazagnaire <thomas@xxxxxxxxxxxxxx> wrote: > Hi all, > > There are still parts in the Irmin API that I am not very happy about, so I > send an email to get feedback from all the early users to check if they share > my views. > > 1. The functor / first-class module interfaces: the first-class module is > simpler to grasp, it is harder (or sometimes impossible) to do more complex > tasks with them. A lot of people have reported to me that it's confusing and > that it's bad to duplicate the API. I agree and I propose to remove the > first-class module API all together and add one or two default higher-level > functors to simplify a little bit the current functor API. Yes, I think would simplify things. I don't recall using any of the first-class modules API in my own code. > 2. The View API is mutable, although it would be better to make it > immutable[1]. Views keeps a list of operations which are been applied to it > (such as reads) so merging a view is similar to committing a transaction in > usual DB context. I think it is hard to keep the same behaviour on immutable > views without being too verbose or confusing. So I propose to rename View to > Transaction and have a proper immutable Staging area. I'm not bothered about this either way. Views (staging areas) aren't shared, so being mutable is much less of a problem here than for branch stores. Though perhaps we need a different concept here. In my own code, I just want a way to generate a new tree from an existing one, without committing lots of temporary intermediate trees to the blob store while I'm doing so. e.g. I want something like this: let staging = Commit.checkout c1 in Staging.update path1 value1; Staging.update path2 value2; let c2 = Commit.create ~parents:[c1] staging in Merging this new commit onto a branch is a separate operation for me. It's not clear to me that this needs to be stored as a list of changes. > 3. The terminology is slightly different that the usual Git one. I think it > is better to change it to match exactly to help people already knowing Git to > start with Irmin more easily (eg. Irmin's tag are Git's references, Irmin's > heads are Git's commits, etc) Yes please! "tag" (mutable reference) is especially confusing because it means the exact opposite in Git (immutable reference). > 4. The Irmin API conflates the Git repository configuration and branch state > into an Irmin store handler. For some operations (as listing all the branches > in a repository) it doesn't make really sense. Maybe it's too confusing and > you don't want that. See Cuekeeper'API[2] which exposes a different API, > closer to what Git offers. No surprise that I agree with this ;-) Another reason to make an explicit Repository.t: Irmin currently makes some "global" state when you apply the functors. For example, I have to re-apply the Irmin.Basic functor to Irmin_mem.Make every time I want a fresh in-memory store. Having a Repository.t lets you do the set up operations for a repository once, not once per branch (too often) or once per functor application (surprising hidden state). > Also, Irmin offers 3 ways to create a new store handler: using the master > branch, using a given branch name, or using a commit id. Only using a commit > id is fully "concurrency-safe" unless you know exactly what you are doing -- > this is not very clear in the API and can be surprising to users. We should > change that. > > So in the very short-term (read this week or the next) I plan to release a > 0.10 API with at least: > - deletion of the 1st class module API (address 1.) > - some renaming of modules (at least Tag -> Reference, View -> Transaction, > type head -> type commit, etc...) I think Irmin's "tag" corresponds to Git's "branch name". Reference is any way to refer to a commit (by commit ID, by named branch, etc). A couple more that would have saved me some confusion: Node -> Tree Slice -> Bundle Task -> Message? Commit_metadata? Log_entry? It's probably clearer to have separate types "Commit.t" and "Commit.id" too. > - addition of an immutable staging area (address 3.) > - and something (not clear yet) to address 4. > > I am very interested to get feedback on this, so please reply if you are or > were (or will be) using Irmin at one point ("please don't change the API > again" is also a good reply :p) > > Best, > Thomas > > [1]: https://github.com/mirage/irmin/issues/109 > [2]: https://github.com/talex5/cuekeeper/blob/master/utils/git_storage_s.ml A few extra thoughts: First, Irmin seems to be trying to do two things: 1. Provide a Git-like API that isn't tied to ocaml-git, and that provides higher-level functions such as "merge" and "find-common-ancestor". 2. Use Git to implement a mutable key-value store (by making a commit each time you update something). I feel that the key-value store is just one possible use of Irmin and should be an optional component built on top of it. For CueKeeper at least, I found it easier to think in terms of immutable trees and commits rather than mutable key-value stores. CueKeeper's Irmin wrapper is really more of an "unwrapper". Secondly, I think the current API makes it too easy to introduce races. The simplest way to use Irmin is roughly: 1. Create a store from a branch name. 2. Read and update the store. This will appear to work during testing, but once you have multiple threads/processes/tabs accessing the same store you start to get strange behaviour. For example: let store = Store.create ... in let v1 = Store.read key1 + amount in let v2 = Store.read key2 - amount in Store.update key1 v1; Store.update key2 v2; You'd expect this to transfer "amount" from key2 to key1, but the operations will be interleaved with other changes to the store. Therefore, I'd like to force the user to decide what to do here. In particular, the user should be forced to dereference the branch to get a commit before they can read anything from it. Or, the user could create a transaction on the branch. But accessing the branch directly shouldn't be easy. This would also remove some ugliness from the BC module type. e.g val tag : t -> tag option Lwt.t (** tag t is t's name. Return None if t is not persistent. *) Here, the API tries to cover both commits and branches using an option type. With separate types, you just have: module Branch : sig val name : t -> branch_name Lwt.t ... Likewise, val head : t -> head option Lwt.t wouldn't need to be an option, because a commit always has an ID. Anyway, thanks for looking into this! Hopefully there's something of use in the above... -- Dr Thomas Leonard http://roscidus.com/blog/ GPG: DA98 25AE CAD0 8975 7CDA BD8E 0713 3F96 CA74 D8BA _______________________________________________ MirageOS-devel mailing list MirageOS-devel@xxxxxxxxxxxxxxxxxxxx http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.