[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [MirageOS-devel] Irmin API newbie questions

On 24 February 2015 at 13:37, Thomas Leonard <talex5@xxxxxxxxx> wrote:
> On 24 February 2015 at 13:34, Nick Betteridge <buzzheavyyear@xxxxxxxxxxx> 
> wrote:
>>> Note: it's probably worth removing "Irmin inside the browser" from the
>>> Pioneer Projects list now. It's working fine (with HTML5 storage).
>>> Getting it to work is just a case of modifying Irmin to use Daniel
>>> BÃnzli's pure-OCaml implementation of SHA-1 and implementing a few JS
>>> helpers for blitting with bigarrays.
>> Is this viewable anywhere? It would be great to have this functionality
>> built into Irmin + extension.
> Yes, although I'm planning on doing more testing before upstreaming.
> However, for the curious the current versions can be found here:
> HTML 5 storage backend for Irmin:
> https://github.com/talex5/cuekeeper/blob/master/html_storage.ml

I've now extended this to support events, so if the user has the page
open in two tabs they stay in sync. However, there's still the problem
that you can't remove a watch, so once you start watching you have to
keep on doing so. Not a problem for my app (I just watch the master
branch) but might be a problem in general.

I've also now made my app do merges: every item displayed on the
screen records the Irmin commit from which it came. Usually this is
the latest version, but if you're e.g. editing a form then it's the
version when you started the edit. When you submit a change, it
commits against that revision and then merges the result back to
master. This means that if you make two different edits to something
in two tabs, the conflict will be detected instead of silently
discarding the first change. (Ideally I want to resolve the conflicts,
but for now it just discards the second change and reports the error
to the user).

I'm not sure I'm using the API correctly though. Here's my code for
monitoring the master branch and updating the display when it changes:


  let make ~on_update branch =
    let watch_tags = I.watch_tags (branch "Watch master") in
    get_head_commit branch >>= fun initial_head ->
    let mutex = Lwt_mutex.create () in
    match I.branch (branch "Get branch name") with
    | `Head _ -> failwith "Not a tag!"
    | `Tag branch_name ->
    let t = {
      head = initial_head;
      updated = Lwt_condition.create ();
    } in
    async (fun () ->
      on_update >>= fun on_update ->
      watch_tags |> Lwt_stream.iter_s (function
        | (n, Some _commit) when n = branch_name ->
            (* (ignore the commit ID in the update message; we want
the latest) *)
            Lwt_mutex.with_lock t.mutex (fun () ->
              I.head (branch "Get latest commit") >>= function
              | None -> Ck_utils.error "Branch '%s' has disappeared!"
              | Some head ->
                  if head <> t.head then (
                    t.head <- head;
                    on_update head >>= fun () ->
                    Lwt_condition.broadcast t.updated ();
                    return ()
                  ) else (
                    return ()
        | _ -> return ()
    return t

Some points you have to worry about here:

- Irmin makes it easy to read data directly from a branch (tag), but
this is usually unsafe, since a tag can update while you're reading
it. Therefore, I think I always have to first get the head commit,
then create a new store for that revision, then read from that. It
might be safer if a branch just provided a way to get a commit, and
reads had to be done from a fixed commit.

- I only need to be notified once after a series of changes. I discard
the commit in the watch notification and always read the latest.
Otherwise, I might update the GUI to show stale data. Since a merge
can add many commits at once, the stream API doesn't seem useful (it
still doesn't ensure you see each commit individually, if that's what
you wanted).

- The functions that update the store shouldn't return until the
update has been see by the model ("on_update" has been called). I
therefore create a mutex and a Lwt_condition so that the commit
functions can be sure their change has been seen.

- I was using React to track things, but its use of weakrefs caused a
lot of problems, so I've gone back to imperative code and callbacks
for now.

All the code for changing the model goes via "merge_to_master". This
takes a "base" argument (the commit which the user made the change
from), creates a view (temporary branch), runs "fn" to make the
requested change (which should always succeed), merges the temporary
branch back to master, and waits for the on_update to notice it:

  let merge_to_master t ~base ~msg fn =
    let path = I.Key.empty in
    R.make_view base >>= fun view ->
    fn view >>= fun result ->
    Lwt_mutex.with_lock t.mutex (fun () ->
      (* At this point, head cannot contain our commit because we
haven't made it yet,
       * and no updates can happen while we hold the lock. *)
      let old_head = t.head in
      let updated = Lwt_condition.wait t.updated in
      R.V.merge_path (t.branch msg) path view >>= function
      | `Conflict msg -> Ck_utils.error "Conflict during merge: %s
(discarding change)" msg
      | `Ok () ->
      I.head (t.branch "Get latest commit") >|= function
      | Some new_head when new_head <> old_head ->
          (* [updated] cannot have fired yet because we still hold the
lock. When it does
           * fire next, it must contain our update. It must fire soon,
as head has changed. *)
      | _ ->
          (* Our change had no effect, so don't wait for head to move.
           * Or, the branch was deleted - no point waiting then either. *)
          return ()
    ) >>= fun updated ->    (* Changes have been committed. *)
    updated >>= fun () ->   (* [on_update] has been called. *)
    return result

Does this approach seem sensible? I feel like the API perhaps wasn't
intended to be used this way...

I'm still worried about losing data due to races. JavaScript (I think)
ensures that access to local storage from different threads doesn't
get interleaved, but since Irmin uses Lwt we don't benefit from this.
I suspect that if two tabs try to merge at once then one's commit will
be silently discarded. Probably the back-end update function should
take the expected old value as an argument so it can abort if the
state has changed since then.

It would also be useful to get the commit object from the merge before
updating the branch. That way, I could try loading it as a sanity
check (and thus reject any update that would result in an unreadable
state). I guess I need to create another temporary branch, merge to
that, then force-update master to it. Seems like a lot of work,

Anyway, despite these minor issues, Irmin is working great so far!

Dr Thomas Leonard        http://0install.net/
GPG: 9242 9807 C985 3C07 44A6  8B9A AE07 8280 59A5 3CC1
GPG: DA98 25AE CAD0 8975 7CDA  BD8E 0713 3F96 CA74 D8BA

MirageOS-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.