[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [MirageOS-devel] Irmin API newbie questions



> I've now extended this to support events, so if the user has the page
> open in two tabs they stay in sync. However, there's still the problem
> that you can't remove a watch, so once you start watching you have to
> keep on doing so. Not a problem for my app (I just watch the master
> branch) but might be a problem in general.

Currently, if you do not use an event stream anymore, the stream reader should 
be GCed automatically which will cause the stream writer (Irmin) to be clean-up 
on the next write. However I'm not totally sure the readers are properly GCed, 
so I need to check that before the release[1]. Did you notice some leaks there? 
But yes, you as don't control when the GC happen it might be better to leave 
the watch deallocation to the user. I need to think about that.

[1] https://github.com/mirage/irmin/issues/124

> I've also now made my app do merges: every item displayed on the
> screen records the Irmin commit from which it came. Usually this is
> the latest version, but if you're e.g. editing a form then it's the
> version when you started the edit. When you submit a change, it
> commits against that revision and then merges the result back to
> master. This means that if you make two different edits to something
> in two tabs, the conflict will be detected instead of silently
> discarding the first change. (Ideally I want to resolve the conflicts,
> but for now it just discards the second change and reports the error
> to the user).
> 
> I'm not sure I'm using the API correctly though. Here's my code for
> monitoring the master branch and updating the display when it changes:
> 
> https://github.com/talex5/cuekeeper/blob/83a5e3c146f2406b8491b1c9915cb7fef6753869/ck_update.ml#L30
> 
>  let make ~on_update branch =
>    let watch_tags = I.watch_tags (branch "Watch master") in
>    get_head_commit branch >>= fun initial_head ->
>    let mutex = Lwt_mutex.create () in
>    match I.branch (branch "Get branch name") with
>    | `Head _ -> failwith "Not a tag!"
>    | `Tag branch_name ->
>    let t = {
>      branch;
>      head = initial_head;
>      updated = Lwt_condition.create ();
>      mutex;
>    } in
>    async (fun () ->
>      on_update >>= fun on_update ->
>      watch_tags |> Lwt_stream.iter_s (function
>        | (n, Some _commit) when n = branch_name ->
>            (* (ignore the commit ID in the update message; we want
> the latest) *)
>            Lwt_mutex.with_lock t.mutex (fun () ->
>              I.head (branch "Get latest commit") >>= function
>              | None -> Ck_utils.error "Branch '%s' has disappeared!"
> branch_name
>              | Some head ->
>                  if head <> t.head then (
>                    t.head <- head;
>                    on_update head >>= fun () ->
>                    Lwt_condition.broadcast t.updated ();
>                    return ()
>                  ) else (
>                    return ()
>                  )
>            )
>        | _ -> return ()
>      )
>    );
>    return t

Hum, this clearly demonstrates that I need to improve the watch API ... I 
completely agree that getting the initial head is important: we need something 
similar to test_and_set at this level I think. I'm happy to move away from Lwt 
streams and use a more conventional callback API, where it's easier to control 
the scheduling (ie. hide the stream and expose only your "make" function 
directly in Irmin). I need to check with Dave what he needs exactly for 
Xenstore watches...

> Some points you have to worry about here:
> 
> - Irmin makes it easy to read data directly from a branch (tag), but
> this is usually unsafe, since a tag can update while you're reading
> it. Therefore, I think I always have to first get the head commit,
> then create a new store for that revision, then read from that. It
> might be safer if a branch just provided a way to get a commit, and
> reads had to be done from a fixed commit.

Hum, reading a tag should not be unsafe as the RW backend should ensure that 
updates are atomic. See Get_unix.with_write_file[2] for example where we use 
the fact that "rename" is the only operation guaranteed to be atomic on POSIX 
filesystem. Not sure which guarantee do you have for HTML5 local storage though 
... I'm afraid you don't have many[3].

[2] https://github.com/mirage/ocaml-git/blob/master/lib/unix/git_unix.ml#L300
[3] 
http://balpha.de/2012/03/javascript-concurrency-and-locking-the-html5-localstorage/

> - I only need to be notified once after a series of changes. I discard
> the commit in the watch notification and always read the latest.
> Otherwise, I might update the GUI to show stale data. Since a merge
> can add many commits at once, the stream API doesn't seem useful (it
> still doesn't ensure you see each commit individually, if that's what
> you wanted).

In some cases this might be useful (if you store incremental updates that you 
want to replay) but I agree that this is not terribly useful in the main case.

> - The functions that update the store shouldn't return until the
> update has been see by the model ("on_update" has been called). I
> therefore create a mutex and a Lwt_condition so that the commit
> functions can be sure their change has been seen.

agreed and I think that's a good property to move into the watch API.

> All the code for changing the model goes via "merge_to_master". This
> takes a "base" argument (the commit which the user made the change
> from), creates a view (temporary branch), runs "fn" to make the
> requested change (which should always succeed), merges the temporary
> branch back to master, and waits for the on_update to notice it:
> 
>  let merge_to_master t ~base ~msg fn =
>    let path = I.Key.empty in
>    R.make_view base >>= fun view ->
>    fn view >>= fun result ->
>    Lwt_mutex.with_lock t.mutex (fun () ->
>      (* At this point, head cannot contain our commit because we
> haven't made it yet,
>       * and no updates can happen while we hold the lock. *)
>      let old_head = t.head in
>      let updated = Lwt_condition.wait t.updated in
>      R.V.merge_path (t.branch msg) path view >>= function
>      | `Conflict msg -> Ck_utils.error "Conflict during merge: %s
> (discarding change)" msg
>      | `Ok () ->
>      I.head (t.branch "Get latest commit") >|= function
>      | Some new_head when new_head <> old_head ->
>          (* [updated] cannot have fired yet because we still hold the
> lock. When it does
>           * fire next, it must contain our update. It must fire soon,
> as head has changed. *)
>          updated
>      | _ ->
>          (* Our change had no effect, so don't wait for head to move.
>           * Or, the branch was deleted - no point waiting then either. *)
>          return ()
>    ) >>= fun updated ->    (* Changes have been committed. *)
>    updated >>= fun () ->   (* [on_update] has been called. *)
>    return result
> 
> Does this approach seem sensible? I feel like the API perhaps wasn't
> intended to be used this way...

I can see why you are doing all of this, and this is certainly not optimal for 
an API point of view. One think I can do to help is to add an invariant to all 
the high-level updates (such as merge) functions to only perform the whole 
update only if the head didn't change since the beginning of the operations. 
It's an easy check to add and it will help a lot to reason on what happens 
during concurrent updates. The only reason I didn't do it before is that I was 
worried about starvation issues and other concurrency scheduling related issues 
- but I guess they are better than data loss ... and we could just picky bag on 
Lwt_mutex / Lwt_condition properties.

[4] https://github.com/mirage/irmin/issues/157

> I'm still worried about losing data due to races. JavaScript (I think)
> ensures that access to local storage from different threads doesn't
> get interleaved, but since Irmin uses Lwt we don't benefit from this.
> I suspect that if two tabs try to merge at once then one's commit will
> be silently discarded. Probably the back-end update function should
> take the expected old value as an argument so it can abort if the
> state has changed since then.

Not sure to understand what Lwt exactly changes there. I'm happy to modify the 
signature of the backends to add more checks, but modifying the "RW.update" 
signature to take an optional current value to test would mean modifying the 
"Irmin.update" signature as well. Not sure that's exactly what we want. Could 
add a new "RW.test_and_set" function though.[4] I'll think about it.

[5] https://github.com/mirage/irmin/issues/155

> It would also be useful to get the commit object from the merge before
> updating the branch. That way, I could try loading it as a sanity
> check (and thus reject any update that would result in an unreadable
> state). I guess I need to create another temporary branch, merge to
> that, then force-update master to it. Seems like a lot of work,
> though.

I think it's easy to add a "try_merge" function which do not update the head. 
The current way to do that is indeed to create a temporary branch.

[6] https://github.com/mirage/irmin/issues/156

> Anyway, despite these minor issues, Irmin is working great so far!

As usual, thanks for the very useful feedback! That'll definitely help 
improving the watch API and the overhaul usability.

Thomas


_______________________________________________
MirageOS-devel mailing list
MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.