
Re: [MirageOS-devel] Irmin GC

On 16 October 2015 at 11:06, Thomas Gazagnaire <thomas@xxxxxxxxxxxxxx> wrote:
>>> I am not sure about the distinction between commit and commit_id. What does 
>>> it mean in terms of API? Do you duplicate every function to take both 
>>> kinds as argument?
>> No, you just have a single function:
>>    BC.Repo.commit_of_id: t -> commit_id -> commit option Lwt.t
>> If this returns None then the commit wasn't in the store. If it
>> returns Some commit then that commit will stay in the store as long as
>> you hold the commit value. Then e.g. "task_of_commit" doesn't need to
>> return an option*, because you know the task will still be there.
> I still don't fully understand your proposal I think.

> The `task_of_commit` returns an option because the current branch might be 
> empty (and not have any commit).

`task_of_commit{_id}` works on commits, not branches.

> Also, a store is potentially shared by multiple Irmin instances, some commits 
> might be persistent on some local stores: what are the semantics when you 
> pull/push between local stores? Is the "persistency" property propagated?

I think you're talking about a different design. In your design, if I
understand correctly, there are "persistent" and "non-persistent"
commits associated with a repository. Are non-persistent ones held
only in memory and written to disk only when they become part of the
history of a named branch? That could be useful, but forced updates
might cause trouble (since a persistent commit wouldn't always be so).

What I was proposing was that all commits are persisted on disk. The
distinction is that a "commit" is something you have in your store,
while a "commit id" identifies a commit that you might or might not
have. For example, it makes sense to ask "is $commit_id in my store?".
When exporting, you may want to refer to commit IDs that you know the
remote has, but which you might not have.
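
As a rough illustration of that distinction, here is a minimal sketch using
a toy in-memory store. It is not Irmin's real API: Toy_repo, create and add
are hypothetical names, only commit_of_id and task_of_commit come from the
discussion above, and Lwt is left out to keep the example self-contained.

```ocaml
(* Toy in-memory model of the proposal; not Irmin's real API. *)
module Toy_repo : sig
  type t
  type commit_id = string          (* may or may not be in the store *)
  type commit                      (* abstract: known to be in the store *)
  val create : unit -> t
  val add : t -> id:commit_id -> task:string -> commit
  val commit_of_id : t -> commit_id -> commit option
  val id_of_commit : commit -> commit_id
  val task_of_commit : commit -> string   (* no option needed *)
end = struct
  type commit_id = string
  type commit = { id : commit_id; task : string }
  type t = (commit_id, commit) Hashtbl.t
  let create () = Hashtbl.create 16
  let add t ~id ~task =
    let c = { id; task } in
    Hashtbl.replace t id c;
    c
  (* The only place where "not in my store" can be observed. *)
  let commit_of_id t id = Hashtbl.find_opt t id
  let id_of_commit c = c.id
  let task_of_commit c = c.task
end

let () =
  let repo = Toy_repo.create () in
  let _c = Toy_repo.add repo ~id:"abc" ~task:"initial import" in
  match Toy_repo.commit_of_id repo "abc" with
  | Some c -> print_endline (Toy_repo.task_of_commit c)
  | None -> print_endline "commit not in store"
```

Because commit is abstract, the only ways to obtain one are to create it or
to look it up by ID, so the lookup is the single point where absence has to
be handled.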

>> * (actually, task_of_commit_id currently throws an exception if the
>> commit isn't in the store, which isn't ideal)
>>> Also, how does the user decide when to create a commit or a commit_id? Persistent 
>>> commit vs. non-persistent commit might make sense, but what happens if the 
>>> parents of a persistent commit are not persistent: do they become 
>>> persistent? Are they GC'ed as well?
>> You could think of a commit_id as like a weak ref to a commit.
> ok, so do you propose that every time you make a commit ID persistent, all 
> its parents become persistent as well?

In my proposal, committing (e.g. a view/staging-area) writes a commit
to the disk and returns a "commit". You can pass this commit value to
other functions, knowing that it will remain available. However, if
you ask for its ID and then let OCaml GC the commit object, then Irmin
is free to remove the commit from the disk store too.
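
One way this could be tracked is with OCaml's standard Weak module. The
sketch below is only an assumption about how such bookkeeping might look,
not how Irmin does it: the store remembers each commit value weakly, and a
disk GC would treat a commit as removable once the weak pointer is empty.
The names live, make_commit and still_held are hypothetical.

```ocaml
type commit = { id : string; task : string }

(* The store refers to the commit value only weakly, so this table does
   not by itself keep the value alive. One slot is enough for a sketch. *)
let live : commit Weak.t = Weak.create 1

(* Stand-in for "write the commit to disk and return a commit value". *)
let make_commit ~id ~task =
  let c = { id; task } in
  Weak.set live 0 (Some c);
  c

(* A disk GC could delete any commit for which this returns false and
   which is not reachable from a named branch. *)
let still_held () = Weak.check live 0

let () =
  let c = make_commit ~id:"abc" ~task:"initial import" in
  Gc.full_major ();
  (* [c] is used below, so it is still reachable across the GC. *)
  assert (still_held ());
  print_endline (Sys.opaque_identity c).id
```

Keeping only c.id and dropping c would allow the OCaml GC to clear the
slot, which is the point at which the on-disk commit becomes collectable.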

> If that's the case, why not simply (1) not distinguish between commits 
> and commit IDs, (2) always consider commits as weak references, but (3) 
> consider them persistent when they are put in a reference?

This is the current design, but it means that your commits can be GC'd
while you're still using them. e.g. in

View.make_head v task ~parents ~contents >>= fun commit ->
BC.update_head master commit

`update_head` may fail because the commit no longer exists (a Git GC
occurred between make_head and update_head).

> So then, when you pull/push you are sharing references as well, so it is clear 
> what is persistent or not. It's also fine if GCs are run locally, because they 
> then know about both weak references (as they are local) and persistent commits 
> (just read the references).
>>> I think it is important to keep it multi-process safe if possible. 
>>> Could be as simple as the GC adding a lock file somewhere (which will stop 
>>> the world). If we enforce having only one Irmin process running over a 
>>> local store, the invariant should be checked carefully.
>> I don't see how multi-process can ever work if you allow anonymous branches.
> I agree. Let's add an invariant that only one instance of Irmin should run on 
> a local store -- if we add this, a few things in the code can be simplified 
> (for instance the file locking bits in Irmin_unix), but we check that the 
> invariant is satisfied and fail to start a new instance otherwise.

However, I've realised my proposal doesn't work well for use in the browser:

- We can't stop people opening multiple tabs on the same page.
- JavaScript only recently got support for weak references [1].


Dr Thomas Leonard        http://roscidus.com/blog/
GPG: DA98 25AE CAD0 8975 7CDA  BD8E 0713 3F96 CA74 D8BA
