Xen project Mailing List

Re: [MirageOS-devel] Irmin GC

From: Thomas Gazagnaire <thomas@xxxxxxxxxxxxxx>

Date: Thu, 15 Oct 2015 15:16:47 +0100

Cc: "mirageos-devel@xxxxxxxxxxxxxxxxxxxx" <mirageos-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Thu, 15 Oct 2015 14:17:01 +0000

List-id: Developer list for MirageOS <mirageos-devel.lists.xenproject.org>

> Proposal:
>
> - Irmin should provide a smallish GC-safe core (BC?) that hides the
> internal stores completely and provides an API that will not GC data
> you're using. This API will distinguish between a "commit ID" (which
> might or might not represent a commit in the repository) and a
> "commit" (which refers to a commit in the repository and will prevent
> it from being removed as long as you keep an OCaml reference to it).
>
> - The core will need to provide named branches, commits and mutable
> indexes/staging-areas that act as GC roots.
>
> - Ir_view and Ir_sync should be implemented on top of this API, so
> they don't have to worry about GC. Higher-level operations such as
> merge should probably move to Ir_view too.

I am not sure about the distinction between commit and commit_id. What does it mean in terms of API? Do you duplicate every function to take both kinds as argument? Also, how does the user decide when to create a commit or a commit_id? A distinction between persistent and non-persistent commits might make sense, but what happens if the parents of a persistent commit are not persistent: do they become persistent? Are they GC'ed as well?

I fully agree about the GC-safe core and making Ir_view and Ir_sync use it.

> Issues:
>
> - A "commit" should keep its contents (trees and blobs) from being
> GC'd, but what about its parents? If we want to allow shallow clones,
> we might need to allow for a commit's parents to be missing.

Support for shallow clones is a needed feature, I think, for performance reasons, but it can be separated from the GC issues (e.g. just assume that some pointers might be dangling in the block store).

> - GC with remote HTTP stores could be tricky. For custom protocols, GC
> can be linked to the TCP connection, but HTTP is often spread over
> multiple connections. Probably OK for the high-level API, but we might
> have to remove the low one (I'm not very familiar with this REST API,
> and so might be confused).
Yes, kill the low-level store if that's possible. I think its main use currently is for merges and to simplify the watch hooks. Historically, it was also the first part to be implemented, as it requires little logic on the server: using the low-level API, the client is responsible for doing everything (at the cost of multiple round-trips per high-level operation). The high-level API gives more work to the server, but it exposes fewer internals to the clients and is more efficient in terms of round-trips.

> - If the user runs "git gc" manually on a Git-format store then all
> bets are off, of course. Likewise if you have a store shared by
> multiple processes.

I think it is important to keep the store safe for multiple processes if possible. This could be as simple as the GC adding a lock file somewhere (which will stop the world). If we enforce having only one Irmin process running over a local store, the invariant should be checked carefully.

One last missing issue: temporary objects stored in the block store but not yet reachable from GC roots.

A. When you transform a staging area into a new commit (for instance in views, but also when you do a simple update):

(A1) iterate first over all the new blob and tree objects to serialise them in the block store and get their hashes.
(A2) create a commit object containing the new hash of the tree root, and serialise it in the block store to get the commit ID.
(A3) (optionally) update a branch reference to point to the new ID.

My main concern with external GC is that before (A3) is done, the objects saved in (A1) and (A2) are unsafe and can be deleted at any moment.

B. When you merge commits:

(B1) inductively merge blob and tree objects, serialising them in the store to get their IDs.
(B2), (B3): same as (A2) and (A3).

Again, the objects written in (B1) and (B2) are unsafe.

Thomas

_______________________________________________
MirageOS-devel mailing list
MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel
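The unsafe window between (A1)/(A2) and (A3), and the lock-file idea for keeping a GC out of it, can be illustrated with a toy mark-and-sweep collector. This is a minimal, language-neutral Python sketch, not Irmin's implementation: `ToyStore`, `write_commit`, `unsafe_update`, `safe_update`, the `pinned` set (a stand-in for the proposal's GC-protected "commit" handles) and the `gc.lock` path are all hypothetical. Roots are the branch heads plus pinned commits; blocks missing from the store are skipped while marking, which is one way to tolerate the dangling parents of a shallow clone. Holding the GC's lock across (A1)-(A3) is an assumed fix for illustration, not something the thread settles on.

```python
import hashlib
import os
import tempfile
from contextlib import contextmanager


class ToyStore:
    """Hypothetical content-addressed block store (not Irmin's API)."""

    def __init__(self, lock_path):
        self.blocks = {}     # hash -> (kind, payload, child hashes)
        self.branches = {}   # branch name -> commit hash (GC root)
        self.pinned = set()  # hypothetical GC-protected "commit" handles (GC roots)
        self.lock_path = lock_path

    def add(self, kind, payload, children=()):
        children = tuple(children)
        key = hashlib.sha1(repr((kind, payload, children)).encode()).hexdigest()
        self.blocks[key] = (kind, payload, children)
        return key

    @contextmanager
    def locked(self):
        # Stop-the-world lock file: O_EXCL makes a second acquirer fail.
        fd = os.open(self.lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        try:
            yield
        finally:
            os.close(fd)
            os.remove(self.lock_path)

    def gc(self):
        """Mark everything reachable from the roots, sweep the rest, return the count."""
        with self.locked():
            live = set()
            todo = list(self.branches.values()) + list(self.pinned)
            while todo:
                key = todo.pop()
                # Skip missing blocks, e.g. parents absent from a shallow clone.
                if key in live or key not in self.blocks:
                    continue
                live.add(key)
                todo.extend(self.blocks[key][2])
            dead = [key for key in self.blocks if key not in live]
            for key in dead:
                del self.blocks[key]
            return len(dead)


def write_commit(store, files, parents=()):
    # (A1) serialise the blobs and the tree root, collecting their hashes.
    blobs = [store.add("blob", f"{name}={data}") for name, data in sorted(files.items())]
    tree = store.add("tree", "root", blobs)
    # (A2) serialise a commit pointing at the tree root and its parents.
    return store.add("commit", "message", (tree, *parents))


def unsafe_update(store, branch, files, gc_in_window=False):
    commit = write_commit(store, files)
    if gc_in_window:
        store.gc()  # simulates an external GC running before (A3)
    store.branches[branch] = commit  # (A3)
    return commit


def safe_update(store, branch, files):
    # Holding the GC's lock across (A1)-(A3) keeps the collector out of the window.
    with store.locked():
        commit = write_commit(store, files)
        store.branches[branch] = commit
    return commit


lock = os.path.join(tempfile.mkdtemp(), "gc.lock")
racy = ToyStore(lock)
lost = unsafe_update(racy, "main", {"a": "1"}, gc_in_window=True)
print(lost in racy.blocks)  # False: "main" now points at a collected commit

safe = ToyStore(lock)
kept = safe_update(safe, "main", {"a": "1"})
print(safe.gc(), kept in safe.blocks)  # 0 True
```

In a real multi-process store the second acquirer would wait or retry rather than fail immediately, and a crashed holder would leave a stale lock file behind; the sketch ignores both. The merge path (B1)-(B3) has the same shape and could be protected the same way.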

Follow-Ups:

Re: [MirageOS-devel] Irmin GC
- From: Thomas Leonard

References:

[MirageOS-devel] Irmin GC
- From: Thomas Leonard
