
Re: [MirageOS-devel] atomically writing persistent data in mirage



On 26 January 2016 at 03:43, Tim Cuthbertson <tim@xxxxxxxxxxx> wrote:
> Hi all,
>
> I've got a small HTTP service which runs as a normal unix server and
> which I've slowly been porting to work on mirage as well. One thing I
> haven't found a solution for yet is atomic FS operations.
>
> The unix version relies on Unix.rename being atomic - I don't have
> heavy storage needs, so I'm not worried about the performance of
> writing a new file then renaming over the old one atomically. To get
> things working with the existing code I implemented a crude `rename`
> for mirage's FS backend, but obviously it is not atomic:
>
> https://github.com/timbertson/passe/blob/a07ca8fe5a3a6fe0803df4765078749310c3df5c/src/mirage/unikernel.ml#L10
>
> I believe that irmin does various things atomically, but my needs are
> simple enough that converting my simple, flat-file storage to use
> irmin as a backend instead seems like a lot of unnecessary work (and I
> really don't need branching or history).
>
> Does anyone know of a simple way to do persistent writes to the FS in
> mirage that ensures atomicity of written data? Perhaps folks with
> knowledge of how irmin achieves atomicity can let me in on the base FS
> operations that it uses to do that?

There are two separate things here:

1. Making operations appear atomic to the application (i.e. it will
never see a half-written file).

2. Ensuring that if the system crashes and is restarted, it will return
to some state it previously had.

(1) is pretty easy in a Unikernel since you can just make everything
go via your own wrapper, e.g. with a Lwt_mutex around it. (2) is
hard...
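
A rough sketch of what (1) could look like, assuming Lwt is available.
The `files` table is a hypothetical in-memory stand-in for the FS
device, and `read`/`write` are illustrative names, not part of any
Mirage API. Only `Lwt_mutex` and `Lwt.pause` are real Lwt calls here.

```ocaml
(* Sketch only: [files] is a hypothetical in-memory stand-in for a
   Mirage FS device. Lwt_mutex is the real Lwt module. *)
open Lwt.Infix

(* Hypothetical store: path -> contents. *)
let files : (string, string) Hashtbl.t = Hashtbl.create 16

(* One lock guarding every access to [files]. *)
let lock = Lwt_mutex.create ()

(* A multi-step write that yields in the middle. Without the lock,
   another Lwt thread could observe the truncated file. *)
let write path data =
  Lwt_mutex.with_lock lock (fun () ->
    Hashtbl.replace files path "";   (* truncate *)
    Lwt.pause () >>= fun () ->       (* simulated I/O wait *)
    Hashtbl.replace files path data;
    Lwt.return_unit)

(* Reads take the same lock, so they wait for any write in progress. *)
let read path =
  Lwt_mutex.with_lock lock (fun () ->
    Lwt.return (Hashtbl.find_opt files path))
```

A `read` started while a `write` is in progress blocks on the lock and
sees the new contents, never the truncated file. This gives nothing for
(2): a crash in the middle of `write` still leaves a half-written file
on a real device.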

Irmin ensures (1), but relies on the FS layer for (2). On Unix, it
uses POSIX atomic rename. I don't think we have Irmin persistence
working on Xen yet (maybe someone hacked it up with FAT, but it
probably wasn't robust against crashes if so).

I believe the current plan is to get this finished and working:

  https://github.com/djs55/ocaml-btree

Not sure how close it is to being ready. Perhaps Dave can comment...

> Just thinking about it without any particular knowledge of xen or the
> block storage, it seems like I could get away with three files for
> each real file:
>
> <file>.ptr -> contents is just "a" or "b"
> <file>.a
> <file>.b
>
> Upon read, get the current "active" from <file>.ptr then read that.
> Upon write, get the current active from <file>.ptr, write to the
> _inactive_ file and then overwrite the byte in <file>.ptr to make it
> active. I'd protect file writes with a process-level lock (to make
> sure multiple writers don't conflict), which is sufficient for the
> mirage backend since there's no multi-process concerns.
>
> Would that work? Are single-byte writes guaranteed to be atomic in the
> FAT FS backend? Any better ideas?

I vaguely recall that the FAT code builds up a list of blocks to write
and passes them all together to a function. You could perhaps write
them to a journal partition first. I suspect that trying to build
anything reliable on top of FAT is a lost cause however...
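
For reference, the three-file scheme quoted above can be sketched as
follows. This is only an illustration against a hypothetical in-memory
store (`store`, `fs_read` and `fs_write` are made-up names, not the
Mirage FS interface), and it assumes exactly the thing in question:
that the single write to <file>.ptr is atomic.

```ocaml
(* Hypothetical in-memory store standing in for the FS:
   path -> contents. *)
let store : (string, string) Hashtbl.t = Hashtbl.create 16
let fs_read path = Hashtbl.find_opt store path
let fs_write path data = Hashtbl.replace store path data

(* Active slot recorded in <file>.ptr. A missing or unrecognised
   pointer is treated as "a". *)
let active file =
  match fs_read (file ^ ".ptr") with
  | Some "b" -> "b"
  | _ -> "a"

(* The slot that is not currently active. *)
let other = function "a" -> "b" | _ -> "a"

(* Read whichever slot the pointer names. *)
let read file = fs_read (file ^ "." ^ active file)

(* Write to the inactive slot, then flip the pointer. *)
let write file data =
  let target = other (active file) in
  fs_write (file ^ "." ^ target) data;  (* step 1: fill inactive slot *)
  fs_write (file ^ ".ptr") target       (* step 2: flip (assumed atomic) *)
```

If a crash happens after step 1 but before step 2, the pointer still
names the old slot, so `read` returns the previous contents. If the
pointer write itself can be torn or reordered by the layer underneath,
the scheme gives no guarantee.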


-- 
Dr Thomas Leonard        http://roscidus.com/blog/
GPG: DA98 25AE CAD0 8975 7CDA  BD8E 0713 3F96 CA74 D8BA

_______________________________________________
MirageOS-devel mailing list
MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel


 

