Re: [MirageOS-devel] irmin storage overhead and dedup
I tried adding both with and without the view, with the same result.
Filename size is small - around 10 bytes. What do you think is the best
backend format to use for emails?

> On Dec 31, 2014, at 6:13 AM, Thomas Gazagnaire <thomas@xxxxxxxxxxxxxx> wrote:
>
>>> I looked at the metadata that gets created for every email message and it's
>>> small - less than 100 bytes. So I ran a simple test of appending 20,000
>>> unique 100-byte ASCII messages. I would have expected the repository size
>>> to be on the order of a few megabytes; instead it was 4.7G. This is roughly
>>> 234K overhead per 100-byte message, which would be quite impractical for
>>> the email storage, with the metadata essentially exceeding the message
>>> storage.
>>
>> Did you start from an empty repository? Would be interested to run your code
>> locally to check what happens.
>
> Did you add the messages sequentially or did you use a view? You have 20k
> commits (which are at least 4k), but you also have 20k different directories
> (which contain 1, 2, 3, ... 20k files). What's the size of your filenames?
> They are serialized in the directory metadata, so if you have 10-byte
> filenames, you should indeed expect the last directory metadata to be around
> 20k*10 (maybe a bit less as it is compressed). Anyway, the Git format is not
> very flexible (as we want to keep compatibility with the Git format), but this
> is useful for understanding what is not optimal, so we can improve it in the
> custom backend.
>
> Thomas

_______________________________________________
MirageOS-devel mailing list
MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel
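
For reference, the directory-metadata growth described in the quoted reply can be sketched as a back-of-envelope calculation. This is only an estimate under stated assumptions, not a measurement of Irmin itself: it assumes all messages sit in one flat directory, that each append writes a fresh, uncompressed Git tree object listing every file so far, and that each entry uses the standard Git tree layout (mode, space, name, NUL, 20-byte raw SHA-1). The function names are hypothetical and chosen for illustration.

```python
# Rough estimate of the cumulative size of Git tree objects when n files
# are appended one at a time to a single directory.
# Assumption: every append stores a complete new tree object, with no
# compression, delta encoding, or packing applied.

MODE_AND_SPACE = 7  # b"100644 " for a regular file
NUL = 1             # separator between the name and the hash
SHA1_RAW = 20       # raw SHA-1 bytes per entry


def tree_entry_bytes(filename_len):
    """Uncompressed size of one tree entry for a regular file."""
    return MODE_AND_SPACE + filename_len + NUL + SHA1_RAW


def total_tree_bytes(n_messages, filename_len):
    """Sum of tree sizes over n appends: trees hold 1, 2, ..., n entries."""
    total_entries = n_messages * (n_messages + 1) // 2
    return total_entries * tree_entry_bytes(filename_len)


if __name__ == "__main__":
    n, name_len = 20_000, 10
    total = total_tree_bytes(n, name_len)
    print(f"entry size:   {tree_entry_bytes(name_len)} bytes")
    print(f"total trees:  {total / 1e9:.1f} GB uncompressed")
    print(f"per message:  {total / n / 1e3:.0f} KB uncompressed")
```

With 20,000 messages and 10-byte filenames this comes to several gigabytes before compression. That is the same order of magnitude as the 4.7G reported in the thread, so quadratic growth of the directory metadata is at least a plausible explanation, though other factors are not ruled out.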