
Re: [MirageOS-devel] irmin storage overhead and dedup

I tried adding the messages both with and without a view, with the same result. 
Filenames are small, around 10 bytes.
What do you think is the best backend format to use for emails?

> On Dec 31, 2014, at 6:13 AM, Thomas Gazagnaire <thomas@xxxxxxxxxxxxxx> wrote:
>>> I looked at the metadata that gets created for every email message and it's 
>>> small, less than 100 bytes. So I ran a simple test of appending 20,000 
>>> unique 100-byte ASCII messages. I would have expected the repository size 
>>> to be on the order of a few megabytes; instead it was 4.7G. This is roughly 
>>> 234K of overhead per 100-byte message, which would be quite impractical for 
>>> email storage, with the metadata far exceeding the message 
>>> storage.
>> Did you start from an empty repository? I would be interested in running your 
>> code locally to check what happens.
> Did you add the messages sequentially or did you use a view? You have 20k 
> commits (which are at least 4k each), but you also have 20k different 
> directories (containing 1, 2, 3, ... 20k files). What's the size of your 
> filenames? They are serialized in the directory metadata, so if you have 
> 10-byte filenames, you should indeed expect the last directory metadata to be 
> around 20k*10 bytes (maybe a bit less as it is compressed). Anyway, the Git 
> format is not very flexible (as we want to keep compatibility with Git), but 
> this is useful to understand what is not optimal so we can improve it in the 
> custom backend.
> Thomas
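A rough estimate of why the directory metadata dominates: if every message is committed one at a time into a single flat directory, commit k writes a new tree object listing all k files, so the total tree data grows quadratically with the number of messages rather than linearly. The Python sketch below assumes Git's tree-entry encoding (`<mode> <name>\0<20-byte SHA-1>`, with mode `100644` for a regular file), 10-byte filenames, and loose objects with no packfile delta compression. The flat-directory layout and the function names are illustrative assumptions, not a description of how Irmin actually lays out its store.

```python
# Back-of-the-envelope estimate of Git tree-object growth when each of N
# messages is committed one at a time into a single flat directory.
# Assumptions (not from the thread): mode "100644", binary SHA-1 ids,
# one new tree object per commit, no delta compression (loose objects).

MODE = b"100644"   # mode string Git writes for a regular file
SHA1_LEN = 20      # raw (binary) SHA-1 size stored inside a tree entry


def tree_entry_size(name_len: int) -> int:
    """Bytes of one tree entry: '<mode> <name>\\0<20-byte sha1>'."""
    return len(MODE) + 1 + name_len + 1 + SHA1_LEN


def total_entries(n_messages: int) -> int:
    """Commit k rewrites a tree of k entries: 1 + 2 + ... + N in total."""
    return n_messages * (n_messages + 1) // 2


def total_tree_bytes(n_messages: int, name_len: int) -> int:
    """Uncompressed bytes summed over every tree object written."""
    return total_entries(n_messages) * tree_entry_size(name_len)


def sha1_lower_bound(n_messages: int) -> int:
    """SHA-1s look random, so zlib cannot shrink them below this."""
    return total_entries(n_messages) * SHA1_LEN


if __name__ == "__main__":
    n, name_len = 20_000, 10
    last_tree = n * tree_entry_size(name_len)
    print(f"last tree:         {last_tree:,} bytes")
    print(f"all trees (raw):   {total_tree_bytes(n, name_len):,} bytes")
    print(f"SHA-1 lower bound: {sha1_lower_bound(n):,} bytes")
```

Under these assumptions it reports a last tree of 760,000 bytes, about 7.6 GB of uncompressed tree data in total, and about 4.0 GB of SHA-1s that compression cannot remove. That lower bound alone is of the same order as the 4.7G reported, which fits Thomas's explanation: the overhead comes from rewriting the growing directory on every commit, not from the 100-byte messages themselves.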

MirageOS-devel mailing list


