
Re: [MirageOS-devel] irmin storage overhead and dedup

>> I looked at the metadata that gets created for every email message and it's 
>> small - less than 100 bytes. So I ran a simple test of appending 20,000 
>> unique 100-byte ASCII messages. I would have expected the repository size 
>> to be on the order of a few megabytes, instead it was 4.7G. This is roughly 
>> 234K overhead per 100-byte message, which would be quite impractical for 
>> the email storage with the metadata essentially exceeding the message 
>> storage.
> Did you start from an empty repository? Would be interested to run your code 
> locally to check what happens. 

Did you add the messages sequentially, or did you use a view? If you added them
sequentially, you have 20k commits (each at least 4k), but you also have 20k
different directories (containing 1, 2, 3, ... 20k files). What is the size of
your filenames? They are serialized in the directory metadata, so with 10-byte
filenames you should indeed expect the last directory's metadata to be around
20k*10 bytes (maybe a bit less, as it is compressed). Anyway, the Git format is
not very flexible (we want to keep compatibility with it), but this is useful
for understanding what is not optimal, so that we can improve it in the custom
backend.
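
To make the arithmetic above concrete, here is a rough sketch in Python. It
assumes one message is added per commit, every directory entry costs only the
10 bytes of its filename, and every commit costs 4096 bytes; compression and
any other per-entry overhead are ignored. The function names are made up for
this illustration and are not part of Irmin.

```python
def tree_bytes(n_messages: int, entry_bytes: int) -> int:
    """Total bytes over all directory snapshots, one new file per commit.

    Snapshot i lists i entries, so the total is
    entry_bytes * (1 + 2 + ... + n_messages).
    """
    return entry_bytes * n_messages * (n_messages + 1) // 2


def commit_bytes(n_commits: int, min_commit_bytes: int = 4096) -> int:
    """Total bytes for the commits, assuming a fixed minimum size each."""
    return n_commits * min_commit_bytes


N = 20_000   # messages appended, one per commit
ENTRY = 10   # assumed bytes per filename in the directory metadata

last_dir = N * ENTRY               # size of the final directory snapshot
total_dirs = tree_bytes(N, ENTRY)  # all 20k directory snapshots together
commits = commit_bytes(N)          # all 20k commits together

print(f"last directory: {last_dir:,} bytes")
print(f"all directories: {total_dirs:,} bytes")
print(f"all commits: {commits:,} bytes")
```

Under these assumptions the directory snapshots alone add up to about 2 GB,
because their total size grows quadratically with the number of messages,
while the commits account for roughly 80 MB.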
