[MirageOS-devel] irmin storage overhead and dedup

Hi Thomas,

I'm trying to figure out what kind of storage overhead and deduplication I get
in Irmin.

First, I converted the Google email archive (2.4 GB) to the IMAP server's Irmin
format. After conversion, the git repository was twice the size of the original
archive. I do create some additional structures, such as a per-mailbox index,
summary statistics and per-message flags, so perhaps the extra size comes from
those, though it seems a bit high. I will have to estimate the expected size of
these additional structures to understand this result.

Next, I stored 2,000 files of 1 MB each, filled with random ASCII content, which
resulted in a git repository of 950 MB. I figure Irmin compresses the content,
right? To verify this, I stored 2,000 copies of a 2.4 MB image, each with a
counter appended to make the content unique. The repository size for this was
4.6 GB, which is expected. Then I repeated the last test with identical images,
and this time the size was 27 MB, which is clearly nice proof of Irmin's
deduplication.

My question is whether compression in Irmin is configurable. Can it be
configured per individual piece of content? For instance, I don't want to
compress images, as there is nothing to gain in space savings and the
compression just wastes resources, but I do want to compress text if the
compression overhead is reasonable. I can determine the type of content from
the MIME type in the IMAP server.
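The results above are consistent with how a git repository stores content: each object is addressed by the SHA-1 of `blob <size>\0` followed by the content, and the stored bytes are zlib-deflated. Identical images therefore collapse into one object, while random ASCII still shrinks. The following is a minimal Python sketch of that layout, not Irmin's API. The per-MIME policy (`compression_level`, the list of "already compressed" types, and the choice of level 6 for everything else) is a hypothetical assumption for illustration, and `os.urandom` bytes merely stand in for an image payload.

```python
import hashlib
import os
import zlib

# Hypothetical policy (an assumption, not an Irmin setting): MIME types
# whose payloads are usually already compressed, so deflating them again
# gains little space and just costs CPU.
ALREADY_COMPRESSED_PREFIXES = ("image/", "audio/", "video/")
ALREADY_COMPRESSED_TYPES = {"application/zip", "application/gzip"}


def compression_level(mime_type: str) -> int:
    """Return a zlib level: 0 (store only) for compressed media, 6 otherwise."""
    mime = mime_type.lower()
    if mime.startswith(ALREADY_COMPRESSED_PREFIXES) or mime in ALREADY_COMPRESSED_TYPES:
        return 0
    return 6


class BlobStore:
    """Toy content-addressed store modelled on git loose objects.

    key   = hex SHA-1 of b"blob <size>\\0" + content (git's blob id)
    value = that same header + content, zlib-deflated
    """

    def __init__(self) -> None:
        self.objects = {}  # hex SHA-1 -> compressed object bytes

    def add(self, content: bytes, mime_type: str) -> str:
        raw = b"blob %d\x00" % len(content) + content
        key = hashlib.sha1(raw).hexdigest()
        if key not in self.objects:  # identical content is stored only once
            self.objects[key] = zlib.compress(raw, compression_level(mime_type))
        return key

    def get(self, key: str) -> bytes:
        raw = zlib.decompress(self.objects[key])
        return raw.split(b"\x00", 1)[1]  # drop the "blob <size>\0" header

    def size(self) -> int:
        return sum(len(obj) for obj in self.objects.values())


store = BlobStore()
image = os.urandom(100_000)  # random bytes stand in for a JPEG/PNG payload
text = b"Subject: hello\r\n\r\n" + b"lorem ipsum " * 5_000

k1 = store.add(image, "image/jpeg")
k2 = store.add(image, "image/jpeg")  # same bytes, same key: deduplicated
k3 = store.add(text, "text/plain")
print(k1 == k2, len(store.objects))  # True 2
print(store.size() < len(image) + len(text))  # True
```

Level 0 still produces a valid zlib stream, so a reader never needs to know which policy was used when an object was written, and any git client can still read such objects.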

MirageOS-devel mailing list
