
Re: [MirageOS-devel] irmin storage overhead and dedup



Hi Thomas,

I looked at the metadata that gets created for every email message and it's 
small, less than 100 bytes. So I ran a simple test of appending 20,000 unique 
100-byte ASCII messages. I expected the repository size to be on the order of 
a few megabytes; instead it was 4.7G. This is roughly 234K of overhead per 
100-byte message, which would be quite impractical for email storage, with 
the metadata vastly exceeding the message content itself.
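
For reference, the per-message figure can be reproduced with a few lines of 
Python. This is only a back-of-the-envelope sketch: the variable names are 
mine, and it assumes decimal units (1G = 10**9 bytes), which the numbers 
above do not state.

```python
# Figures from the append test; decimal units (1G = 10**9 bytes) are an assumption.
repo_bytes = 4_700_000_000   # measured repository size, 4.7G
message_count = 20_000       # number of appended messages
message_bytes = 100          # payload size of each message

# Average number of bytes the repository grew per appended message.
per_message_bytes = repo_bytes // message_count

# Everything beyond the payload itself counts as overhead.
overhead_bytes = per_message_bytes - message_bytes
overhead_ratio = overhead_bytes / message_bytes

print(f"stored per message: {per_message_bytes} bytes")
print(f"overhead per message: {overhead_bytes} bytes ({overhead_ratio:.0f}x the payload)")
```

With binary units the per-message figure would come out somewhat higher, but 
the conclusion is the same either way.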

Gregory

> On Dec 30, 2014, at 7:07 PM, Gregory Tsipenyuk <gt303@xxxxxxxxx> wrote:
> 
> Hi Thomas,
> 
> I'm trying to figure out what kind of storage overhead and dedup I get in 
> Irmin. First I tried to convert the Google email archive (2.4G) to the IMAP 
> server's Irmin format. After conversion the size of the git repository was 
> twice the size of the original archive. I do create some additional 
> structures, such as a per-mailbox index, summary statistics, and per-message 
> flags, so perhaps the extra size comes from those structures, though it 
> seems a bit high. I will have to estimate the expected size of the 
> additional structures to understand this result. Next I dumped into Irmin 
> 2,000 1M files with random ASCII content, which resulted in a git 
> repository size of 950M. I figure Irmin compresses the content, right? To 
> verify this I dumped 2,000 2.4M image files, each with a counter 
> concatenated to make the content unique. The repository size for this was 
> 4.6G, which is expected. Then I repeated the last test but with identical 
> images, and this time the size was 27M, which was clearly a nice proof of 
> deduping by Irmin. My question is whether the compression in Irmin is 
> configurable. Can it be configured per individual content? For instance, I 
> don't want to compress images, as there is no space saving to gain and the 
> compression just uses resources unnecessarily, but I do want to compress 
> text if the compression overhead is reasonable. I can figure out the type 
> of content from the MIME type in the IMAP server.
> 
> Thanks 
> Gregory
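
To make the behaviour asked about in the quoted message concrete, here is a 
toy sketch of a content-addressed store that dedups identical content and 
skips compression for images based on the MIME type. It is not Irmin's API 
and not how Irmin or git are implemented: `ContentStore`, `put`, `get` and 
`stored_bytes` are hypothetical names, and the key is a plain SHA-1 of the 
raw bytes (git additionally hashes a small header in front of the content).

```python
import hashlib
import zlib


class ContentStore:
    """Toy in-memory content-addressed store (hypothetical, not Irmin's API)."""

    def __init__(self):
        # key (hex SHA-1 of the content) -> (is_compressed, payload bytes)
        self.objects = {}

    def put(self, data: bytes, mime_type: str) -> str:
        key = hashlib.sha1(data).hexdigest()
        # Dedup: identical content maps to the same key and is stored once.
        if key not in self.objects:
            # Assumed policy: image formats are usually already compressed,
            # so store them as-is; compress everything else with zlib.
            compress = not mime_type.startswith("image/")
            payload = zlib.compress(data) if compress else data
            self.objects[key] = (compress, payload)
        return key

    def get(self, key: str) -> bytes:
        compressed, payload = self.objects[key]
        return zlib.decompress(payload) if compressed else payload

    def stored_bytes(self) -> int:
        return sum(len(payload) for _, payload in self.objects.values())


store = ContentStore()

# The same text stored twice occupies one (compressed) object.
text = b"hello irmin " * 1000
key_a = store.put(text, "text/plain")
key_b = store.put(text, "text/plain")
text_stored = store.stored_bytes()

# Stand-in bytes for an image; stored without compression.
image = bytes(range(256)) * 40
image_key = store.put(image, "image/jpeg")
image_stored = store.stored_bytes() - text_stored
```

In this sketch the two `put` calls for the text return the same key, and the 
image payload adds exactly its own length to the store, which is the 
per-content choice the question is about.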


_______________________________________________
MirageOS-devel mailing list
MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel

 

