
Re: [MirageOS-devel] deployment scripts: moving (e.g. mirage-www) away from crunch?

On 3 Aug 2015, at 15:26, Dave Scott <Dave.Scott@xxxxxxxxxx> wrote:
> Hi,
> At the moment the mirage websites are deployed automatically roughly like 
> this:
> - developer makes a pull request against code repo (e.g. 
> https://github.com/mirage/mirage-www)
> - travis builds and performs sanity checks
> - reviewer reviews and merges the change
> - travis builds a single Xen unikernel image and checks it into a deployment 
> repo (e.g. https://github.com/mirage/mirage-www-deployment)
> - the host pulls from the deployment repo and restarts the VM
> The Xen unikernel is standalone: it contains all the code and data linked 
> together, consistent with the Mirage philosophy. However as the Mirage 
> websites gain new content, the amount of static data increases. Since this is 
> all "crunch"ed into the kernel binary it ends up being loaded into RAM and 
> sitting in the OCaml heap. Therefore the memory footprint of the unikernels 
> is slowly increasing over time. It's obviously a bit of a killer if you want 
> to serve something genuinely big (say a video) from a low-memory device (a 
> little cubieboard2 perhaps)
> I was wondering if we should move away from crunch, and use some other method 
> for static data. Mirage already supports static data
> - from Irmin
> - from BLOCK devices formatted with FAT
> - from BLOCK devices containing tar-format data (new in Mirage 2.6.0)
> I can think of 2 general approaches:
> 1. during the existing build process, build both a kernel and a second binary 
> blob containing data which will become a BLOCK device. The deployment scripts 
> would simply have to attach the BLOCK devices in the VM configuration.
> 2. check in the data files into a subdirectory in the deployment tree, and 
> make the deployment scripts perform the final conversion (to Irmin, FAT or 
> tar). This has the disadvantage that it leaves some of the final "linking" to 
> the deployment scripts (which are currently outside the scope of the "mirage" 
> tool) but it has the advantage that the individual data files should be 
> de-duped by git/Irmin, since their sha1 hashes should match. If this final 
> assembly stage gets more complicated, should the "mirage" tool gain some 
> extra support for it (mirage configure; mirage build; ... later on a different 
> host ...; mirage deploy?)

I agree with Justin that 2 is better from a dedup perspective, and to maintain 
link-time flexibility.
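
To make the dedup point concrete, here is a minimal sketch of a
content-addressed store in which two paths with identical contents share one
blob. It is only an illustration, not how git or Irmin are implemented: the
names (tree, add, read, blob_count) are made up for this example, and the
stdlib Digest module (MD5) stands in for sha1.

```ocaml
(* Content-addressed store: identical file contents are kept only once.
   Assumption: Digest (MD5, from the OCaml stdlib) stands in for sha1. *)
module Store = Map.Make (String)

type tree = {
  blobs : string Store.t;          (* hex digest -> file contents *)
  paths : (string * string) list;  (* path -> hex digest *)
}

let empty = { blobs = Store.empty; paths = [] }

(* Adding a file whose contents are already present reuses the same key,
   so the blob map does not grow. *)
let add tree path contents =
  let key = Digest.to_hex (Digest.string contents) in
  { blobs = Store.add key contents tree.blobs;
    paths = (path, key) :: tree.paths }

let blob_count tree = Store.cardinal tree.blobs

let read tree path =
  match List.assoc_opt path tree.paths with
  | None -> None
  | Some key -> Store.find_opt key tree.blobs

(* Two deployments carrying the same logo, plus one distinct page. *)
let example =
  List.fold_left
    (fun t (path, contents) -> add t path contents)
    empty
    [ ("v1/logo.svg", "<svg/>");
      ("v2/logo.svg", "<svg/>");
      ("v2/index.html", "<html/>") ]
```

Here example holds three paths but only two blobs, which is the saving that
unchanged data files would get across successive deployments.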

One thought that occurs to me is that crunch would be far more efficient if it 
didn't link the data in twice.  Right now it stores the ML values as a string.  
I wonder if it would be better for them to be linked into a separate ELF 
section, and then exposed directly as zero-copy Cstructs from that area of 
memory that's already mapped in.
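
As a rough sketch of what I mean by zero-copy, the following uses a plain
Bigarray as a stand-in for the mapped ELF section and hands out sub-views
instead of copying into OCaml strings. Nothing here is existing crunch or
Cstruct code: the entry record, pack and read are hypothetical names, the
actual linking into a separate section is not modelled, and it assumes
OCaml 4.08 or later (Bigarray in the stdlib, Option.map).

```ocaml
(* One flat buffer stands in for the read-only ELF section. *)
type buffer =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* Hypothetical index entry: where a file lives inside the section. *)
type entry = { name : string; off : int; len : int }

(* Pack files back to back into one buffer and record their extents.
   This is the only place where the data is copied. *)
let pack (files : (string * string) list) : buffer * entry list =
  let total = List.fold_left (fun n (_, c) -> n + String.length c) 0 files in
  let buf = Bigarray.Array1.create Bigarray.char Bigarray.c_layout total in
  let _, entries =
    List.fold_left
      (fun (off, acc) (name, contents) ->
        let len = String.length contents in
        String.iteri
          (fun i ch -> Bigarray.Array1.set buf (off + i) ch)
          contents;
        (off + len, { name; off; len } :: acc))
      (0, []) files
  in
  (buf, List.rev entries)

(* Return a view onto the file's bytes. Array1.sub shares storage with
   buf, so no data is copied onto the OCaml heap. *)
let read (buf : buffer) (entries : entry list) (name : string) : buffer option =
  List.find_opt (fun e -> e.name = name) entries
  |> Option.map (fun e -> Bigarray.Array1.sub buf e.off e.len)

(* Copy a view out to a string; used here only for inspection. *)
let to_string (view : buffer) : string =
  String.init (Bigarray.Array1.dim view) (Bigarray.Array1.get view)
```

A Cstruct is essentially such a buffer plus an offset and a length, so the
views returned by read correspond to what the filesystem interface would hand
back to the web server.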

This would play well with the scheme for dynamic data as well -- a dynamic 
attach could do the equivalent of a Dynlink and make the same filesystem 
variables available.

> There's also the issue of how best to handle secret volumes such as those 
> containing keys.

I think this definitely has to be handled in the deployment scripts and not the 
build time.
