Xen project Mailing List

Re: [MirageOS-devel] Some thoughts on operating unikernel environments

From: Gareth Rushgrove <gareth@xxxxxxxxxxxxxxxxx>

Date: Sun, 30 Aug 2015 18:35:55 +0100

Cc: mirageos-devel <MirageOS-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Sun, 30 Aug 2015 17:36:13 +0000

List-id: Developer list for MirageOS <mirageos-devel.lists.xenproject.org>

On 25 August 2015 at 12:46, Thomas Leonard <talex5@xxxxxxxxx> wrote:
> On 22 August 2015 at 18:37, Gareth Rushgrove <gareth@xxxxxxxxxxxxxxxxx> wrote:
>> On 22 August 2015 at 12:34, Thomas Leonard <talex5@xxxxxxxxx> wrote:
>>> On 21 August 2015 at 17:07, Gareth Rushgrove <gareth@xxxxxxxxxxxxxxxxx>
>>> wrote:
>>>> I'd managed to get a bunch of thoughts out of my head and into blog
>>>> post form, on the theme of operating unikernels.
>>>>
>>>> The general gist is, assuming unikernels are awesome, how do we build
>>>> and run production systems based on them?
>>>>
>>>> http://www.morethanseven.net/2015/08/21/operating-unikernel-challenges/
>>>>
>>>> This is mainly a list of problems, I'd love to hear from anyone who
>>>> has done any hard thinking on any of them or cut any tools in this
>>>> space.
>>>
>>> Hi Gareth,
>>>
>>> A few thoughts:
>>>
>>
>> Thanks for replying.
>>
>>>
>>> "How do I compose several unikernels together to build an application?"
>>>
>>> I think you answer this later, in the Orchestration section: the same
>>> way we do with other VMs/containers - using Docker Compose, Ubuntu
>>> Juju, etc. I haven't built anything big enough to need this yet
>>> though.
>>>
>>
>> That's my view as well (CloudFoundry or Kubernetes model would appear
>> to work?) but I've not seen anyone doing this yet. Which probably
>> means gaps exist when you actually try :) If anyone takes a run at
>> this I'd certainly be interested, I'm guessing Lattice
>> [http://lattice.cf/] might be a nice place to start?
>
> Me too. Since Mirage services can also be compiled as Unix binaries it
> should be possible to test deployment configuration using existing
> systems right now. Then, it's "just" a matter of teaching the
> deployment system to deploy unikernel VMs directly, rather than
> deploying Linux VMs containing the service.
>
>>> What does a Continuous integration or deployment pipeline look like?
>>>
>>> Amir gives an example in "Towards Heroku for Unikernels: Part 1 -
>>> Automated deployment":
>>>
>>> http://amirchaudhry.com/heroku-for-unikernels-pt1/
>>
>> While an example of what's possible I don't think this is the highly
>> opinionated high-level interface that would be required to make it
>> easy to get started. Githooks, Makefiles and shell scripts are great
>> for prototypes but don't tend to make for a great experience in my
>> view. The skeleton is great, but only covers running unit tests and
>> only on Travis. Test Kitchen [http://kitchen.ci/] is maybe a nice
>> model to look at - as a thought experiment "what would Test Kitchen
>> for Mirage look like?"
>
> Amir, any thoughts on this? I don't see any reason why the deployment
> scripts can't be made generic and packaged up.
>
>>> "By removing the operating system we remove things like host firewalls ..."
>>>
>>> I see two main uses for firewalls. One is to avoid accidentally
>>> exposing a host-only service (e.g. a database used by a web app in the
>>> same VM) and the other is to provide basic access control between VMs
>>> (only the web VM can access the DB VM).
>>>
>>> For the first, two services in the same Mirage unikernel will
>>> communicate directly using OCaml datatypes. When everything is a
>>> library, using a network for internal communication would be crazy.
>>
>> At any degree of scale though you're going to be running many
>> unikernels across many hosts - so some degree of network communication
>> is going to be required (even if you minimise it with locality). Also,
>> in most environments some of that integration is going to be with
>> non-mirage/ocaml based systems and/or not running on the same
>> hosts/datacenters.
>>
>>> Also, while Linux allows any process to listen on the network, Mirage
>>> uses dependency injection so that only components that need network
>>> access will be given it.
>>>
>>
>> Yup, which is great. My thoughts were mainly about the second issue...
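
For anyone following along who hasn't seen the dependency injection
point in code, here is a minimal sketch of the idea using an OCaml
functor. The names (NETWORK, Web_server, Template, Fake_network) are
made up for illustration and are not Mirage's actual signatures; the
"network" is just an in-memory buffer so the example runs on its own.

```ocaml
(* Hypothetical signature: the only network capability a component can
   be handed. Not a real Mirage module type. *)
module type NETWORK = sig
  type t
  val send : t -> string -> unit
end

(* A component that needs the network is written as a functor over
   NETWORK, so whoever assembles the application decides what it gets. *)
module Web_server (N : NETWORK) = struct
  let respond net body = N.send net ("HTTP/1.1 200 OK\r\n\r\n" ^ body)
end

(* This component takes no NETWORK argument. In this sketch it is never
   handed a network value, so it has nothing to send on. *)
module Template = struct
  let render name = "Hello, " ^ name
end

(* In-memory stand-in for a real network stack (an assumption made so
   the example is self-contained). *)
module Fake_network = struct
  type t = Buffer.t
  let create () = Buffer.create 64
  let send t data = Buffer.add_string t data
  let sent t = Buffer.contents t
end

(* The "composition" step: only Web_server is given a network. *)
module Server = Web_server (Fake_network)

let () =
  let net = Fake_network.create () in
  Server.respond net (Template.render "unikernel");
  print_string (Fake_network.sent net)
```

Swapping Fake_network for a real stack would change only the functor
application, which is what makes the "test as a Unix binary, deploy as
a VM" approach mentioned earlier plausible.
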
>>
>>> For the second, whatever is composing the services should configure
>>> the network, in my opinion. In other words, if I say I want my web
>>> server VM connected to a database VM, then nothing else should have
>>> access to the DB VM.
>>>
>>> I would certainly like to see a higher-level API for networking, that
>>> doesn't allow unexpected connections. e.g. we currently offer services
>>> a low-level network API like:
>>>
>>> val connect : network -> ipaddr -> port -> flow
>>> val listen : network -> port -> callback -> unit
>>>
>>> With this API, a library with network access can connect anywhere in
>>> the world by supplying any IP address and port number, and must handle
>>> its own encryption. A higher-level capability-style API could offer
>>> something more abstract, e.g.
>>>
>>> module type SturdyRef = sig
>>>   type t
>>>   val connect : t -> flow
>>> end
>>>
>>> Here, our web server would simply get a SturdyRef.t for the database,
>>> and all it could do would be to connect to it.
>>>
>>
>> Agreed. I just want something like this to exist :)
>>
>> I also think unikernels could make for really nice network devices
>> (firewalls, security controls, proxies, etc.)
>
> Yes. Here's a simple unikernel for a NAT device, for example:
>
> https://github.com/yomimono/simple-nat
>
>> Lots of people are finding the network the limiting factor when they
>> start down a microservices rabbit hole in my experience. How would
>> unikernels work with some of the newer players in this space like
>> Weave [http://weave.works/] or Calico [http://www.projectcalico.org/]
>> might be interesting to consider?
>>
>>>
>>> What does debugging a system based on unikernels look like?
>>>
>>> There's an example here: https://mirage.io/wiki/profiling
>>>
>>> "As a motivating example, we'll track down a (real, but now fixed) bug
>>> in MirageOS's TCP stack."
>>>
>>
>> From an operator's point of view that's not really the same thing. The
>> issues I see:
>>
>> * enabling it requires recompilation and redeployment (although you
>> could probably put this behind some sort of feature flag?)
>
> It can be enabled and disabled at run-time, but there's still a
> performance cost to having this kind of very detailed tracing
> available. I'd certainly like to see more support for general logging
> and metrics (the kind of thing you keep on all the time).
>

I'd love to see a port of the Java metrics library for this purpose.
It's been ported to other languages before and would be a great
starting point.

https://github.com/dropwizard/metrics

> On the other hand, I think you should be prepared to recompile and
> redeploy your unikernels when needed, and that shouldn't be a big
> deal. Trying to modify and redeploy a Linux kernel to get extra debug
> is a nightmare, but with a unikernel it can be very easy.
>
>> * it's not interactive
>
> You can refresh the view while it's running, so if you have something
> you can tweak dynamically, you can see what effect it's having.
>
>
>> I think the first is interesting, as the unikernel you're running
>> might be provided by a third party vendor and you might not have the
>> source code/right to modify/recompile. Or changes might require a
>> lengthy change approval process.
>
> Yes, for binary-only releases you have to compile any needed debug
> code into it at all times (or provide a separate debug build).
>

I'd definitely spend a bit of time thinking about how a feature flag
system would be implemented with Mirage based systems, or provide
suitable arguments against/alternatives. I'd say it's a core part of
modern operations at this point.

>> The second might be a matter of debugging at the hypervisor/xen layer
>> but I've limited experience there. That also raises isolation issues -
>> I probably want to limit access to the hypervisor more than to an
>> individual application instance.
>>
>> I'm obviously mainly in critique mode with the post and points above.
>> My main interest is in getting anyone thinking about operational
>> problems early, in my view it's a pretty interesting set of issues for
>> which good solutions undoubtedly exist.
>
> We need more experience reports here. In my case, all problems have been one
> of:
>
> - Why did this take so long? (the disk driver didn't support large
> requests and had to split them; the TCP stack set the retransmission
> timeout too long)
>
> - Why did this fail? (some exception details got ignored and replaced
> by a generic error; I want to see the original)
>
> - Why didn't this ever finish? (the ARP reply arrived before we
> started waiting for it)
>
> These questions can all be answered with the existing tracing. What
> kind of interactive debugging would be helpful for you?
>
> When anyone has a hard-to-diagnose problem, I'm interested to see how
> Mirage's tracing or error reporting could be improved to make the
> problem obvious.
>
>
> --
> Dr Thomas Leonard        http://roscidus.com/blog/
> GPG: DA98 25AE CAD0 8975 7CDA  BD8E 0713 3F96 CA74 D8BA

--
Gareth Rushgrove
@garethr

devopsweekly.com
morethanseven.net
garethrushgrove.com

_______________________________________________
MirageOS-devel mailing list
MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel
