
Re: [MirageOS-devel] Some thoughts on operating unikernel environments



On 25 August 2015 at 12:46, Thomas Leonard <talex5@xxxxxxxxx> wrote:
> On 22 August 2015 at 18:37, Gareth Rushgrove <gareth@xxxxxxxxxxxxxxxxx> wrote:
>> On 22 August 2015 at 12:34, Thomas Leonard <talex5@xxxxxxxxx> wrote:
>>> On 21 August 2015 at 17:07, Gareth Rushgrove <gareth@xxxxxxxxxxxxxxxxx> wrote:
>>>> I've managed to get a bunch of thoughts out of my head and into blog
>>>> post form, on the theme of operating unikernels.
>>>>
>>>> The general gist is, assuming unikernels are awesome, how do we build
>>>> and run production systems based on them?
>>>>
>>>> http://www.morethanseven.net/2015/08/21/operating-unikernel-challenges/
>>>>
>>>> This is mainly a list of problems. I'd love to hear from anyone who
>>>> has done any hard thinking on any of them or cut any tools in this
>>>> space.
>>>
>>> Hi Gareth,
>>>
>>> A few thoughts:
>>>
>>
>> Thanks for replying.
>>
>>>
>>> "How do I compose several unikernels together to build an application?"
>>>
>>> I think you answer this later, in the Orchestration section: the same
>>> way we do with other VMs/containers - using Docker Compose, Ubuntu
>>> Juju, etc. I haven't built anything big enough to need this yet
>>> though.
>>>
>>
>> That's my view as well (the Cloud Foundry or Kubernetes model would appear
>> to work?) but I've not seen anyone doing this yet. Which probably
>> means gaps exist when you actually try :) If anyone takes a run at
>> this I'd certainly be interested, I'm guessing Lattice
>> [http://lattice.cf/] might be a nice place to start?
>
> Me too. Since Mirage services can also be compiled as Unix binaries, it
> should be possible to test deployment configuration using existing
> systems right now. Then, it's "just" a matter of teaching the
> deployment system to deploy unikernel VMs directly, rather than
> deploying Linux VMs containing the service.
>
>>> What does a Continuous integration or deployment pipeline look like?
>>>
>>> Amir gives an example in "Towards Heroku for Unikernels: Part 1 -
>>> Automated deployment":
>>>
>>> http://amirchaudhry.com/heroku-for-unikernels-pt1/
>>
>> While it's an example of what's possible, I don't think this is the highly
>> opinionated, high-level interface that would be required to make it
>> easy to get started. Git hooks, Makefiles and shell scripts are great
>> for prototypes but don't tend to make for a great experience, in my
>> view. The skeleton is great, but it only covers running unit tests, and
>> only on Travis. Test Kitchen [http://kitchen.ci/] is maybe a nice
>> model to look at. As a thought experiment: "what would Test Kitchen
>> for Mirage look like?"
>
> Amir, any thoughts on this? I don't see any reason why the deployment
> scripts can't be made generic and packaged up.
>
>>> "By removing the operating system we remove things like host firewalls ..."
>>>
>>> I see two main uses for firewalls. One is to avoid accidentally
>>> exposing a host-only service (e.g. a database used by a web app in the
>>> same VM) and the other is to provide basic access control between VMs
>>> (only the web VM can access the DB VM).
>>>
>>> For the first, two services in the same Mirage unikernel will
>>> communicate directly using OCaml datatypes. When everything is a
>>> library, using a network for internal communication would be crazy.
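To make the in-process point concrete, here is a minimal OCaml sketch, assuming a hypothetical `Db` library linked into the same unikernel as a web handler. The handler calls a typed function directly, so there is no wire format to parse and no port to expose. The module, record and function names are illustrative only, not part of any Mirage API.

```ocaml
(* Hypothetical in-process "database" library: a typed key/value store. *)
module Db : sig
  type user = { id : int; name : string }
  val add : user -> unit
  val find : int -> user option
end = struct
  type user = { id : int; name : string }
  let table : (int, user) Hashtbl.t = Hashtbl.create 16
  let add u = Hashtbl.replace table u.id u
  let find id = Hashtbl.find_opt table id
end

(* The web handler calls the store directly with OCaml values:
   no socket, no serialisation, nothing listening on the network. *)
let handle_get_user id =
  match Db.find id with
  | Some u -> Printf.sprintf "200 %s" u.Db.name
  | None -> "404"

let () = Db.add { Db.id = 1; name = "alice" }
```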
>>
>> At any degree of scale, though, you're going to be running many
>> unikernels across many hosts, so some degree of network communication
>> is going to be required (even if you minimise it with locality). Also,
>> in most environments some of that integration is going to be with
>> non-Mirage/OCaml-based systems and/or systems not running on the same
>> hosts/datacenters.
>>
>>> Also, while Linux allows any process to listen on the network, Mirage
>>> uses dependency injection so that only components that need network
>>> access will be given it.
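The dependency-injection idea can be sketched with an OCaml functor: a component that needs the network takes a network module as a parameter, and a component that is never handed one has no network handle to use. The `NETWORK` signature below is a hypothetical, deliberately tiny stand-in; Mirage's real device signatures are richer and differ in detail.

```ocaml
(* Hypothetical minimal network signature; not Mirage's actual interface. *)
module type NETWORK = sig
  val send : dst:string -> string -> unit
end

(* A component that needs the network receives it as a functor argument. *)
module Notifier (Net : NETWORK) = struct
  let notify user = Net.send ~dst:"mail.example" ("hello " ^ user)
end

(* A component that is never handed a NETWORK implementation
   has no network handle to send anything with. *)
module Formatter = struct
  let greeting user = "hello " ^ user
end

(* A recording stub stands in for a real stack, e.g. in a Unix test build. *)
module Stub_net = struct
  let sent : (string * string) list ref = ref []
  let send ~dst msg = sent := (dst, msg) :: !sent
end

module N = Notifier (Stub_net)

let () = N.notify "alice"
```

Swapping `Stub_net` for a real network module changes nothing in `Notifier`, which is also what makes the Unix-binary testing route mentioned earlier in the thread practical.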
>>>
>>
>> Yup, which is great. My thoughts were mainly about the second issue...
>>
>>> For the second, whatever is composing the services should configure
>>> the network, in my opinion. In other words, if I say I want my web
>>> server VM connected to a database VM, then nothing else should have
>>> access to the DB VM.
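One way to read "whatever is composing the services should configure the network" is default deny: the links the operator declares when composing the system become the only permitted flows. The sketch below assumes a hypothetical `link` record and a `composition` list; the VM names and port are illustrative.

```ocaml
(* Hypothetical: each declared link is one permitted flow. *)
type link = { src : string; dst : string; port : int }

(* What the operator asked for: the web VM talks to the DB VM on one port. *)
let composition = [ { src = "web"; dst = "db"; port = 5432 } ]

(* Default deny: a connection is allowed only if it matches a declared link. *)
let allowed ~src ~dst ~port =
  List.exists
    (fun l -> l.src = src && l.dst = dst && l.port = port)
    composition
```

A composer (or a firewall unikernel it configures) would consult `allowed` for every new connection, so adding a VM never implicitly grants it access to the database.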
>>>
>>> I would certainly like to see a higher-level API for networking, that
>>> doesn't allow unexpected connections. e.g. we currently offer services
>>> a low-level network API like:
>>>
>>>   val connect : network -> ipaddr -> port -> flow
>>>   val listen : network -> port -> callback -> unit
>>>
>>> With this API, a library with network access can connect anywhere in
>>> the world by supplying any IP address and port number, and must handle
>>> its own encryption. A higher-level capability-style API could offer
>>> something more abstract, e.g.
>>>
>>>   module type SturdyRef = sig
>>>     type t
>>>     val connect : t -> flow
>>>   end
>>>
>>> Here, our web server would simply get a SturdyRef.t for the database,
>>> and all it could do would be to connect to it.
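A minimal sketch of the contrast between the two APIs above, assuming simplified stand-in types (`ipaddr` as a string, `flow` as a record) rather than Mirage's real ones. `raw_connect` plays the role of the low-level `connect`; `Endpoint_ref` is a hypothetical `SturdyRef` implementation fixed to one endpoint chosen by whoever composes the system, so the web server functor can reach the database and nothing else.

```ocaml
(* Hypothetical stand-ins for the low-level types; not Mirage's actual API. *)
type ipaddr = string
type port = int
type flow = { peer : ipaddr * port }

(* Low-level style: whoever holds this can reach any address. *)
let raw_connect (ip : ipaddr) (p : port) : flow = { peer = (ip, p) }

module type SturdyRef = sig
  type t
  val connect : t -> flow
end

(* A capability fixed to one endpoint; the representation is hidden. *)
module Endpoint_ref : sig
  include SturdyRef
  val make : ipaddr -> port -> t
end = struct
  type t = ipaddr * port
  let make ip p = (ip, p)
  let connect (ip, p) = raw_connect ip p
end

(* The web server only sees Db.t: it can connect to the database, nothing more. *)
module Web_server (Db : SturdyRef) = struct
  let handle_request db = Db.connect db
end

module Server = Web_server (Endpoint_ref)

(* Only the composer, which can call Endpoint_ref.make, picks the address. *)
let db = Endpoint_ref.make "10.0.0.2" 5432
let flow = Server.handle_request db
```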
>>>
>>
>> Agreed. I just want something like this to exist :)
>>
>> I also think unikernels could make for really nice network devices
>> (firewalls, security controls, proxies, etc.)
>
> Yes. Here's a simple unikernel for a NAT device, for example:
>
>   https://github.com/yomimono/simple-nat
>
>> In my experience, lots of people find the network is the limiting factor
>> when they start down a microservices rabbit hole. How unikernels would
>> work with some of the newer players in this space, like Weave
>> [http://weave.works/] or Calico [http://www.projectcalico.org/], might
>> be interesting to consider.
>>
>>>
>>> What does debugging a system based on unikernels look like?
>>>
>>> There's an example here: https://mirage.io/wiki/profiling
>>>
>>> "As a motivating example, we'll track down a (real, but now fixed) bug
>>> in MirageOS's TCP stack."
>>>
>>
>> From an operator's point of view that's not really the same thing. The
>> issues I see:
>>
>> * enabling it requires recompilation and redeployment (although you
>> could probably put this behind some sort of feature flag?)
>
> It can be enabled and disabled at run-time, but there's still a
> performance cost to having this kind of very detailed tracing
> available. I'd certainly like to see more support for general logging
> and metrics (the kind of thing you keep on all the time).
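The trade-off Thomas describes, tracing that can be toggled at run time but still costs something when off, can be illustrated with a hypothetical trace facility guarded by a flag. This is a sketch of the general idea only, not how Mirage's profiling/tracing is actually implemented.

```ocaml
(* Hypothetical trace facility that can be switched on and off at run time. *)
let tracing = ref false
let trace_buffer : string list ref = ref []

(* The message is only formatted when tracing is on. The flag check, and the
   closure allocated at each call site, remain even when it is off: that is
   the residual cost of keeping detailed tracing available. *)
let trace (msg : unit -> string) =
  if !tracing then trace_buffer := msg () :: !trace_buffer

(* Hypothetical instrumented event in some network code. *)
let on_retransmit seq =
  trace (fun () -> Printf.sprintf "retransmit seq=%d" seq)

let () =
  on_retransmit 1;   (* not recorded: tracing is off *)
  tracing := true;
  on_retransmit 2    (* recorded *)
```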
>

I'd love to see a port of Dropwizard's Java Metrics library for this
purpose. It's been ported to other languages before and would be a great
starting point.

https://github.com/dropwizard/metrics
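As a rough idea of what an always-on metrics layer in that spirit might look like in OCaml, here is a tiny hypothetical registry of named counters and duration samples. It is not a port of Dropwizard Metrics: real Dropwizard timers also track rates and histogram percentiles, which this sketch reduces to a mean. Durations are passed in explicitly to keep the example free of clock dependencies.

```ocaml
(* Hypothetical registry: named counters and timers kept on all the time. *)
let counters : (string, int ref) Hashtbl.t = Hashtbl.create 16
let timings : (string, float list ref) Hashtbl.t = Hashtbl.create 16

(* Look up a counter, creating it at zero on first use. *)
let counter name =
  match Hashtbl.find_opt counters name with
  | Some c -> c
  | None ->
    let c = ref 0 in
    Hashtbl.add counters name c;
    c

let incr_counter name = incr (counter name)

(* Record one duration sample (in seconds) under [name]. *)
let record_time name seconds =
  match Hashtbl.find_opt timings name with
  | Some l -> l := seconds :: !l
  | None -> Hashtbl.add timings name (ref [ seconds ])

(* Mean of the recorded samples, or None if there are none. *)
let mean_time name =
  match Hashtbl.find_opt timings name with
  | Some { contents = _ :: _ as l } ->
    Some (List.fold_left ( +. ) 0.0 l /. float_of_int (List.length l))
  | _ -> None

let () =
  incr_counter "http.requests";
  incr_counter "http.requests";
  record_time "db.query" 0.5;
  record_time "db.query" 1.5
```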

> On the other hand, I think you should be prepared to recompile and
> redeploy your unikernels when needed, and that shouldn't be a big
> deal. Trying to modify and redeploy a Linux kernel to get extra debug
> is a nightmare, but with a unikernel it can be very easy.
>
>> * it's not interactive
>
> You can refresh the view while it's running, so if you have something
> you can tweak dynamically, you can see what effect it's having.
>
>
>> I think the first is interesting, as the unikernel you're running
>> might be provided by a third-party vendor and you might not have the
>> source code or the right to modify and recompile it. Or changes might
>> require a lengthy change approval process.
>
> Yes, for binary-only releases you have to compile any needed debug
> code into it at all times (or provide a separate debug build).
>

I'd definitely spend a bit of time thinking about how a feature flag
system would be implemented with Mirage-based systems, or provide
suitable arguments against it or alternatives. I'd say it's a core part of
modern operations at this point.
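A minimal sketch of what a feature flag table inside a unikernel could look like, assuming (hypothetically) that flags arrive as a `name=on,other=off` string, for example via boot arguments or a control channel. How a real Mirage deployment would deliver and update flags is exactly the open question raised here; the function names are illustrative.

```ocaml
(* Hypothetical feature-flag table; the input format is an assumption. *)
let flags : (string, bool) Hashtbl.t = Hashtbl.create 16

(* Parse "name=on,other=off"; malformed entries are ignored rather than
   preventing the unikernel from booting. *)
let parse_flags s =
  String.split_on_char ',' s
  |> List.iter (fun kv ->
       match String.split_on_char '=' (String.trim kv) with
       | [ name; "on" ] -> Hashtbl.replace flags name true
       | [ name; "off" ] -> Hashtbl.replace flags name false
       | _ -> ())

(* Unknown flags default to off, so new code paths ship disabled. *)
let enabled name = Option.value (Hashtbl.find_opt flags name) ~default:false

(* Flags can also be flipped at run time, without a rebuild. *)
let set_flag name v = Hashtbl.replace flags name v

let () = parse_flags "verbose_tcp_trace=on, new_cache=off"
```

Code guarded by `if enabled "verbose_tcp_trace" then ...` can then be shipped in a vendor's binary and switched on by the operator, which addresses the binary-only case above.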

>> The second might be a matter of debugging at the hypervisor/Xen layer,
>> but I have limited experience there. That also raises isolation issues:
>> I probably want to limit access to the hypervisor more than to an
>> individual application instance.
>>
>> I'm obviously mainly in critique mode with the post and points above.
>> My main interest is in getting people thinking about operational
>> problems early; in my view it's a pretty interesting set of issues for
>> which good solutions undoubtedly exist.
>
> We need more experience reports here. In my case, all problems have been one of:
>
> - Why did this take so long? (the disk driver didn't support large
> requests and had to split them; the TCP stack set the retransmission
> timeout too long)
>
> - Why did this fail? (some exception details got ignored and replaced
> by a generic error; I want to see the original)
>
> - Why didn't this ever finish? (the ARP reply arrived before we
> started waiting for it)
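The third case is a classic lost-wakeup race. The sketch below, with hypothetical module names and no Lwt, shows the shape of the bug: if replies are only delivered to waiters that already exist, a reply that arrives first is dropped and the caller waits forever. Recording every reply in a cache that a late waiter checks first is one common fix; this is an illustration of the pattern, not a description of how Mirage's ARP code was actually changed.

```ocaml
type mac = string

(* Buggy: a reply is delivered only if someone is already waiting for it. *)
module Buggy = struct
  let waiters : (string, mac -> unit) Hashtbl.t = Hashtbl.create 8
  let on_reply ip mac =
    match Hashtbl.find_opt waiters ip with
    | Some k -> Hashtbl.remove waiters ip; k mac
    | None -> ()  (* nobody waiting yet: the reply is silently dropped *)
  let wait ip k = Hashtbl.replace waiters ip k
end

(* Fixed: every reply is cached, so a waiter that arrives late still sees it. *)
module Fixed = struct
  let cache : (string, mac) Hashtbl.t = Hashtbl.create 8
  let waiters : (string, mac -> unit) Hashtbl.t = Hashtbl.create 8
  let on_reply ip mac =
    Hashtbl.replace cache ip mac;
    match Hashtbl.find_opt waiters ip with
    | Some k -> Hashtbl.remove waiters ip; k mac
    | None -> ()
  let wait ip k =
    match Hashtbl.find_opt cache ip with
    | Some mac -> k mac
    | None -> Hashtbl.replace waiters ip k
end

(* In both cases the reply arrives before the caller starts waiting. *)
let buggy_result : mac option ref = ref None
let fixed_result : mac option ref = ref None

let () =
  Buggy.on_reply "10.0.0.1" "aa:bb:cc:dd:ee:ff";
  Buggy.wait "10.0.0.1" (fun m -> buggy_result := Some m);
  Fixed.on_reply "10.0.0.1" "aa:bb:cc:dd:ee:ff";
  Fixed.wait "10.0.0.1" (fun m -> fixed_result := Some m)
```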
>
> These questions can all be answered with the existing tracing. What
> kind of interactive debugging would be helpful for you?
>
> When anyone has a hard-to-diagnose problem, I'm interested to see how
> Mirage's tracing or error reporting could be improved to make the
> problem obvious.
>
>
> --
> Dr Thomas Leonard        http://roscidus.com/blog/
> GPG: DA98 25AE CAD0 8975 7CDA  BD8E 0713 3F96 CA74 D8BA



-- 
Gareth Rushgrove
@garethr

devopsweekly.com
morethanseven.net
garethrushgrove.com

_______________________________________________
MirageOS-devel mailing list
MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel