
Re: Thoughts on cloud control APIs for Mirage



Basically you say we need Xenstore :-)

Putting the Plan 9 paper on my to-read list.

--
Thomas

On Oct 17, 2011, at 1:45 PM, Anil Madhavapeddy wrote:

> Mirage now has a number of protocols implemented as libraries, as
> well as device drivers. What's missing is an effective control stack to
> glue all this together into a proper OS.  So far, we are just wiring
> together applications manually from the libraries, which is fine for
> development but not for any real deployment.
> 
> I've been re-reading the Plan 9 papers [1] for inspiration, and many of
> the ideas there are highly applicable to us. To realise the Mirage goal of
> synthesising microkernels that are 'minimal for purpose', we need to:
> 
> - have multiple intercommunicating components, separated by process
>  boundaries (on UNIX), by VM isolation (on Xen), or simply by a function
>  call when compiled as part of the same kernel (see the sketch after
>  this list).
> 
> - minimise information flow between components, so they can be
>  dynamically split up ('self scaling') or combined more easily.
> 
> - deal with the full lifecycle of all these VMs and processes, and not 
>  just spawning them.
> 
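> As a strawman, here is roughly the shape that first point takes in OCaml
> (the FLOW signature and the Echo component below are made up for
> illustration, not existing Mirage code):
> 
>   (* A component only sees an abstract communication endpoint, so the
>      same code can be linked against an in-kernel function-call backend,
>      a UNIX socket, or a Xen inter-VM ring without modification. *)
>   let (>>=) = Lwt.bind
> 
>   module type FLOW = sig
>     type t
>     val read  : t -> string Lwt.t
>     val write : t -> string -> unit Lwt.t
>     val close : t -> unit Lwt.t
>   end
> 
>   module Echo (F : FLOW) = struct
>     (* Echo every message back, oblivious to the transport underneath. *)
>     let rec run flow =
>       F.read flow >>= fun msg ->
>       F.write flow msg >>= fun () ->
>       run flow
>   end
> 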
> Plan 9 was built on very similar principles: instead of a big monolithic
> kernel, the system is built on many processes that communicate via a
> well-defined wire protocol (9P), and per-process namespaces and filesystem
> abstractions for almost every service.  For example, instead of 'ifconfig',
> the network is simply exposed as a /net filesystem and configured through
> ordinary file operations rather than a separate command-line tool.
> Crucially, 9P can be spoken remotely over the network or invoked directly
> via a simple function call (for in-kernel operations).
> 
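> To make that concrete: on Plan 9 even dialling a TCP connection is just a
> sequence of file operations on /net.  A rough sketch, written in
> OCaml/Unix syntax purely for concreteness (treat the details loosely):
> 
>   (* Open the clone file to allocate a new conversation, read back its
>      number, write a 'connect' control message, then open the data file. *)
>   let dial host port =
>     let ctl = Unix.openfile "/net/tcp/clone" [Unix.O_RDWR] 0 in
>     let buf = Bytes.create 32 in
>     let n = Unix.read ctl buf 0 32 in
>     let conv = String.trim (Bytes.sub_string buf 0 n) in  (* e.g. "4" *)
>     let msg = Printf.sprintf "connect %s!%d" host port in
>     ignore (Unix.write_substring ctl msg 0 (String.length msg));
>     Unix.openfile (Printf.sprintf "/net/tcp/%s/data" conv) [Unix.O_RDWR] 0
> 
> No sockets API, no ioctls: every step is visible to, and scriptable by,
> anything that can read and write files.
> 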
> In contrast, modern cloud stacks are just terribly designed: they consist
> of a huge amount of static specification of VM and network state, with
> little attention paid to the simple UNIX/Plan 9 principles that can be used
> to build up more complicated abstractions.
> 
> So, this leaves us with an interesting opportunity: to implement the
> Mirage control interface using similar principles:
> 
> - a per-deployment global hierarchical tree (i.e. a filesystem), with ways
>  to synchronise on entries (i.e. blocking I/O, or a select/poll
>  equivalent).  Our consistency model may vary somewhat, as we could be
>  strongly consistent between VMs running on the same physical host, and
>  more loosely consistent cluster-wide.  (A sketch of such an interface
>  follows after this list.)
> 
> - every library exposes a set of keys and values, as well as a mechanism
>  for session setup, authentication and teardown (the lifecycle of the
>  process).  Plan 9 used ASCII for everything, whereas Mirage would layer
>  a well-typed API on top of it (e.g. just write a record to a file rather
>  than manually serialising it).
> 
> - extend the Xen Cloud Platform to support delegation, so that microVMs
>  can be monitored or killed by supervisors. Unlike Plan 9, this also
>  includes operations across physical hosts (e.g. live relocation), or
>  across cloud providers.
> 
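> A minimal sketch of what the control-tree interface might look like from
> OCaml -- the module and operation names here (CONTROL, read, write,
> watch) are hypothetical, not an existing API:
> 
>   (* Hierarchical key/value tree with blocking watches, Xenstore-style.
>      Typed views (e.g. writing a record rather than a raw string) would
>      be layered on top of this by each library. *)
>   module type CONTROL = sig
>     type t
>     val connect : unit -> t Lwt.t                      (* session setup *)
>     val read    : t -> string -> string option Lwt.t  (* path -> value  *)
>     val write   : t -> string -> string -> unit Lwt.t
>     val watch   : t -> string -> (string -> unit Lwt.t) -> unit Lwt.t
>       (* fires on changes to a subtree: the select/poll equivalent *)
>     val close   : t -> unit Lwt.t                      (* teardown      *)
>   end
> 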
> There are some nice implications of this work that go beyond Mirage:
> 
> - it generally applies to all of the exokernel libraries out there,
>  including HalVM (Haskell) or GuestVM (Java), as they all have this
>  control problem that makes manipulating raw kernels such a pain.
> 
> - it can easily be extended to support existing applications on a
>  monolithic guest kernel, and make it easier to manage them too.
> 
> - application synthesis becomes much more viable: this approach could let
>  me build an HTTP microkernel without a TCP stack, and simply receive a
>  typed RPC from an HTTP proxy (which has done all the work of parsing the
>  TCP and HTTP bits, so why repeat it?).  If my HTTP server microkernel
>  later live migrates away, it could swap back to a network connection
>  (see the sketch after this list).
> 
>  Modern cloudy applications (especially Hadoop or CIEL) use HTTP very
>  heavily to talk between components, so optimising this part of the stack
>  is worthwhile (numbers needed!).
> 
> - Even if components are compiled into the same binary and use function
>  calls, they still have to establish and authenticate connections to each
>  other.  This makes monitoring and scaling hugely easier, since the
>  control filesystem operations provide a natural logging and introspection
>  point, even for large clusters.  If we had a hardware-capability-aware
>  CPU in the future, it could use this information too :-)
> 
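> Here is a sketch of how the HTTP case might look, again with made-up
> names: the microkernel's handler is written against a REQUEST_SOURCE
> signature, so it neither knows nor cares whether requests arrive as typed
> RPCs from a front-end proxy or from its own TCP+HTTP stack after a live
> migration:
> 
>   let (>>=) = Lwt.bind
> 
>   type request  = { meth : string; uri : string; body : string }
>   type response = { status : int; resp_body : string }
> 
>   module type REQUEST_SOURCE = sig
>     val accept : unit -> request Lwt.t     (* next fully parsed request *)
>     val reply  : response -> unit Lwt.t
>   end
> 
>   module Server (S : REQUEST_SOURCE) = struct
>     (* The same loop runs over an RPC channel or a real HTTP listener. *)
>     let rec serve handler =
>       S.accept () >>= fun req ->
>       handler req >>= fun resp ->
>       S.reply resp >>= fun () ->
>       serve handler
>   end
> 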
> I highly recommend that anyone interested in this area read the Plan 9
> paper, as it's a really good read anyway [1]. The Scout OS and x-kernel
> papers are also worth a look.  Our main difference from this work is the
> heavy emphasis on type-safe components, as well as realistic deployment
> due to the use of Xen cloud providers as a stable hardware interface.
> 
> In the very short term, Mort and I have an OpenFlow tutorial coming up in
> mid-November, so I'll lash up a manual version of this in the network
> stack as soon as possible, so that you can configure all the tap
> interfaces and such much more quickly.  Meanwhile, any and all thoughts
> are most welcome!
> 
> [1] Plan 9 papers: http://cm.bell-labs.com/sys/doc/
> 
> -- 
> Anil Madhavapeddy                                 http://anil.recoil.org
> 

