
Re: Thoughts on cloud control APIs for Mirage



I've got the 'Life and times of a Zookeeper' paper on my iPad for reading next, 
in fact! :-)

In Plan 9, the data for streams was also handled through files, alongside the 
coordination and control, whereas Zookeeper is very much a centralised name 
service that happens to run distributed.

I wonder if there may be a useful middle ground between the name service 
(distributed with a central controller and leadership) and the local filesystem 
views themselves (which seem more convenient for a library to manipulate, and 
could be hardwired easily if the system is not distributed).

-anil

On 17 Oct 2011, at 08:41, Steven Hand wrote:

> Funnily enough, the *original* model for XenStore was distributed, and was 
> inspired by Plan 9 name spaces. 
> 
> Nowadays I'd recommend looking into Zookeeper for a more interesting kind of 
> coordination space...
> 
> Cheers,
> 
> S.
> 
> 
> -----Original Message-----
> From: cl-mirage-bounces@xxxxxxxxxxxxxxx 
> [mailto:cl-mirage-bounces@xxxxxxxxxxxxxxx] On Behalf Of Anil Madhavapeddy
> Sent: Monday, October 17, 2011 1:04 PM
> To: Thomas Gazagnaire
> Cc: cl-mirage@xxxxxxxxxxxxxxx
> Subject: Re: Thoughts on cloud control APIs for Mirage
> 
> Right... Xenstore is 'almost there' but not quite.  For example, it has 
> transactions and a globally shared namespace, whereas the Plan 9 model is to 
> give processes their own namespace and mount other services into that.
> 
> So if you have an HTTP server domain, it might export a 
> /http/server/recoil.org directory, and clients wanting to read a URL can 
> import that filesystem somewhere into their system and read files under it.  
> An HTTP proxy could then serialise that file into actual HTTP and write it to 
> /net/tcp/555/data to respond to an external request.  This could all happen 
> within the same kernel, or across multiple domains.
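> 
> As a rough illustration of that flow (Namespace.read/write and the index
> file name are assumptions made for the sketch, not an existing Mirage API),
> the proxy side might look something like this in OCaml:
> 
>   (* Sketch only: copy an exported response file into the TCP data file.
>      NAMESPACE is a hypothetical interface over the imported filesystem. *)
>   module type NAMESPACE = sig
>     val read  : string -> string Lwt.t          (* whole file at a path *)
>     val write : string -> string -> unit Lwt.t
>   end
> 
>   module Proxy (N : NAMESPACE) = struct
>     open Lwt
> 
>     let respond () =
>       N.read "/http/server/recoil.org/index.html" >>= fun body ->
>       (* serialise into wire-format HTTP before writing to the TCP file *)
>       let resp =
>         Printf.sprintf "HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n%s"
>           (String.length body) body
>       in
>       N.write "/net/tcp/555/data" resp
>   end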
> 
> Xenstore will always require a globally privileged Xenstored to manage the 
> namespace, whereas the Plan 9 model is far better suited to multiple 
> intercommunicating processes (or stub domains).  I'm just thinking through 
> the implications on consistency models across a cluster of physical hosts at 
> the moment though...
> 
> Anil
> 
> On 17 Oct 2011, at 07:57, Thomas Gazagnaire wrote:
> 
>> Basically you're saying we need Xenstore :-)
>> 
>> Putting the Plan 9 paper on my to-read list.
>> 
>> --
>> Thomas
>> 
>> On Oct 17, 2011, at 1:45 PM, Anil Madhavapeddy wrote:
>> 
>>> Mirage now has a number of protocols implemented as libraries, as
>>> well as device drivers. What's missing is an effective control stack to
>>> glue all this together into a proper OS.  So far, we are just wiring
>>> together applications manually from the libraries, which is fine for
>>> development but not for any real deployment.
>>> 
>>> I've been re-reading the Plan 9 papers [1] for inspiration, and many of
>>> the ideas there are highly applicable to us. To realise the Mirage goal of
>>> synthesising microkernels that are 'minimal for purpose', we need to:
>>> 
>>> - have multiple intercommunicating components, separated by process
>>> boundaries (on UNIX) or VM isolation (on Xen), or combined via a simple
>>> function call when compiled as part of the same kernel.
>>> 
>>> - minimise information flow between components, so they can be
>>> dynamically split up ('self scaling') or combined more easily.
>>> 
>>> - deal with the full lifecycle of all these VMs and processes, and not 
>>> just spawning them.
>>> 
>>> Plan 9 was built on very similar principles: instead of a big monolithic
>>> kernel, the system is built on many processes that communicate via a
>>> well-defined wire protocol (9P), and per-process namespaces and filesystem
>>> abstractions for almost every service.  For example, instead of 'ifconfig',
>>> the network is simply exposed as a /net filesystem and configured through
>>> filesystem calls rather than a separate command-line tool.  Crucially, the
>>> 9P protocol can be called remotely over the wire, or via a simple function
>>> call for in-kernel operations.
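>>> 
>>> As a small, hypothetical illustration of the 'ifconfig' point: configuring
>>> an interface becomes a couple of writes to control files.  The /net/ipifc
>>> paths below follow the Plan 9 convention and won't exist on an ordinary
>>> UNIX host; this is a sketch, not working configuration code.
>>> 
>>>   (* Sketch only: network configuration as plain file writes. *)
>>>   let configure_interface () =
>>>     let ctl = open_out "/net/ipifc/0/ctl" in
>>>     output_string ctl "bind ether /net/ether0\n";      (* attach medium  *)
>>>     output_string ctl "add 10.0.0.2 255.255.255.0\n";  (* assign address *)
>>>     close_out ctl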
>>> 
>>> In contrast, modern cloud stacks are just terribly designed: they consist
>>> of a huge amount of static specification of VM and network state, with
>>> little attention paid to simple UNIX/Plan9 principles that can be used to
>>> build the more complicated abstractions.
>>> 
>>> So, this leaves us with an interesting opportunity: to implement the
>>> Mirage control interface using similar principles:
>>> 
>>> - a per-deployment global hierarchical tree (i.e. a filesystem), with ways
>>> to synchronise on entries (i.e. blocking I/O, or a select/poll
>>> equivalent); see the sketch after this list.  Our consistency model may
>>> vary somewhat, as we could be strongly consistent between VMs running on
>>> the same physical host, and more loosely consistent cluster-wide.
>>> 
>>> - every library exposes a set of keys and values, as well as a mechanism
>>> for session setup, authentication and teardown (the lifecycle of the
>>> process).  Plan 9 used ASCII for everything, whereas Mirage would layer
>>> a well-typed API on top of it (e.g. just write a record to a file rather
>>> than manually serialising it).
>>> 
>>> - extend the Xen Cloud Platform to support delegation, so that microVMs
>>> can be monitored or killed by supervisors. Unlike Plan9, this also
>>> includes operations across physical hosts (e.g. live relocation), or
>>> across cloud providers.
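>>> 
>>> To make the first two points a little more concrete, here is a rough
>>> OCaml sketch of what such a control-tree interface might look like; the
>>> module and function names are illustrative, not an existing API:
>>> 
>>>   (* Sketch only: a per-deployment control tree with blocking waits and
>>>      a typed layer on top.  All names are illustrative. *)
>>>   open Lwt
>>> 
>>>   module type CONTROL = sig
>>>     type t
>>>     val read  : t -> string -> string option Lwt.t
>>>     val write : t -> string -> string -> unit Lwt.t
>>>     (* block until the entry at a path changes: the select/poll analogue *)
>>>     val wait  : t -> string -> string Lwt.t
>>>   end
>>> 
>>>   (* Typed layer so libraries write records, not hand-rolled strings. *)
>>>   type 'a codec = { of_string : string -> 'a; to_string : 'a -> string }
>>> 
>>>   module Typed (C : CONTROL) = struct
>>>     let read t path codec =
>>>       C.read t path >|= function
>>>       | None   -> None
>>>       | Some s -> Some (codec.of_string s)
>>> 
>>>     let write t path codec v =
>>>       C.write t path (codec.to_string v)
>>>   end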
>>> 
>>> There are some nice implications of this work that go beyond Mirage:
>>> 
>>> - it generally applies to all of the exokernel libraries out there,
>>> including HalVM (Haskell) and GuestVM (Java), as they all have this
>>> control problem that makes manipulating raw kernels such a pain.
>>> 
>>> - it can easily be extended to support existing applications on a
>>> monolithic guest kernel, and make them easier to manage too.
>>> 
>>> - application synthesis becomes much more viable: this approach could let
>>> me build an HTTP microkernel without a TCP stack, and simply receive a
>>> typed RPC from an HTTP proxy (which has done all the work of parsing the
>>> TCP and HTTP bits, so why repeat it?); see the sketch after this list.
>>> If my HTTP server microkernel later live migrates away, then it could
>>> swap back to a network connection.
>>> 
>>> Modern cloudy applications (especially Hadoop or CIEL) use HTTP very
>>> heavily to talk between components, so optimising this part of the stack
>>> is worthwhile (numbers needed!).
>>> 
>>> - Even if components are compiled into the same binary and use function
>>> calls, they still have to establish and authenticate connections to each
>>> other.  This makes monitoring and scaling hugely easier, since the 
>>> control filesystem operations provide a natural logging and introspection
>>> point, even for large clusters.  If we had a hardware-capability-aware
>>> CPU in the future, it could use this information too :-)
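>>> 
>>> As the sketch referred to in the application-synthesis item above: the
>>> whole interface of an HTTP microkernel without a TCP stack could shrink
>>> to a single typed handler.  The types below are illustrative assumptions,
>>> not a proposed wire format:
>>> 
>>>   (* Sketch only: the typed RPC an HTTP proxy might hand to a
>>>      microkernel that has no TCP stack of its own. *)
>>>   type request = {
>>>     meth    : [ `GET | `POST ];
>>>     uri     : string;
>>>     headers : (string * string) list;
>>>     body    : string option;
>>>   }
>>> 
>>>   type response = {
>>>     status   : int;
>>>     rheaders : (string * string) list;
>>>     rbody    : string;
>>>   }
>>> 
>>>   (* The microkernel implements only this; TCP and HTTP parsing already
>>>      happened in the proxy.  The transport that delivers the record is
>>>      deliberately left abstract. *)
>>>   let handle (req : request) : response Lwt.t =
>>>     Lwt.return
>>>       { status   = 200;
>>>         rheaders = [ "Content-Type", "text/plain" ];
>>>         rbody    = "hello from " ^ req.uri }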
>>> 
>>> I highly recommend that anyone interested in this area read the Plan 9
>>> paper, as it's a really good read anyway [1].  The Scout OS and x-kernel
>>> work are also worth a look.  Our main difference from this work is the
>>> heavy emphasis on type-safe components, as well as realistic deployment
>>> due to the use of Xen cloud providers as a stable hardware interface.
>>> 
>>> In the very short term, Mort and I have an OpenFlow tutorial coming up in
>>> mid-November, so I'll lash up the network stack with a manual version of
>>> this as soon as possible, so that you can configure all the tap
>>> interfaces and such much more quickly.  Meanwhile, any and all thoughts
>>> are most welcome!
>>> 
>>> [1] Plan 9 papers: http://cm.bell-labs.com/sys/doc/
>>> 
>>> -- 
>>> Anil Madhavapeddy                                 http://anil.recoil.org
>>> 
>> 
>> 
> 
> 
> 
> 




 

