
Re: Thoughts on cloud control APIs for Mirage



On 17 Oct 2011, at 13:41, Steven Hand wrote:

> Funnily enough, the *original* model for XenStore was distributed, and was 
> inspired by Plan 9 name spaces. 
> 
> Nowadays I'd recommend looking into Zookeeper for a more interesting kind of 
> coordination space...

Reminds me a lot of Mach namespaces, needless to say! Some of my favourite 
NeXTStep/Mac OS X vulnerabilities come from the composition of the UNIX security 
model (grounded in the reliability of global namespaces) with Mach-based 
services such as NetInfo (now DirectoryServices). On the UNIX side, su(1) 
depends on safe access to the password file, but on the Mach side, access to 
the password file goes through a service reached via per-process Mach 
namespaces, which can be interposed on.

Robert

> 
> Cheers,
> 
> S.
> 
> 
> -----Original Message-----
> From: cl-mirage-bounces@xxxxxxxxxxxxxxx 
> [mailto:cl-mirage-bounces@xxxxxxxxxxxxxxx] On Behalf Of Anil Madhavapeddy
> Sent: Monday, October 17, 2011 1:04 PM
> To: Thomas Gazagnaire
> Cc: cl-mirage@xxxxxxxxxxxxxxx
> Subject: Re: Thoughts on cloud control APIs for Mirage
> 
> Right... Xenstore is 'almost there' but not quite.  For example, it has 
> transactions and a globally shared namespace, whereas the Plan 9 model is to 
> give processes their own namespace and mount other services into that.
> 
> So if you have an HTTP server domain, it might export a 
> /http/server/recoil.org directory, and clients wanting to read a URL can 
> import that filesystem somewhere into their system and read files under it.  
> An HTTP proxy could then serialise that file into actual HTTP and write it to 
> /net/tcp/555/data to respond to an external request.  This could all happen 
> within the same kernel, or across multiple domains.
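> 
> A rough sketch of what that could look like from OCaml, just to make the
> shape concrete (the NAMESPACE signature, the paths and the file name are
> invented for illustration, and Lwt is assumed for the blocking calls;
> nothing like this exists in Mirage yet):
> 
>   (* Hypothetical per-process namespace: each component has its own mount
>      table and only sees the services imported into it. *)
>   module type NAMESPACE = sig
>     type t
>     type service
>     val mount : t -> service -> path:string -> unit Lwt.t
>     val read  : t -> path:string -> string Lwt.t
>     val write : t -> path:string -> string -> unit Lwt.t
>   end
> 
>   (* The proxy in the example above, functorised over whatever namespace
>      implementation it is given (in-kernel, same host, or remote). *)
>   module Proxy (NS : NAMESPACE) = struct
>     open Lwt.Infix
>     let respond ns ~serialise_http =
>       NS.read ns ~path:"/http/server/recoil.org/index.html" >>= fun body ->
>       NS.write ns ~path:"/net/tcp/555/data" (serialise_http body)
>   end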
> 
> Xenstore will always require a globally privileged Xenstored to manage the 
> namespace, whereas the Plan 9 model is far better suited to multiple 
> intercommunicating processes (or stub domains).  I'm just thinking through 
> the implications on consistency models across a cluster of physical hosts at 
> the moment though...
> 
> Anil
> 
> On 17 Oct 2011, at 07:57, Thomas Gazagnaire wrote:
> 
>> Basically you say we need Xenstore :-)
>> 
>> putting the plan9 paper on my to-read list.
>> 
>> --
>> Thomas
>> 
>> On Oct 17, 2011, at 1:45 PM, Anil Madhavapeddy wrote:
>> 
>>> Mirage now has a number of protocols implemented as libraries, as
>>> well as device drivers. What's missing is an effective control stack to
>>> glue all this together into a proper OS.  So far, we are just wiring
>>> together applications manually from the libraries, which is fine for
>>> development but not for any real deployment.
>>> 
>>> I've been re-reading the Plan 9 papers [1] for inspiration, and many of
>>> the ideas there are highly applicable to us. To realise the Mirage goal of
>>> synthesising microkernels that are 'minimal for purpose', we need to:
>>> 
>>> - have multiple intercommunicating components, separated by process
>>> boundaries (on UNIX) or VM isolation (on Xen), or simply a function
>>> call compiled as part of the same kernel.
>>> 
>>> - minimise information flow between components, so they can be
>>> dynamically split up ('self scaling') or combined more easily.
>>> 
>>> - deal with the full lifecycle of all these VMs and processes, and not 
>>> just spawning them.
>>> 
>>> Plan 9 was built on very similar principles: instead of a big monolithic
>>> kernel, the system is built on many processes that communicate via a
>>> well-defined wire protocol (9P), and per-process namespaces and filesystem
>>> abstractions for almost every service.  For example, instead of 'ifconfig',
>>> the network is simply exposed as a /net filesystem and configured through
>>> filesystem calls rather than an alternative command line.  Crucially, the
>>> 9P protocol can be called remotely over a transport, or invoked directly
>>> via a simple function call for in-kernel operations.
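>>> 
>>> To make the "remote or direct" point concrete, a rough sketch (the FS
>>> signature is invented, Lwt is assumed, and the ctl strings are only
>>> loosely modelled on Plan 9's /net/ipifc files):
>>> 
>>>   (* One interface, two transports: a remote implementation marshals
>>>      these calls into 9P messages over the wire, an in-kernel one
>>>      resolves them as plain function calls on an in-memory tree.
>>>      Clients can't tell the difference. *)
>>>   module type FS = sig
>>>     type fd
>>>     val walk  : string -> fd Lwt.t          (* resolve a path to a handle *)
>>>     val read  : fd -> int -> string Lwt.t   (* read up to n bytes *)
>>>     val write : fd -> string -> unit Lwt.t
>>>   end
>>> 
>>>   (* "ifconfig" then degenerates into a couple of writes to a ctl file. *)
>>>   module Ifconfig (F : FS) = struct
>>>     open Lwt.Infix
>>>     let configure ~ip ~mask =
>>>       F.walk "/net/ipifc/0/ctl" >>= fun ctl ->
>>>       F.write ctl "bind ether /net/ether0\n" >>= fun () ->
>>>       F.write ctl (Printf.sprintf "add %s %s\n" ip mask)
>>>   end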
>>> 
>>> In contrast, modern cloud stacks are just terribly designed: they consist
>>> of a huge amount of static specification of VM and network state, with
>>> little attention paid to simple UNIX/Plan9 principles that can be used to
>>> build the more complicated abstractions.
>>> 
>>> So, this leaves us with an interesting opportunity: to implement the
>>> Mirage control interface using similar principles:
>>> 
>>> - a per-deployment global hierarchical tree (i.e. a filesystem), with ways
>>> to synchronise on entries (i.e. blocking I/O, or a select/poll
>>> equivalent).  Our consistency model may vary somewhat, as we could be
>>> strongly consistent between VMs running on the same physical host, and
>>> more loosely consistent cluster-wide (a rough signature for this tree is
>>> sketched after this list).
>>> 
>>> - every library exposes a set of keys and values, as well as a mechanism
>>> for session setup, authentication and teardown (the lifecycle of the
>>> process). Plan 9 used ASCII for everything, whereas Mirage would layer
>>> a well-typed API on top of it (e.g. just write a record to a file rather
>>> than manually serialising it).
>>> 
>>> - extend the Xen Cloud Platform to support delegation, so that microVMs
>>> can be monitored or killed by supervisors. Unlike Plan 9, this also
>>> includes operations across physical hosts (e.g. live relocation), or
>>> across cloud providers.
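>>> 
>>> A very rough signature for the first two points above (every name here is
>>> invented just to pin down the shape; Lwt stands in for blocking I/O):
>>> 
>>>   (* Per-deployment control tree: a hierarchical key space with a way to
>>>      block on an entry changing, i.e. the select/poll equivalent. *)
>>>   module type CONTROL = sig
>>>     type t
>>>     val read  : t -> path:string -> string option Lwt.t
>>>     val write : t -> path:string -> string -> unit Lwt.t
>>>     (* Block until the entry changes and return its new value. *)
>>>     val wait  : t -> path:string -> string option Lwt.t
>>>   end
>>> 
>>>   (* Typed layer: a library declares the record it keeps under its key,
>>>      and never sees the serialised form. *)
>>>   module Typed (C : CONTROL) (V : sig
>>>     type t
>>>     val to_string : t -> string
>>>     val of_string : string -> t option
>>>   end) = struct
>>>     open Lwt.Infix
>>>     let write c ~path v = C.write c ~path (V.to_string v)
>>>     let read  c ~path   = C.read c ~path >|= function
>>>       | None   -> None
>>>       | Some s -> V.of_string s
>>>   end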
>>> 
>>> There are some nice implications of this work that go beyond Mirage:
>>> 
>>> - it generally applies to all of the exokernel libraries out there,
>>> including HalVM (Haskell) or GuestVM (Java), as they all have this
>>> control problem that makes manipulating raw kernels such a pain.
>>> 
>>> - it can easily be extended to support existing applications on a
>>> monolithic guest kernel, and make it easier to manage them too.
>>> 
>>> - application synthesis becomes much more viable: this approach could let
>>> me build an HTTP microkernel without a TCP stack, and simply receive a
>>> typed RPC from an HTTP proxy (which has done all the work of parsing the
>>> TCP and HTTP bits, so why repeat it?).  If my HTTP server microkernel
>>> later live migrates away, then it could swap back to a network connection
>>> (a rough sketch of this handoff follows this list).
>>> 
>>> Modern cloudy applications (especially Hadoop or CIEL) use HTTP very
>>> heavily to talk between components, so optimising this part of the stack
>>> is worthwhile (numbers needed!).
>>> 
>>> - Even if components are compiled up in the same binary and use function
>>> calls, they still have to establish and authenticate connections to each
>>> other.  This makes monitoring and scaling hugely easier, since the 
>>> control filesystem operations provide a natural logging and introspection
>>> point, even for large clusters.  If we had a hardware-capability-aware
>>> CPU in the future, it could use this information too :-)
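>>> 
>>> A rough sketch of the typed handoff mentioned above (types and names are
>>> invented; Lwt is assumed):
>>> 
>>>   (* The server logic only ever sees parsed, typed requests.  The
>>>      transport behind [conn] could be a plain function call or a
>>>      shared-memory ring while co-located, or a real network flow after
>>>      migration. *)
>>>   type request  = { meth : string; uri : string; headers : (string * string) list }
>>>   type response = { status : int; body : string }
>>> 
>>>   module type HTTP_ENDPOINT = sig
>>>     type conn
>>>     val accept : unit -> conn Lwt.t
>>>     val recv   : conn -> request Lwt.t
>>>     val send   : conn -> response -> unit Lwt.t
>>>   end
>>> 
>>>   module Server (E : HTTP_ENDPOINT) = struct
>>>     open Lwt.Infix
>>>     (* Serve typed requests one at a time with a user-supplied handler. *)
>>>     let rec serve handler =
>>>       E.accept ()   >>= fun c ->
>>>       E.recv c      >>= fun req ->
>>>       handler req   >>= fun resp ->
>>>       E.send c resp >>= fun () ->
>>>       serve handler
>>>   end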
>>> 
>>> I highly recommend that anyone interested in this area read the Plan 9
>>> paper, as it's a really good read anyway [1]. The Scout OS and x-kernel
>>> papers are also worth a look.  Our main difference from this work is the
>>> heavy emphasis on type-safe components, as well as realistic deployment
>>> due to the use of Xen cloud providers as a stable hardware interface.
>>> 
>>> In the very short-term, Mort and I have an OpenFlow tutorial coming up in
>>> mid-November, so I'll lash up the network stack to have a manual version
>>> of this as soon as possible, so that you can configure all the tap
>>> interfaces and such much more quickly.  Meanwhile, all and any thoughts
>>> are most welcome!
>>> 
>>> [1] Plan 9 papers: http://cm.bell-labs.com/sys/doc/
>>> 
>>> -- 
>>> Anil Madhavapeddy                                 http://anil.recoil.org
>>> 
>> 
>> 
> 
> 
> 




 

