Thoughts on cloud control APIs for Mirage
Mirage now has a number of protocols implemented as libraries, as
well as device drivers. What's missing is an effective control stack to
glue all this together into a proper OS. So far, we are just wiring
together applications manually from the libraries, which is fine for
development but not for any real deployment.
I've been re-reading the Plan 9 papers [1] for inspiration, and many of
the ideas there are highly applicable to us. To realise the Mirage goal of
synthesising microkernels that are 'minimal for purpose', we need to:
- have multiple intercommunicating components, separated by process
boundaries (on UNIX), by VM isolation (on Xen), or by nothing more than a
function call when compiled into the same kernel.
- minimise information flow between components, so they can be
dynamically split up ('self scaling') or combined more easily.
- deal with the full lifecycle of all these VMs and processes, and not
just spawning them.
Plan 9 was built on very similar principles: instead of a big monolithic
kernel, the system is built on many processes that communicate via a
well-defined wire protocol (9P), and per-process namespaces and filesystem
abstractions for almost every service. For example, instead of 'ifconfig',
the network is simply exposed as a /net filesystem and configured through
filesystem calls rather than a separate command-line tool. Crucially, the
9P protocol can be spoken remotely over the network, or invoked locally via
a simple function call for in-kernel operations.
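To make the contrast concrete, here is a minimal OCaml sketch of the
filesystem-as-control-interface idea; the /net path and the command string
are illustrative rather than Plan 9's exact syntax.

  (* Sketch only: configure a network interface by writing a plain-text
     command into its control file under /net, instead of shelling out to
     a separate ifconfig binary. Path and command are illustrative. *)
  let configure_interface () =
    let ctl = open_out "/net/ipifc/0/ctl" in
    output_string ctl "add 10.0.0.2 255.255.255.0\n";
    close_out ctl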
In contrast, modern cloud stacks are just terribly designed: they consist
of a huge amount of static specification of VM and network state, with
little attention paid to simple UNIX/Plan9 principles that can be used to
build the more complicated abstractions.
So, this leaves us with an interesting opportunity: to implement the
Mirage control interface using similar principles:
- a per-deployment global hierarchical tree (i.e. a filesystem), with ways
to synchronise on entries (i.e. blocking I/O, or a select/poll
equivalent). Our consistency model may vary somewhat: we could be
strongly consistent between VMs running on the same physical host,
and more loosely consistent cluster-wide.
- every library exposes a set of keys and values, as well as a mechanism
for session setup, authentication and teardown (the lifecycle of the
process). Plan 9 used ASCII for everything, whereas Mirage would layer
a well-typed API on top of it (e.g. just write a record to a file rather
than manually serialising it); there is a rough sketch of this after the
list.
- extend the Xen Cloud Platform to support delegation, so that microVMs
can be monitored or killed by supervisors. Unlike Plan9, this also
includes operations across physical hosts (e.g. live relocation), or
across cloud providers.
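As a rough illustration of the second point above, a well-typed layer over
the control tree might look something like the OCaml signature below. Every
name in it (CONTROL, key, watch, vif_config) is hypothetical and not an
existing Mirage or XCP API; it is just a sketch of 'write a record, not an
ASCII string'.

  (* Hypothetical typed control-tree API: keys live in a per-deployment
     hierarchical namespace, values are OCaml records rather than
     hand-built ASCII strings, and readers can block on an entry until it
     changes (the moral equivalent of select/poll on a file). *)
  type vif_config = {
    bridge : string;
    mac    : string option;
  }

  module type CONTROL = sig
    type 'a key
    val key   : string list -> 'a key            (* e.g. ["vm"; "web1"; "vif0"] *)
    val read  : 'a key -> 'a option
    val write : 'a key -> 'a -> unit
    val watch : 'a key -> ('a -> unit) -> unit   (* blocking notification *)
  end

  (* Attaching a VIF is then just a typed write into the tree. *)
  let register_vif (module C : CONTROL) (cfg : vif_config) =
    C.write (C.key ["vm"; "web1"; "vif0"]) cfg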
There are some nice implications of this work that go beyond Mirage:
- it generally applies to all of the exokernel libraries out there,
including HalVM (Haskell) and GuestVM (Java), as they all have the same
control problem that makes manipulating raw kernels such a pain.
- it can easily be extended to support existing applications on a
monolithic guest kernel, making it easier to manage them too.
- application synthesis becomes much more viable: this approach could let
me build an HTTP microkernel without a TCP stack, and simply receive a
typed RPC from an HTTP proxy (which has done all the work of parsing the
TCP and HTTP bits, so why repeat it?). If my HTTP server microkernel
later live-migrates away, it could swap back to a network connection;
there is a rough sketch of this after the list. Modern cloudy applications
(especially Hadoop or CIEL) use HTTP very heavily to talk between
components, so optimising this part of the stack is worthwhile (numbers
needed!).
- Even if components are compiled into the same binary and use function
calls, they still have to establish and authenticate connections to each
other. This makes monitoring and scaling hugely easier, since the
control filesystem operations provide a natural logging and introspection
point, even for large clusters. If we had a hardware-capability-aware
CPU in the future, it could use this information too :-)
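To show what the HTTP point might look like in code, here is a sketch under
assumed, made-up types (http_request, PROXY_CHANNEL and so on are not real
Mirage interfaces): the microkernel consumes already-parsed requests from a
front-end proxy, and the transport behind recv/send could equally be a
shared ring, an in-kernel function call, or a real network connection after
a live migration, without touching the handler.

  (* Hypothetical typed RPC between an HTTP proxy and a TCP-less
     HTTP microkernel. *)
  type http_request = {
    meth    : [ `GET | `POST ];
    uri     : string;
    headers : (string * string) list;
    body    : string;
  }

  type http_response = { status : int; body : string }

  module type PROXY_CHANNEL = sig
    val recv : unit -> http_request      (* blocks for the next parsed request *)
    val send : http_response -> unit
  end

  (* The server loop never sees TCP or HTTP framing, only typed values. *)
  let serve (module C : PROXY_CHANNEL) handler =
    let rec loop () =
      C.send (handler (C.recv ()));
      loop ()
    in
    loop ()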
I highly recommend that anyone interested in this area read the Plan 9
paper, as it's a really good read anyway [1]. The Scout OS and x-kernel
stack are also worth a look. Our main difference from this work is the
heavy emphasis on type-safe components, as well as realistic deployment
due to the use of Xen cloud providers as a stable hardware interface.
In the very short term, Mort and I have an OpenFlow tutorial coming up in
mid-November, so I'll lash up a manual version of this in the network stack
as soon as possible, so that you can configure all the tap interfaces and
such much more quickly. Meanwhile, all and any thoughts
are most welcome!
[1] Plan 9 papers: http://cm.bell-labs.com/sys/doc/
--
Anil Madhavapeddy http://anil.recoil.org