Thoughts on cloud control APIs for Mirage
Mirage now has a number of protocols implemented as libraries, as
well as device drivers. What's missing is an effective control stack to
glue all this together into a proper OS. So far, we are just wiring
together applications manually from the libraries, which is fine for
development but not for any real deployment.
I've been re-reading the Plan 9 papers [1] for inspiration, and many of
the ideas there are highly applicable to us. To realise the Mirage goal of
synthesising microkernels that are 'minimal for purpose', we need to:
- have multiple intercommunicating components, separated by process
boundaries (on UNIX), by VM isolation (on Xen), or by nothing more than a
function call when compiled into the same kernel.
- minimise information flow between components, so they can be
dynamically split up ('self scaling') or combined more easily.
- deal with the full lifecycle of all these VMs and processes, and not
just spawning them.
Plan 9 was built on very similar principles: instead of a big monolithic
kernel, the system is built on many processes that communicate via a
well-defined wire protocol (9P), and per-process namespaces and filesystem
abstractions for almost every service. For example, instead of 'ifconfig',
the network is simply exposed as a /net filesystem and configured through
filesystem calls rather than a separate command-line tool. Crucially, the
9P protocol can be spoken remotely over the network, or invoked locally via
a simple function call for in-kernel operations.
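To make the contrast concrete, here is a minimal OCaml sketch of the
filesystem-as-control-interface idea; the /net path and the command string
are illustrative rather than Plan 9's exact syntax.

  (* Sketch only: configure a network interface by writing a plain-text
     command into its control file under /net, instead of shelling out to
     a separate ifconfig binary. Path and command are illustrative. *)
  let configure_interface () =
    let ctl = open_out "/net/ipifc/0/ctl" in
    output_string ctl "add 10.0.0.2 255.255.255.0\n";
    close_out ctl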
In contrast, modern cloud stacks are just terribly designed: they consist
of a huge amount of static specification of VM and network state, with
little attention paid to simple UNIX/Plan9 principles that can be used to
build the more complicated abstractions.
So, this leaves us with an interesting opportunity: to implement the
Mirage control interface using similar principles:
- a per-deployment global hierarchical tree (i.e. a filesystem), with ways
to synchronise on entries (i.e. blocking I/O, or a select/poll
equivalent). Our consistency model may vary somewhat: we could be
strongly consistent between VMs running on the same physical host,
and more loosely consistent cluster-wide.
- every library exposes a set of keys and values, as well as a mechanism
for session setup, authentication and teardown (the lifecycle of the
process). Plan 9 used ASCII for everything, whereas Mirage would layer
a well-typed API on top of it (e.g. just write a record to a file rather
than manually serialising it); there is a rough sketch of this after the
list.
- extend the Xen Cloud Platform to support delegation, so that microVMs
can be monitored or killed by supervisors. Unlike Plan9, this also
includes operations across physical hosts (e.g. live relocation), or
across cloud providers.
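As a rough illustration of the second point above, a well-typed layer over
the control tree might look something like the OCaml signature below. Every
name in it (CONTROL, key, watch, vif_config) is hypothetical and not an
existing Mirage or XCP API; it is just a sketch of 'write a record, not an
ASCII string'.

  (* Hypothetical typed control-tree API: keys live in a per-deployment
     hierarchical namespace, values are OCaml records rather than
     hand-built ASCII strings, and readers can block on an entry until it
     changes (the moral equivalent of select/poll on a file). *)
  type vif_config = {
    bridge : string;
    mac    : string option;
  }

  module type CONTROL = sig
    type 'a key
    val key   : string list -> 'a key            (* e.g. ["vm"; "web1"; "vif0"] *)
    val read  : 'a key -> 'a option
    val write : 'a key -> 'a -> unit
    val watch : 'a key -> ('a -> unit) -> unit   (* blocking notification *)
  end

  (* Attaching a VIF is then just a typed write into the tree. *)
  let register_vif (module C : CONTROL) (cfg : vif_config) =
    C.write (C.key ["vm"; "web1"; "vif0"]) cfg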
There are some nice implications of this work that go beyond Mirage:
- it generally applies to all of the exokernel libraries out there,
including HalVM (Haskell) and GuestVM (Java), as they all have the same
control problem that makes manipulating raw kernels such a pain.
- it can easily be extended to support existing applications on a
monolithic guest kernel, making it easier to manage them too.
- application synthesis becomes much more viable: this approach could let
me build an HTTP microkernel without a TCP stack, and simply receive a
typed RPC from an HTTP proxy (which has done all the work of parsing the
TCP and HTTP bits, so why repeat it?). If my HTTP server microkernel
later live-migrates away, it could swap back to a network connection;
there is a rough sketch of this after the list. Modern cloudy applications
(especially Hadoop or CIEL) use HTTP very heavily to talk between
components, so optimising this part of the stack is worthwhile (numbers
needed!).
- Even if components are compiled into the same binary and use function
calls, they still have to establish and authenticate connections to each
other. This makes monitoring and scaling hugely easier, since the
control filesystem operations provide a natural logging and introspection
point, even for large clusters. If we had a hardware-capability-aware
CPU in the future, it could use this information too :-)
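To show what the HTTP point might look like in code, here is a sketch under
assumed, made-up types (http_request, PROXY_CHANNEL and so on are not real
Mirage interfaces): the microkernel consumes already-parsed requests from a
front-end proxy, and the transport behind recv/send could equally be a
shared ring, an in-kernel function call, or a real network connection after
a live migration, without touching the handler.

  (* Hypothetical typed RPC between an HTTP proxy and a TCP-less
     HTTP microkernel. *)
  type http_request = {
    meth    : [ `GET | `POST ];
    uri     : string;
    headers : (string * string) list;
    body    : string;
  }

  type http_response = { status : int; body : string }

  module type PROXY_CHANNEL = sig
    val recv : unit -> http_request      (* blocks for the next parsed request *)
    val send : http_response -> unit
  end

  (* The server loop never sees TCP or HTTP framing, only typed values. *)
  let serve (module C : PROXY_CHANNEL) handler =
    let rec loop () =
      C.send (handler (C.recv ()));
      loop ()
    in
    loop ()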
I highly recommend that anyone interested in this area read the Plan 9
paper, as it's a really good read anyway [1]. The Scout OS and x-kernel
stack are also worth a look. Our main difference from this work is the
heavy emphasis on type-safe components, as well as realistic deployment
due to the use of Xen cloud providers as a stable hardware interface.
In the very short term, Mort and I have an OpenFlow tutorial coming up in
mid-November, so I'll lash up a manual version of this in the network stack
as soon as possible, so that you can configure all the tap interfaces and
such much more quickly. Meanwhile, all and any thoughts
are most welcome!
[1] Plan 9 papers: http://cm.bell-labs.com/sys/doc/
--
Anil Madhavapeddy http://anil.recoil.org