[Xen-devel] [RFC PATCH 0/7] COarse-grain LOck-stepping Virtual Machines for Non-stop Service

Virtual machine (VM) replication is a well-known technique for providing
application-agnostic, software-implemented hardware fault tolerance
("non-stop service"). Remus currently provides this function, but it
buffers all output packets, which adds unacceptable latency.

At Xen Summit 2012, we introduced a new VM replication solution: COLO
(COarse-grain LOck-stepping virtual machine). The presentation is available
at the following URL:

Here is the summary of the solution:
From the client's point of view, as long as the client observes identical
responses from the primary and secondary VMs, according to the service
semantics, the secondary VM (SVM) is a valid replica of the primary
VM (PVM) and can successfully take over when a hardware failure of the
PVM is detected.

This patchset is an RFC and implements the basic framework of COLO:
1. Both the PVM and the SVM are running
2. Forward the input packets from the client to the secondary machine (slave)
3. Forward the output packets from the SVM to the primary machine (master)
4. Compare the output packets from the PVM and the SVM on the master side.
   If the output packets differ, do a checkpoint
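
To make step 4 concrete, here is a minimal Python sketch of the master-side
comparison. It is only an illustration and is not code from this patchset:
the class name, the method names and the two queues are hypothetical. The
policy on a mismatch (release the pending PVM output and drop the stale SVM
output) is also an assumption; the summary above only says that a
checkpoint is done.

```python
from collections import deque


class ColoCompareSketch:
    """Toy model of the master-side output comparison (hypothetical names)."""

    def __init__(self):
        self.pvm_out = deque()  # output packets produced by the PVM
        self.svm_out = deque()  # output packets forwarded from the SVM
        self.released = []      # packets released to the client
        self.checkpoints = 0    # number of checkpoints triggered so far

    def on_pvm_packet(self, pkt):
        self.pvm_out.append(pkt)
        self._compare()

    def on_svm_packet(self, pkt):
        self.svm_out.append(pkt)
        self._compare()

    def _compare(self):
        # Compare outputs pairwise, in the order in which they were produced.
        while self.pvm_out and self.svm_out:
            p = self.pvm_out.popleft()
            s = self.svm_out.popleft()
            if p == s:
                # Identical responses: the client cannot tell the VMs apart.
                self.released.append(p)
            else:
                # Divergence: put the PVM packet back and resynchronise.
                self.pvm_out.appendleft(p)
                self._checkpoint()

    def _checkpoint(self):
        # Assumption: after a checkpoint the SVM restarts from the PVM state,
        # so queued SVM output is stale and pending PVM output can be released.
        self.checkpoints += 1
        self.svm_out.clear()
        self.released.extend(self.pvm_out)
        self.pvm_out.clear()
```

As long as the two VMs produce identical output, packets are released as
soon as both copies have arrived and no checkpoint is taken. In this sketch
a checkpoint is triggered only when the outputs diverge.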

  Patch 1: optimize the transfer speed of dirty pages
  Patch 2-3: allow the SVM to keep running after a checkpoint
  Patch 4-5: modifications for COLO on the master side (wait for a new
             checkpoint, communicate with the slave when doing a checkpoint)
  Patch 6-7: implement COLO's user interface

Wen Congyang (7):
  xc_domain_save: cache pages mapping
  xc_domain_restore: introduce restore_callbacks for colo
  colo: implement restore_callbacks
  xc_domain_save: flush cache before calling callbacks->postcopy()
  xc_domain_save: implement save_callbacks for colo
  XendCheckpoint: implement colo
  remus: implement colo mode

 tools/libxc/Makefile                              |   4 +-
 tools/libxc/ia64/xc_ia64_linux_restore.c          |   3 +-
 tools/libxc/xc_domain_restore.c                   | 256 +++++---
 tools/libxc/xc_domain_restore_colo.c              | 740 ++++++++++++++++++++++
 tools/libxc/xc_domain_save.c                      | 162 +++--
 tools/libxc/xc_save_restore_colo.h                |  44 ++
 tools/libxc/xenguest.h                            |  57 +-
 tools/libxl/libxl_dom.c                           |   2 +-
 tools/python/xen/lowlevel/checkpoint/checkpoint.c | 289 ++++++++-
 tools/python/xen/lowlevel/checkpoint/checkpoint.h |   2 +
 tools/python/xen/remus/image.py                   |   7 +-
 tools/python/xen/remus/save.py                    |   6 +-
 tools/python/xen/xend/XendCheckpoint.py           | 138 ++--
 tools/remus/remus                                 |   8 +-
 tools/xcutils/xc_restore.c                        |   3 +-
 xen/include/public/xen.h                          |   1 +
 16 files changed, 1503 insertions(+), 219 deletions(-)
 create mode 100644 tools/libxc/xc_domain_restore_colo.c
 create mode 100644 tools/libxc/xc_save_restore_colo.h


