
Re: [Xen-devel] [RFC PATCH COLO v5 01/29] Add readme



Wei,

  Thanks for the review, and sorry for the late reply; I was debugging a
triple fault bug. It is now fixed, and COLO is running much more stably.

  The readme is somewhat outdated. We will update it and address your
comments, then put it on a wiki page. The in-tree readme will be a short
description with links to the wiki pages.

On 04/09/2015 02:11 AM, Wei Liu wrote:
On Wed, Apr 01, 2015 at 02:41:37PM +0800, Yang Hongyang wrote:
From: Wen Congyang <wency@xxxxxxxxxxxxxx>

Signed-off-by: Wen Congyang <wency@xxxxxxxxxxxxxx>
Signed-off-by: Yang Hongyang <yanghy@xxxxxxxxxxxxxx>
---
  docs/README.colo | 92 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  1 file changed, 92 insertions(+)
  create mode 100644 docs/README.colo

diff --git a/docs/README.colo b/docs/README.colo
new file mode 100644
index 0000000..60f487d
--- /dev/null
+++ b/docs/README.colo
@@ -0,0 +1,92 @@
+COLO provides fault tolerance for virtual machines by sending continuous
+checkpoints to a backup, which will activate if the target VM fails. It
+only supports HVM guest(without pv extensions).
                                    ^ PV

+
+Requirements:
+1. Hardware requirements
+   There is at least one directly connected nic to forward the nic from client
+   to secondary vm. The directly connected nic must not be used by any other
+   purpose. If your guest has more than one nic, you should have directly
+   connected nic for each guest nic. If you don't have enough directly connected
+   nic, you can use vlan.
+2. Dom0 requirements
+   - Support dom0
+   - kernel module:
+        sch_ingress
+        cls_basic
+        cls_tcindex
+        cls_u32
+        act_mirred
+   - libnl-tools >= 3.0. This package provides the command nl-qdisc-list, and
+     colo need this command.
         ^ COLO

+   - If your host os has OEM-released xen tools, please uninstall it first.
                      ^OS                 ^ Xen

(and please fix other occurrences of wrong capitalisations as well)

OK


This is a very broad statement, and it is not very helpful from either a
developer's or a user's point of view. Can you elaborate on which
functionality COLO needs exclusive access to?

+   - You can load the module which is not provided by OEM.

What does this mean?

It means you may need to compile the module yourself; we will make that clear.
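
For what it's worth, one quick way to see which of the modules listed in the
README are not loaded in dom0 is to compare that list against /proc/modules.
The sketch below is only an illustration and not part of the patch: the
helper name missing_modules is made up here, and a module that is built into
the kernel does not show up in /proc/modules, so a "missing" result may just
mean it is built in.

```python
# Modules listed under "Dom0 requirements" in the README.
REQUIRED = ["sch_ingress", "cls_basic", "cls_tcindex", "cls_u32", "act_mirred"]


def missing_modules(proc_modules_text, required=REQUIRED):
    """Return the required modules that are absent from /proc/modules text.

    Hypothetical helper, for illustration only. Each line of /proc/modules
    starts with the module name, so only the first field is inspected.
    """
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return [name for name in required if name not in loaded]
```

On a dom0 you would pass it the contents of /proc/modules, e.g.
missing_modules(open("/proc/modules").read()), and then modprobe (or build)
whatever it reports.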


+3. Guest requirements
+   Only HVM guest(without pv extensions) is supported now. If you want to
+   use OEM released guest os, please use SUSE. REDHAT and Ubuntu is not
+   supported now because I don't find any way to disable pv extensions.
+   If you want to use REDHAT or Ubuntu, you need to build the newest
+   kernel which has the parameter xen_nopv.
+

FWIW, does "xen_platform_pci=0" in your xl.cfg work for RH and Ubuntu
guests?

It works only if you compile a newer kernel yourself, one that supports this
option.


+Network link topology
+   Please refer to: http://wiki.qemu.org/Features/COLO#Network_link_topology
+
+The steps to setup COLO environment:
+You need to recompile your host kernel because colo-proxy module need cooperate
+with linux kernel.
+Please refer to: http://wiki.qemu.org/Features/COLO#Test_environment_prepare
+1. Build and install xen
+2. Apply the patch for qemu xen, and rebuild xen tools:
+    - cd tools/qemu-xen-dir
+    - use git am to apply the patch:
+      https://raw.githubusercontent.com/wencongyang/colo-files/master/patch_for_qemu/*.patch
+    - make tools && make install-tools
+    Note: You must use qemu-xen. qemu-xen-traditional is not supported.

Note that you will eventually need to upstream your changes to QEMU.

Sure, we have already posted the block patches to QEMU. They are under review now:
http://lists.nongnu.org/archive/html/qemu-devel/2015-04/msg00399.html


+3. Install COLO proxy module:
+    3.1 Download COLO proxy, compile and install it:
+        https://github.com/gao-feng/colo-proxy.git
+    3.2 Download iptables patch, it is based on v1.4.21 compile and install it:
+        https://github.com/gao-feng/colo-proxy/blob/master/colo-patch-for-kernel.patch
+4. Install the guest
+    4.1 Add "xen_platform_pci=0" into the guest configfile
+    4.2 If you use suse, please select physical machine
+    4.3 copy the disk image to the secondary host
+5. Update your guest config file for COLO:
+    5.1 disk
+        disk = [
+        'format=raw,devtype=disk,access=w,vdev=hda,backendtype=qdisk,colo,colo-params=192.168.3.1:9000:exportname=qdisk1,active-disk=/mnt/ramfs/active_disk.img,hidden-disk=/mnt/ramfs/hidden_disk.img,target=/root/images/colo-hvm.img' ]

It's unclear which parts are updated compared to the original config,
i.e. can you list the additional bits to enable COLO? Presumably it's
only those options starting with "colo"?

Mostly. We will update the doc to make it clear.
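
As a rough sketch of the distinction, the snippet below splits the disk spec
from the README on commas and picks out the keys that start with "colo". It
is only an illustration: the helper name colo_keys is invented here, the
naive comma split is not how xl parses the spec, and given the "mostly"
above, the "colo" prefix may not cover every COLO-related option.

```python
def colo_keys(disk_spec):
    """Return the keys in a comma-separated disk spec that start with 'colo'.

    Hypothetical helper, for illustration only. A bare flag such as 'colo'
    has no '=value' part and is returned as-is.
    """
    keys = [item.split("=", 1)[0].strip() for item in disk_spec.split(",")]
    return [key for key in keys if key.startswith("colo")]


# Disk spec copied from step 5.1 of the README.
spec = (
    "format=raw,devtype=disk,access=w,vdev=hda,backendtype=qdisk,colo,"
    "colo-params=192.168.3.1:9000:exportname=qdisk1,"
    "active-disk=/mnt/ramfs/active_disk.img,"
    "hidden-disk=/mnt/ramfs/hidden_disk.img,"
    "target=/root/images/colo-hvm.img"
)
print(colo_keys(spec))
```

Any remaining keys would still need to be checked by hand against the
updated documentation.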


+    5.2 nic
+        vif = [ 'mac=00:16:4f:00:00:11, bridge=br0, model=e1000, forwarddev=eth0, forwardbr=br1' ]
+    Note:
+    a. The ip/port in colo-params is the secondary host's IP. Don't use the
+       directly connected nic's IP.
+    b. forwarddev is the directly connected nic.
+    c. If you have more than one disk, colo-params's host/port must be the same
+       and colo-param's exportname must be different.
+6. Run COLO:
+    xl remus -c -u <domname> <secondary host IP>
+    Note: The ip must not be the directly connected nic's IP.
+Note:
+Secondary host only need to do step 1-3.
+
+The known problem:
+1. Secondary vm may crash due to triple fault.
+2. The heartbeat is not reliable. If you want to test the performance,
+   please disable the heartbeat(modify the xen codes). You can use the
+   branch colo-v4-noheartbeat.
+3. Suspending the vm fails, and the error message is:
+    libxl: error: libxl_qmp.c:429:qmp_next: timeout
+
+Problem 1 and 3 don't happen every time. So you can run colo again to
+avoid this problem.
+
+Virtio-Net:
+1. If you want to get better performance, you can use virtio-net.
+
+Troubleshooting:
+If some error happens when starting COLO, you can do:
+1. Make sure you have all necessary modules that DOM0 needed on both side.
+2. Make sure you have followed all the instructions in this README.
+3. Try to reboot both primary and secondary host.
+4. If you still have problems, collect the error logs and contact
+   Wen Congyang(wency@xxxxxxxxxxxxxx)/Yang Hongyang(yanghy@xxxxxxxxxxxxxx).

After reading this whole document I think it should be a wiki page
instead of an in-tree README.

Agreed.


Wei.

--
1.9.1
.


--
Thanks,
Yang.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
