
[Xen-devel] Remus stopped with full protection


  • To: xen-devel@xxxxxxxxxxxxxxxxxxx
  • From: Minjia Zhang <zhangninja@xxxxxxxxx>
  • Date: Tue, 19 Jan 2010 15:57:16 +0800
  • Delivery-date: Thu, 21 Jan 2010 07:50:07 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Hi everyone:

I am trying to deploy Remus on two servers, following the instructions on the official website, http://dsg.cs.ubc.ca/remus/doc.html. I have succeeded in running Remus with "remus --no-net myvm mybackuphost": I see a continuous stream of messages on the primary host, and failover works even when I pull the power plug, using DRBD as shared storage.

The problem occurs when I go a step further and try to add disk replication and network buffering, as the official documentation describes.

Each physical host has an identical copy of the disk image available at the same path, /dev/vgxen/hvmsnap.

Then I create the HVM guest and run remus without "--no-net":
[root@server2 img]# xm list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  2046     8     r-----  14254.5
Rhel5-hvm1                                   1   512     1     r-----      0.6
[root@server1 img]# remus 1 server2
qemu logdirty mode: enable
Saving memory pages: iter 1   0%ERROR Internal error: Error when writing to state file (4a) (errno 104)
Save exit rc=1
qemu logdirty mode: disable
Then the program stops.

The error is raised at line 1361 of xc_domain_save.c. errno 104 stands for "Connection reset by peer" (ECONNRESET).
I am not sure whether there is a mistake in the network socket setup.
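
For reference, the numeric errno can be decoded on the host itself. The following is a minimal sketch using only the Python standard library; the helper name describe_errno is hypothetical and not part of Xen or Remus. The mapping from number to name is platform-specific, so it should be run on the same Linux host that produced the error, where 104 is expected to map to ECONNRESET.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for a numeric errno on this platform."""
    # errno.errorcode maps numeric values to names such as 'ECONNRESET'.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror gives the human-readable message for the same number.
    return name, os.strerror(code)


if __name__ == "__main__":
    # 104 is the value reported in the Remus output above.
    name, message = describe_errno(104)
    print(name, "-", message)
```

ECONNRESET on a write generally means the receiving end closed or aborted the connection, so the logs on the backup host are probably the next place to look.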

I have been stuck on this for two weeks. Am I missing something, or could anybody tell me what I should do next?

Thanks for any feedback and suggestion!


Here is some additional information:

I have updated xen-unstable to the latest tip, so it includes the Remus patches such as 'IMQ for linux' and 'control tool', as well as the 'Patch * of 2' series posted on 02 Dec.


Some details about my environment:
Two homogeneous servers, each with 4 GB of memory and a 160 GB hard disk.
RHEL 5.3 x86_32 with kernel 2.6.18.8 compiled against xen-unstable.

This is my Xen HVM configuration:
import os, re
arch = os.uname()[4]
if re.search('64', arch):
    arch_libdir = 'lib64'
else:
    arch_libdir = 'lib'
kernel = '/usr/lib/xen/boot/hvmloader'
builder = 'hvm'
memory = '512'
disk = [ 'tap:remus:server1:9000|aio:/dev/vgxen/hvmsnap,hda,w']
#disk = [ 'tap:aio:/home/ninja/img/hvm.img,hda,w','file:/home/ninja/img/CentOS-5.3-i386-bin-DVD.iso,hdc:cdrom,r']
#disk = [ 'tap:aio:/home/ninja/img/hvm.img,hda,w','file:/var/ftp/pub/rhel-server-5.3-i386.iso,hdc:cdrom,r']
#disk = [ 'phy:/dev/vgxen/hvm,hda,w','file:/var/ftp/pub/rhel-server-5.3-i386.iso,hdc:cdrom,r']
#disk = [ 'tap:aio:/home/ninja/img/hvm.img,hda,w','file:/var/ftp/pub/rhel-server-5.3-i386-dvd.iso,hdc:cdrom,r']
vif = ['type=ioemu, mac=00:1c:3e:17:22:13']
#boot = 'dc'
boot = 'c'
name = 'Rhel5-hvm1'
acpi = 1
apic = 1
device_model = '/usr/' + arch_libdir + '/xen/bin/qemu-dm'
vnc=1
vncdisplay=2
sdl=0
opengl=1
vnclisten="0.0.0.0"
vncpasswd=''
serial='pty'

disk = [ 'tap:remus:server1:9000|aio:/dev/vgxen/hvmsnap,hda,w']
hvmsnap is an LVM snapshot used as the HVM disk.
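
Since the disk line embeds a host name and a TCP port, one quick sanity check is to pull those two values out and see whether that endpoint accepts connections at all. The sketch below is only an illustration under stated assumptions: the helpers parse_remus_disk and can_connect are hypothetical names, not part of the Xen or Remus tools, and the parsing assumes the 'tap:remus:HOST:PORT|BACKEND,DEV,MODE' layout seen in the config above.

```python
import socket


def parse_remus_disk(spec):
    """Split a 'tap:remus:HOST:PORT|BACKEND,DEV,MODE' disk string into parts."""
    # The last two comma-separated fields are the guest device and access mode.
    target, device, mode = spec.rsplit(",", 2)
    # The '|' separates the remus frontend from the underlying backend.
    frontend, backend = target.split("|", 1)
    prefix = "tap:remus:"
    if not frontend.startswith(prefix):
        raise ValueError("not a tap:remus disk spec: %r" % spec)
    host, port = frontend[len(prefix):].rsplit(":", 1)
    return {
        "host": host,
        "port": int(port),
        "backend": backend,
        "device": device,
        "mode": mode,
    }


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    info = parse_remus_disk("tap:remus:server1:9000|aio:/dev/vgxen/hvmsnap,hda,w")
    print(info["host"], info["port"], info["backend"])
```

Calling can_connect(info["host"], info["port"]) from the other machine would then show whether anything is listening on that port. A False result only says the port is unreachable at that moment; it does not by itself explain the errno 104 above.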

Location of the error in the code:
1351                 if ( pagetype != 0 )
1352                 {
1353                     /* If the page is not a normal data page, write out any
1354                        run of pages we may have previously acumulated */
1355                     if ( run )
1356                     {
1357                         if ( ratewrite(io_fd, live,
1358                                        (char*)region_base+(PAGE_SIZE*(j-run)),
1359                                        PAGE_SIZE*run) != PAGE_SIZE*run )
1360                         {
1361                             ERROR("Error when writing to state file (4a)"   (here)
1362                                   " (errno %d)", errno);
1363                             goto out;
1364                         }
1365                         run = 0;
1366                     }
1367                 } 

Other things I have tried:
I also tried a paravirtualized guest, but it fails with the same error.



--
Best Regards

Minjia Zhang

Email:zhangninja@xxxxxxxxx
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 

