
Re: [Xen-devel] [Xen 4.5-rc] remus-drbd incompatible with Linux 3.6+ headers



Hello,

On 12/19/2014 06:15 PM, Anthony Korzan wrote:
Thank you for your response,

I compiled Linux 3.0.101, sch_plug, and reinstalled remus-drbd.  I still receive
the same error when starting remus:
xc: error: rdexact failed (select returned 0): Internal error
xc: error: Error when reading batch size (110 = Connection timed out):
Internal error
xc: error: error when buffering batch, finishing (110 = Connection timed
out): Internal error

The error occurs at an earlier plug/unplug with faster intervals.

I tried to reproduce the problem, but had no luck: my test environment stays
stable until I unplug the primary's power.
xc: Saving memory: iter 2663 (last sent 231 skipped 0): 131072/131072  100%
xc: Saving memory: iter 2664 (last sent 228 skipped 0): 131072/131072  100%
Write failed: Broken pipe

I will send you the detailed configuration of my test environment.


I detailed my installation steps on my crude blog I just made, hopefully it 
helps:
http://akorzan.com/dokuwiki/doku.php?id=xen:installation

For Linux I used 3.4 or 3.0 and added the necessary options in make menuconfig.
  For 3.0 I had to get a separate copy of sch_plug.

The only thing I had to do differently is that, for the DomU config, I had
to use ["phy:/dev/drbd1,w,xvda"] instead of ["drbd:ubuntu_vm_1,w,xvda"]


On a side note: Interestingly, after changing the Kernel from 3.4 to 3.0 and
installing an external version of sch_plug, provided on the Xen Wiki, pings to
the DomU don't display the delay the network buffering causes, but on longer
intervals you can feel the extra time the pings take to respond.  Weird.

Many Thanks,
Anthony

On Dec 18, 2014, at 9:09 PM, Hongyang Yang <yanghy@xxxxxxxxxxxxxx
<mailto:yanghy@xxxxxxxxxxxxxx>> wrote:

Hi,

On 12/19/2014 05:48 AM, Anthony Korzan wrote:
Hello!

I have only managed to get Xen 4.5's Remus "working" on Linux Kernels less than
3.5. The provided remus-drbd, as detailed in docs/README.remus and available
from https://github.com/rshriram/remus-drbd will not compile with Linux Kernels
3.6 and above.

The DRBD you get from https://github.com/rshriram/remus-drbd is DRBD 8.3.11,
and this version is only compatible with Linux 3.0~3.4; see the table on this page:
http://www.drbd.org/download/mainline/

I'm afraid DRBD 8.3.11 is currently the only version that Remus works with.
In the past, Remus disk replication was based on blktap2, but I think blktap2
is being deprecated: it has had no maintainers or patches in recent years.

If you are interested, there's a new FT solution based on Remus:
http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping

This solution uses blktap2 for disk replication, and it has lots of patches to
get blktap2 working with xl.

Furthermore, we are working on a better disk replication solution for both
Remus and COLO. COLO is supposed to get into Xen 4.6.


One of the compile errors is that remus-drbd uses the two-argument version of
the macro kunmap_atomic found in include/linux/highmem.h.
That form was deprecated and is no longer accepted by kernels 3.6 and above.

"error: macro "kunmap_atomic" passed 2 arguments, but takes just 1"

Is there a patch available?  If not, what set up do the Remus devs use to test?
 I just need a "stable-ish" platform to modify remus on.
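
For anyone attempting the port by hand, the kunmap_atomic error quoted above
can be turned into a list of call sites to review. What follows is only a
minimal sketch, not a patch: it scans C source text for kunmap_atomic calls
that still pass two arguments. The helper names (call_arguments,
find_two_argument_calls) are made up for this illustration, and the scan is
purely textual, so it does not skip comments, strings, or macro definitions.

```python
import re

# Matches the start of a kunmap_atomic(...) call; \b keeps prefixed names
# such as __kunmap_atomic from matching.
CALL_RE = re.compile(r"\bkunmap_atomic\s*\(")


def call_arguments(source, open_paren):
    """Split the argument list that starts at source[open_paren] == '('.

    Returns the top-level arguments, or None if the parenthesis never closes.
    """
    depth = 0
    args = []
    current = []
    for ch in source[open_paren:]:
        if ch == "(":
            depth += 1
            if depth == 1:
                continue  # skip the opening parenthesis of the call itself
        elif ch == ")":
            depth -= 1
            if depth == 0:
                args.append("".join(current).strip())
                return args
        elif ch == "," and depth == 1:
            # a comma at depth 1 separates two top-level arguments
            args.append("".join(current).strip())
            current = []
            continue
        current.append(ch)
    return None


def find_two_argument_calls(source):
    """Yield (line_number, arguments) for each two-argument kunmap_atomic call."""
    for match in CALL_RE.finditer(source):
        args = call_arguments(source, match.end() - 1)
        if args is not None and len(args) == 2:
            line_number = source.count("\n", 0, match.start()) + 1
            yield line_number, args
```

Each reported site would still need to be checked by hand. The matching
kmap_atomic calls in the same files are likely to need the same treatment, and
this sketch does not look for them.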


Now I did get Remus "working" on Linux 3.4, Ubuntu 14.04, and the custom
remus-drbd.  The issue I run into is that Remus only plugs and unplugs a few
hundred times until there is a "Connection timeout error."  It could be that I
am using an "old" linux kernel version without much Xen integration, but I'm
stumped about this error:

Can you try Linux 3.0 to see if the error still exists?
I will take a look at this problem to see if I can reproduce it.


###
...
xc: progress: Reloading memory pages: 895015/65536  1365%
xc: Saving memory: iter 1416 (last sent 568 skipped 0): 65536/65536  100%
...
xc: Saving memory: iter 1420 (last sent 567 skipped 0): 65536/65536  100%
xc: error: rdexact failed (select returned 0): Internal error
xc: error: Error when reading batch size (110 = Connection timed out): Internal
error
xc: error: error when buffering batch, finishing (110 = Connection timed out):
Internal error
migration target: Remus Failover for domain 5
libxl: error: libxl_utils.c:430:libxl_read_exactly: file/stream truncated
reading ipc msg header from domain 5 save/restore helper stdout pipe
libxl: error: libxl_exec.c:129:libxl_report_child_exitstatus: domain 5
save/restore helper [-1] died due to fatal signal Broken pipe
libxl: warning: libxl_dom.c:2015:domain_suspend_done: Remus: Domain suspend
terminated with rc -3, teardown Remus devices...
Remus: Backup failed? resuming domain at primary.
xc: error: Dom 5 not suspended: (shutdown 0, reason 255): Internal error
libxl: error: libxl.c:505:libxl__domain_resume: xc_domain_resume failed for
domain 5: Invalid argument
###
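
To compare runs like the one above (for example, to see how many checkpoints
complete before the timeout at different intervals), a small log summarizer
can help. This is a rough sketch that assumes the xc line formats exactly as
they appear in the log above. The function name summarize_failure is
hypothetical, and lines wrapped by the mail client would need to be rejoined
first.

```python
import re

# "xc: Saving memory: iter 1420 (last sent 567 skipped 0): ..."
ITER_RE = re.compile(r"^xc: Saving memory: iter (\d+) ")
# "xc: error: ... (110 = Connection timed out): Internal error"
ERRNO_RE = re.compile(r"^xc: error: .*\((\d+) = ([^)]*)\)")


def summarize_failure(log_lines):
    """Return (last_iter, errno, message) for the first xc error with an errno.

    last_iter is the last checkpoint iteration logged before that error, or
    None if no iteration line was seen. Returns None if no such error exists.
    """
    last_iter = None
    for line in log_lines:
        m = ITER_RE.match(line)
        if m:
            last_iter = int(m.group(1))
            continue
        m = ERRNO_RE.match(line)
        if m:
            return last_iter, int(m.group(1)), m.group(2)
    return None
```

Fed the log above with wrapped lines rejoined, this would report iteration
1420 and errno 110, which on Linux is ETIMEDOUT.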

Sincerely,
Anthony

--
Thanks,
Yang.


--
Thanks,
Yang.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

