
Re: [Xen-devel] [PATCH 2 of 2 V6] libxl: Remus - xl remus command

On Fri, May 25, 2012 at 12:59 PM, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
On Thu, 2012-05-17 at 20:48 +0100, Shriram Rajagopalan wrote:
> diff -r 496ff6ce5bb6 -r 92bf8bd9ae57 docs/man/xl.pod.1
> --- a/docs/man/xl.pod.1       Thu May 17 12:37:07 2012 -0700
> +++ b/docs/man/xl.pod.1       Thu May 17 12:37:10 2012 -0700
> @@ -381,6 +381,41 @@
>  =back
> +=item B<remus> [I<OPTIONS>] I<domain-id> I<host>
> +
> +Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
> +mechanism between the two hosts.
> +
> [...]

> +
> +=item B<-b>
> +
> +Do not checkpoint the disk. Replicate memory checkpoints to /dev/null
> +(blackhole).  Network output buffering remains enabled (unless --no-net is
> +supplied).  Generally useful for debugging.

Unless I'm mistaken the current remus support in (lib)xl doesn't
implement either disk or network replication (and --no-net doesn't
seem to exist); at least there are several TODOs to that effect in the code.

Please can you send an incremental patch which corrects this.

I also think it would be worth mentioning in the intro that "xl remus"
as it stands is "proof-of-concept" or "early preview", "experimental" or
something along these lines, otherwise people will expect it to be a
complete solution, which it isn't.

Sorry about that. I'll send out a patch. I had actually planned on some
network buffering support but didn't expect the initial framework patches
to get held up for so long. :( In fact, the network buffering module
has been available in the mainline kernel (with libnl library support) for the past three months.
But I guess it's too late now.

More importantly I think the lack of STONITH functionality should be
highlighted, since it would be rather dangerous to deploy remus without it.

I think this applies to both xend and xl. Remus has traditionally not had any
STONITH functionality. And if you think about it, separating Remus from the
failover arbitration (STONITH) gives more flexibility
(e.g., kill the backup in case replication was interrupted by some spurious timeout,
use custom or off-the-shelf STONITH solutions, etc.).

The only thing that was lacking is some sort of notification to an external handler.
E.g., on suspected failure, both nodes could invoke some FooBar.sh script which
would return 0/1 (die/live) and act accordingly. The onus is on the user who implements
the FooBar.sh script to ensure that it doesn't return 1 on both sides. :)
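The handler idea above could be sketched roughly as follows. This is purely illustrative: FooBar.sh, the REMUS_ARBITER variable, and the "live"/"die" protocol are assumptions for the sketch, not an existing xl or Remus interface; only the 0/1 (die/live) exit convention comes from the discussion.

```shell
#!/bin/sh
# Hypothetical FooBar.sh arbitration hook (sketch, not a real Remus API).
# On a suspected failure both hosts invoke this script. REMUS_ARBITER is
# an assumed admin-supplied command that prints "live" or "die" for this
# side. Exit status follows the 0/1 (die/live) convention: 1 = keep this
# side running, 0 = this side must die / be fenced.
#
# Defaulting to "die" when the arbiter is unreachable is the conservative
# choice: it avoids both sides answering "live" (split-brain).

decision=$(${REMUS_ARBITER:-false} 2>/dev/null) || decision=die

if [ "$decision" = "live" ]; then
    exit 1   # survive
else
    exit 0   # die / get fenced
fi
```

The key design point is that the script itself never decides; it only relays an external arbiter's verdict, so custom or off-the-shelf STONITH solutions can be plugged in behind REMUS_ARBITER.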

In fact, I think I have a patch lying around somewhere that invokes an arbitration
script, which in turn talks to a Google App Engine instance. This was done for
the wide-area Remus paper.

Let me post that too.



Xen-devel mailing list


