Re: [Xen-devel] [PATCH 2 of 2 V6] libxl: Remus - xl remus command
On Mon, 2012-05-28 at 01:39 +0100, Shriram Rajagopalan wrote:
> On Fri, May 25, 2012 at 12:59 PM, Ian Campbell <Ian.Campbell@xxxxxxxxxx>
> wrote:
> > On Thu, 2012-05-17 at 20:48 +0100, Shriram Rajagopalan wrote:
> > > diff -r 496ff6ce5bb6 -r 92bf8bd9ae57 docs/man/xl.pod.1
> > > --- a/docs/man/xl.pod.1 Thu May 17 12:37:07 2012 -0700
> > > +++ b/docs/man/xl.pod.1 Thu May 17 12:37:10 2012 -0700
> > > @@ -381,6 +381,41 @@
> > >
> > >  =back
> > >
> > > +=item B<remus> [I<OPTIONS>] I<domain-id> I<host>
> > > +
> > > +Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
> > > +mechanism between the two hosts.
> > > +
> >
> > [...]
> >
> > > +
> > > +=item B<-b>
> > > +
> > > +Do not checkpoint the disk. Replicate memory checkpoints to /dev/null
> > > +(blackhole). Network output buffering remains enabled (unless --no-net is
> > > +supplied). Generally useful for debugging.
> >
> > Unless I'm mistaken the current remus support in (lib)xl doesn't
> > implement either disk or networking replication (and --no-net doesn't
> > seem to exist), at least there are several TODOs to that effect in the
> > code.
> >
> > Please can you send an incremental patch which corrects this.
> >
> > I also think it would be worth mentioning in the intro that "xl remus"
> > as it stands is "proof-of-concept" or "early preview", "experimental" or
> > something along these lines, otherwise people will expect it to be a
> > complete solution, which it isn't.
>
> Sorry about that. I'll send out a patch.

Thanks.

> I had actually planned on some network buffering support but didn't
> expect the initial framework patches to get held up for so long. :(.
> In fact, even the network buffering module has been available in the
> mainline kernel (with libnl library support) for the past 3 months.
> But I guess it's too late now.

Yes, I'm afraid so, although that needn't stop you posting RFCs for 4.3.

> > More importantly I think the lack of STONITH functionality should be
> > highlighted, since it would be rather dangerous to deploy remus without
> > it.
>
> heart
>
> I think this applies to both xend/xl. Remus traditionally has not had any
> stonith functionality. And if you think about it, separating Remus from the
> Failover Arbitration (STONITH) gives more flexibility
> (e.g., kill Backup, in case replication was interrupted by some spurious
> timeout, use custom or off-the-shelf stonith solutions, etc).

So it sounds like some documentation is required for what you need to
build around the xm/xl remus support in order to have a fully functional
& safe system? Does anything like that exist? It doesn't seem to be
mentioned in http://nss.cs.ubc.ca/remus/doc.html. Could we add something
into the tree or at least add a pointer to something?

The need for this should also be highlighted in the xl man page I think,
otherwise people will think that all they need to do is run "xl remus",
which they could be forgiven for thinking after having read
http://nss.cs.ubc.ca/remus/doc.html.

> The only thing that was lacking is some sort of notification to an
> external handler. E.g., on suspected failure, both nodes could invoke
> some FooBar.sh script which would return 0/1 (die/live) and act
> accordingly. The onus is on the user who implements the FooBar.sh
> script to ensure that it doesn't return 1 on both sides. :).
>
> In fact, I think I have a patch lying around somewhere that invokes an
> arbitration script, which in turn talks to a Google App Engine instance.
> This was done for the wide-area Remus paper.
>
> Let me post that too.

Please.

Thanks,
Ian.
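
The arbitration hook described in the thread (on suspected failure, each node invokes an external script and acts on its die/live answer) could be sketched roughly as below. This is only an illustration in Python, not code from the Remus patches. The names `ask_arbiter` and `Verdict`, the timeout value, and the reading of "return 0/1" as the script's exit status are all assumptions. Treating a missing or hung arbiter as "die" is likewise a design assumption, chosen so that a broken arbiter can never tell both nodes to live.

```python
import subprocess
from enum import Enum


class Verdict(Enum):
    """Answer from the (hypothetical) arbitration script: 0 = die, 1 = live."""
    DIE = 0
    LIVE = 1


def ask_arbiter(command, timeout_s=10.0):
    """Run an external arbitration command and map its exit status to a Verdict.

    Assumption: the script signals "live" with exit status 1 and "die" with 0,
    following the "0/1 (die/live)" wording in the thread. Any other outcome
    (timeout, script not found, unexpected status) is treated as DIE so that
    an arbiter failure cannot result in both nodes staying alive.
    """
    try:
        result = subprocess.run(command, timeout=timeout_s, check=False)
    except (subprocess.TimeoutExpired, OSError):
        return Verdict.DIE
    return Verdict.LIVE if result.returncode == 1 else Verdict.DIE
```

A node that suspects its peer has failed would call something like `ask_arbiter(["/path/to/FooBar.sh"])` (the path is a placeholder) and either resume the domain or shut down according to the verdict. Ensuring that the script never answers "live" on both sides remains the job of whoever implements it, as noted above.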