
Re: [Xen-devel] Remus blktap2 issue


  • To: Jonathan Kirsch <kirsch.jonathan@xxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx
  • From: Shriram Rajagopalan <rshriram@xxxxxxxxx>
  • Date: Wed, 8 Sep 2010 12:43:54 -0700
  • Cc:
  • Delivery-date: Wed, 08 Sep 2010 12:44:52 -0700
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>



On Wed, Sep 8, 2010 at 11:33 AM, Jonathan Kirsch <kirsch.jonathan@xxxxxxxxx> wrote:
Hi,

Thanks a lot for the patch.  Unfortunately, this did not solve the problem for me (after applying the patch on both primary and backup, rebuilding and installing xen/tools/stubdom, and then rebooting both hosts).  The backup is still unable to create the disk device when the fail-over occurs.  Thus, although I see checkpoint traffic flowing from primary to backup, the state of the backup's disk image is never modified (as judged by the image's last-modified time).  The backup does switch from "paused" to "running," but it consumes 100% CPU and when I connect to its vnc console it is as if the VM is frozen.  So *something* is being transferred, because I do see the screen from the primary, but obviously all is not right, because I can't interact with it at all.

Are there any error messages in the backup machine's syslog (or equivalent) about the tapdisks being used for the VM?

Are there any error messages in /var/log/xen/xend.log on the backup machine?

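A quick way to pull the relevant lines out of a log is sketched below. This is only an illustration: the helper name and the keyword list (tapdisk, blktap, remus) are assumptions, not a set of messages known to be emitted by xend or tapdisk2.

```python
import re

# Hypothetical keyword list; adjust to whatever your logs actually contain.
KEYWORDS = re.compile(r'tapdisk|blktap|remus', re.IGNORECASE)

def matching_lines(lines, pattern=KEYWORDS):
    """Return the lines that mention any keyword, without trailing newlines."""
    return [line.rstrip('\n') for line in lines if pattern.search(line)]

def scan_log(path):
    """Scan one log file, e.g. scan_log('/var/log/xen/xend.log')."""
    with open(path) as log:
        return matching_lines(log)
```

Running scan_log on xend.log and on the syslog file of the backup host should narrow things down to the lines worth posting.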
Out of curiosity, in your working Remus deployment, which dom0 kernel are you running (and which version of Xen)?  I'm running Xen 4.0.1 and the pvops 2.6.31.14 dom0 kernel.  My understanding was that Remus supported pvops dom0 2.6.31.x. 

I am running Xen 4.0.1 with the pvops 2.6.32.18 dom0 kernel, but I have not yet run any HVM domUs under Remus on my setup. If your current setup can run HVM domUs (without Remus) and can also live-migrate them between the two machines, then the issue is somewhere else, IMO.

Any other ideas regarding what this might be a symptom of?  My naive interpretation is that it is not a networking configuration problem (since state is being transferred), but that it has something to do with setting up the tapdisk via tapdisk2.
  
Thanks,
Jon 

On Wed, Sep 8, 2010 at 1:50 AM, Shriram Rajagopalan <rshriram@xxxxxxxxx> wrote:
It's not just the tap2:remus:....

There is a bug lurking in tools/python/xen/remus/device.py, in the ReplicatedDisk class. The regular expression scans the domU config only for the tap:tapdisk:remus... or tap:remus... disk types. I was able to get it working by fixing that regexp.
This applies to Xen 4.0.1 only; I am not sure about xen-unstable.
Here is a patch that might be of help to you (it's rather crude, but heck, I was too lazy :) )
diff -r b536ebfba183 tools/python/xen/remus/device.py
--- a/tools/python/xen/remus/device.py  Wed Aug 25 09:22:42 2010 +0100
+++ b/tools/python/xen/remus/device.py  Fri Sep 03 08:47:13 2010 -0700
@@ -36,10 +36,13 @@
         # to request commits.
         self.ctlfd = None
 
-        if not disk.uname.startswith('tap:remus:') and not disk.uname.startswith('tap:tapdisk:remus:'):
+        if not disk.uname.startswith('tap2:remus:') and not disk.uname.startswith('tap:remus:') and not disk.uname.startswith('tap:tapdisk:remus:'):
             raise ReplicatedDiskException('Disk is not replicated: %s' %
                                         str(disk))
-        fifo = re.match("tap:.*(remus.*)\|", disk.uname).group(1).replace(':', '_')
+        if disk.uname.startswith('tap2:remus:'):           
+            fifo = re.match("tap2:.*(remus.*)\|", disk.uname).group(1).replace(':', '_')
+        else:
+            fifo = re.match("tap:.*(remus.*)\|", disk.uname).group(1).replace(':', '_')
         absfifo = os.path.join(self.FIFODIR, fifo)
         absmsgfifo = absfifo + '.msg'
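For illustration, the prefix checks and the two regexps in the patch could be folded into a single pattern. This is a standalone sketch, not what is in the tree: fifo_name and _REMUS_UNAME are made-up names, ValueError stands in for ReplicatedDiskException, and the pattern stops at the first '|' rather than matching greedily as the original does.

```python
import re

# Accepts exactly the three prefixes the patch checks for:
#   tap2:remus:...   tap:remus:...   tap:tapdisk:remus:...
# and captures everything from "remus:" up to the first '|'.
_REMUS_UNAME = re.compile(r"(?:tap2:|tap:(?:tapdisk:)?)(remus:[^|]*)\|")

def fifo_name(uname):
    """Return the FIFO base name for a replicated disk uname.

    Raises ValueError (a stand-in for ReplicatedDiskException) when the
    uname is not one of the Remus disk types.
    """
    m = _REMUS_UNAME.match(uname)
    if m is None:
        raise ValueError('Disk is not replicated: %s' % uname)
    return m.group(1).replace(':', '_')
```

With a uname such as 'tap2:remus:192.0.2.10:9000|aio:/vm/disk.img' (address and path invented for the example), this yields the same 'remus_192.0.2.10_9000' name that the patched code would join onto FIFODIR.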



On Tue, Sep 7, 2010 at 11:01 PM, Pasi Kärkkäinen <pasik@xxxxxx> wrote:
On Tue, Sep 07, 2010 at 03:28:32PM -0700, Jonathan Kirsch wrote:
>    Hello,
>
>    I have been playing around with Remus on Xen 4.0.1, attempting to
>    fail-over for an HVM domU.
>
>    I've run into some problems that I think could be related to tapdisk2 and
>    its interaction with how one sets up Remus disk replication in the domU
>    config file.
>
>    A few things I've noticed:
>
>    -The tap:remus:backupHostIP:port|aio:imagePath notation does not work for
>    me, although this is what is written in the Remus documentation.  However,
>    I have found the following to work (i.e., not complain when starting
>    domU), so this is what I've been using:
>
>    tap2:remus:backupHostIP:port|aio:imagePath...
>

Yeah, this stuff was changed in Xen 4.0.1:
http://wiki.xensource.com/xenwiki/blktap2

I guess someone should update the remus wiki page.
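Since domU config files are Python, the tap2 notation quoted above can be written out as a small helper. This is a sketch only: the function name, the address, port, image path and the ',hda,w' device/mode suffix are placeholders chosen for the example, not values taken from this thread.

```python
def remus_disk_uname(backup_host, port, image_path, prefix='tap2'):
    """Build a Remus disk uname in the form
    <prefix>:remus:<backupHostIP>:<port>|aio:<imagePath>."""
    return '%s:remus:%s:%d|aio:%s' % (prefix, backup_host, port, image_path)

# Example domU config entry (all values are placeholders):
disk = [remus_disk_uname('192.0.2.10', 9000, '/vm/disk.img') + ',hda,w']
```

Passing prefix='tap' gives the older tap:remus: form from the Remus documentation.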

-- Pasi


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel



--
perception is but an offspring of its own self



 

