Xen project Mailing List

Re: [Xen-devel] Remus : VM on backup not in pause state

To: Dulloor <dulloor@xxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx

Date: Mon, 26 Jul 2010 23:17:52 -0700

Delivery-date: Mon, 26 Jul 2010 23:18:28 -0700

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Thanks for the pointers. I haven't had time to work on this. I will
collect more data and get back as soon as I can.

-dulloor

On Mon, Jul 26, 2010 at 3:05 PM, Brendan Cully <brendan@xxxxxxxxx> wrote:
> On Thursday, 22 July 2010 at 16:40, Dulloor wrote:
>> On Thu, Jul 22, 2010 at 2:49 PM, Brendan Cully <brendan@xxxxxxxxx> wrote:
>> > On Thursday, 22 July 2010 at 13:45, Dulloor wrote:
>> >> My setup is as follows :
>> >> - xen : unstable (rev:21743)
>> >> - Dom0 : pvops (branch : stable-2.6.32.x,
>> >>   rev:01d9fbca207ec232c758d991d66466fc6e38349e)
>> >> - Guest Configuration :
>> >> ------------------------------------------------------------------------------------------
>> >> kernel = "/usr/lib/xen/boot/hvmloader"
>> >> builder='hvm'
>> >> name = "linux-hvm"
>> >> vcpus = 4
>> >> memory = 2048
>> >> vif = [ 'type=ioemu, bridge=eth0, mac=00:1c:3e:17:22:13' ]
>> >> disk = [ 'phy:/dev/XenVolG/hvm-linux-snap-1.img,hda,w' ]
>> >> device_model = '/usr/lib/xen/bin/qemu-dm'
>> >> boot="cd"
>> >> sdl=0
>> >> vnc=1
>> >> vnclisten="0.0.0.0"
>> >> vncconsole=0
>> >> vncpasswd=''
>> >> stdvga=0
>> >> superpages=1
>> >> serial='pty'
>> >> ------------------------------------------------------------------------------------------
>> >>
>> >> - Remus command :
>> >> # remus --no-net linux-hvm <dst-ip>
>> >>
>> >> - On primary :
>> >> # xm list
>> >> Name         ID   Mem VCPUs  State   Time(s)
>> >> linux-hvm     9  2048     4 -b-s--     10.8
>> >>
>> >> - On secondary :
>> >> # xm list
>> >> Name         ID   Mem VCPUs  State   Time(s)
>> >> linux-hvm    11  2048     4 -b----      1.9
>> >>
>> >>
>> >> I have to issue "xm pause/unpause" explicitly for the backup VM.
>> >> Any recent changes ?
>> >
>> > This probably means there was a timeout on the replication channel,
>> > interpreted by the backup as a failure of the primary, which caused it
>> > to activate itself. You should see evidence of that in the remus
>> > console logs and xend.log and daemon.log (for the disk side).
>> >
>> > Once you've figured out where the timeout happened it'll be easier to
>> > figure out why.
>> >
>> Please find the logs attached. I didn't find anything interesting in
>> daemon.log.
>> What does remus log there ? I am not using disk replication, since I
>> have issues with that .. but that's for another email :)
>
> daemon.log is just for disk replication, so if you're not using it you
> won't see anything.
>
>> The only visible error is in xend-secondary.log around xc_restore :
>> [2010-07-22 16:15:37 2056] DEBUG (balloon:207) Balloon: setting dom0 target to 5765 MiB.
>> [2010-07-22 16:15:37 2056] DEBUG (XendDomainInfo:1467) Setting memory target of domain Domain-0 (0) to 5765 MiB.
>> [2010-07-22 16:15:37 2056] DEBUG (XendCheckpoint:290) [xc_restore]: /usr/lib/xen/bin/xc_restore 5 1 5 6 1 1 1 0
>> [2010-07-22 16:18:42 2056] INFO (XendCheckpoint:408) xc: error: Error
>> when reading pages (11 = Resource temporarily unavailabl): Internal
>> error
>> [2010-07-22 16:18:42 2056] INFO (XendCheckpoint:408) xc: error: error
>> when buffering batch, finishing (11 = Resource temporarily
>> unavailabl): Internal error
>>
>> If you haven't seen this before, please let me know and I will try
>> debugging more.
>
> I haven't seen that. It looks like read_exact_timed has failed with
> EAGAIN, which is surprising since it explicitly looks for EAGAIN and
> loops on it. Can you log len and errno after line 77 in
> read_exact_timed in tools/libxc/xc_domain_restore.c? ie change
>
>     if ( len <= 0 )
>         return -1;
>
> to something like
>
>     if ( len <= 0 ) {
>         fprintf(stderr, "read_exact_timed failed (read rc: %d, errno: %d)\n",
>                 len, errno);
>         return -1;
>     }
>
> Another possibility is read is returning 0 here (and EAGAIN is just a
> leftover errno from a previous read), which would indicate that the
> _sender_ hung up the connection. It's hard to tell exactly what's
> going on because you seem to have an enormous amount of clock skew
> between your primary and secondary dom0s and I can't tell whether the
> logs match up.
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
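
For readers following the thread, the distinction Brendan draws between a real EAGAIN and a read() that returns 0 with a stale errno can be illustrated with a small standalone sketch. This is not the libxc code: read_exact_logged is a hypothetical helper written for this note, it has no timeout (unlike read_exact_timed), and it waits with poll() where the real function may do something different. It clears errno before each read so that an end-of-file from the sender cannot be mistaken for EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (NOT the libxc read_exact_timed): read exactly
 * `size` bytes from `fd`. Returns 0 on success, -1 on EOF or error,
 * and logs which of the two happened.
 */
int read_exact_logged(int fd, void *buf, size_t size)
{
    size_t offset = 0;

    assert(buf != NULL);

    while (offset < size) {
        /* Clear errno so a stale EAGAIN from an earlier call cannot
         * be confused with the result of this read(). */
        errno = 0;
        ssize_t len = read(fd, (char *)buf + offset, size - offset);

        if (len > 0) {
            offset += (size_t)len;
            continue;
        }

        if (len == 0) {
            /* read() returning 0 means the peer closed the connection. */
            fprintf(stderr,
                    "read_exact_logged: EOF after %zu of %zu bytes\n",
                    offset, size);
            return -1;
        }

        if (errno == EINTR)
            continue;

        if (errno == EAGAIN) {
            /* Non-blocking fd with no data yet: wait until readable.
             * Assumption: no timeout here, unlike the real code. */
            struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
            (void)poll(&pfd, 1, -1);
            continue;
        }

        fprintf(stderr, "read_exact_logged: read failed (rc: %zd, errno: %d = %s)\n",
                len, errno, strerror(errno));
        return -1;
    }

    return 0;
}
```

Calling this on one end of a pipe whose writer has been closed takes the EOF branch, which is the "sender hung up" case described above. A genuine read error takes the last branch and prints the errno actually set by that call.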
