Re: [Xen-devel] Remus : VM on backup not in pause state


  • To: Dulloor <dulloor@xxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx
  • From: Dulloor <dulloor@xxxxxxxxx>
  • Date: Mon, 26 Jul 2010 23:17:52 -0700
  • Delivery-date: Mon, 26 Jul 2010 23:18:28 -0700
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Thanks for the pointers. I haven't had time to work on this. I will
collect more data and get back as soon as I can.

-dulloor

On Mon, Jul 26, 2010 at 3:05 PM, Brendan Cully <brendan@xxxxxxxxx> wrote:
> On Thursday, 22 July 2010 at 16:40, Dulloor wrote:
>> On Thu, Jul 22, 2010 at 2:49 PM, Brendan Cully <brendan@xxxxxxxxx> wrote:
>> > On Thursday, 22 July 2010 at 13:45, Dulloor wrote:
>> >> My setup is as follows :
>> >> - xen : unstable (rev:21743)
>> >> - Dom0 : pvops (branch : stable-2.6.32.x,
>> >> rev:01d9fbca207ec232c758d991d66466fc6e38349e)
>> >> - Guest Configuration :
>> >> ------------------------------------------------------------------------------------------
>> >> kernel = "/usr/lib/xen/boot/hvmloader"
>> >> builder='hvm'
>> >> name = "linux-hvm"
>> >> vcpus = 4
>> >> memory = 2048
>> >> vif = [ 'type=ioemu, bridge=eth0, mac=00:1c:3e:17:22:13' ]
>> >> disk = [ 'phy:/dev/XenVolG/hvm-linux-snap-1.img,hda,w' ]
>> >> device_model = '/usr/lib/xen/bin/qemu-dm'
>> >> boot="cd"
>> >> sdl=0
>> >> vnc=1
>> >> vnclisten="0.0.0.0"
>> >> vncconsole=0
>> >> vncpasswd=''
>> >> stdvga=0
>> >> superpages=1
>> >> serial='pty'
>> >> ------------------------------------------------------------------------------------------
>> >>
>> >> - Remus command :
>> >> # remus --no-net linux-hvm <dst-ip>
>> >>
>> >> - On primary :
>> >> # xm list
>> >> Name                                        ID   Mem VCPUs      State   Time(s)
>> >> linux-hvm                                    9  2048     4     -b-s--     10.8
>> >>
>> >> - On secondary :
>> >> # xm list
>> >> Name                                        ID   Mem VCPUs      State   Time(s)
>> >> linux-hvm                                   11  2048     4     -b----      1.9
>> >>
>> >>
>> >> I have to issue "xm pause/unpause" explicitly for the backup VM.
>> >> Any recent changes ?
>> >
>> > This probably means there was a timeout on the replication channel,
>> > interpreted by the backup as a failure of the primary, which caused it
>> > to activate itself. You should see evidence of that in the remus
>> > console logs and xend.log and daemon.log (for the disk side).
>> >
>> > Once you've figured out where the timeout happened it'll be easier to
>> > figure out why.
>> >
>> Please find the logs attached. I didn't find anything interesting in
>> daemon.log.
>> What does remus log there ? I am not using disk replication, since I
>> have issues with that .. but that's for another email :)
>
> daemon.log is just for disk replication, so if you're not using it you
> won't see anything.
>
>> The only visible error is in xend-secondary.log around xc_restore :
>> [2010-07-22 16:15:37 2056] DEBUG (balloon:207) Balloon: setting dom0 target to 5765 MiB.
>> [2010-07-22 16:15:37 2056] DEBUG (XendDomainInfo:1467) Setting memory target of domain Domain-0 (0) to 5765 MiB.
>> [2010-07-22 16:15:37 2056] DEBUG (XendCheckpoint:290) [xc_restore]: /usr/lib/xen/bin/xc_restore 5 1 5 6 1 1 1 0
>> [2010-07-22 16:18:42 2056] INFO (XendCheckpoint:408) xc: error: Error when reading pages (11 = Resource temporarily unavailabl): Internal error
>> [2010-07-22 16:18:42 2056] INFO (XendCheckpoint:408) xc: error: error when buffering batch, finishing (11 = Resource temporarily unavailabl): Internal error
>>
>> If you haven't seen this before, please let me know and I will try
>> debugging more.
>
> I haven't seen that. It looks like read_exact_timed has failed with
> EAGAIN, which is surprising since it explicitly looks for EAGAIN and
> loops on it. Can you log len and errno after line 77 in
> read_exact_timed in tools/libxc/xc_domain_restore.c? That is, change
>
>   if ( len <= 0 )
>       return -1;
>
> to something like
>
>   if ( len <= 0 ) {
>       fprintf(stderr, "read_exact_timed failed (read rc: %d, errno: %d)\n",
>               (int)len, errno);
>       return -1;
>   }
>
> Another possibility is read is returning 0 here (and EAGAIN is just a
> leftover errno from a previous read), which would indicate that the
> _sender_ hung up the connection. It's hard to tell exactly what's
> going on because you seem to have an enormous amount of clock skew
> between your primary and secondary dom0s and I can't tell whether the
> logs match up.
>
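
The two possibilities raised above (a genuine read error versus read
returning 0 with a stale EAGAIN left in errno) can be told apart by
clearing errno before each read and reporting the two cases separately.
The following is a minimal standalone sketch in C, not the libxc code:
read_exact_diag and the read_status enum are hypothetical names introduced
only for illustration, and any timeout handling done by the real
read_exact_timed is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not part of libxc. */
enum read_status {
    READ_OK = 0,    /* all requested bytes were read */
    READ_EOF = 1,   /* read() returned 0: the sender closed the connection */
    READ_ERROR = 2  /* read() returned -1 with a non-retryable errno */
};

/*
 * Read exactly `size` bytes from `fd`, retrying on EINTR and EAGAIN.
 * On failure, report whether it was an EOF or a real error, so that a
 * stale errno from an earlier call cannot be mistaken for the cause.
 */
static enum read_status read_exact_diag(int fd, void *buf, size_t size)
{
    size_t offset = 0;

    assert(buf != NULL || size == 0);

    while (offset < size) {
        ssize_t len;

        errno = 0; /* drop any value left over from a previous call */
        len = read(fd, (char *)buf + offset, size - offset);

        if (len == -1 && (errno == EINTR || errno == EAGAIN))
            continue; /* transient condition: try again */

        if (len == 0) {
            /* errno is meaningless here; report the EOF explicitly */
            fprintf(stderr, "read_exact_diag: EOF after %zu of %zu bytes\n",
                    offset, size);
            return READ_EOF;
        }

        if (len < 0) {
            fprintf(stderr, "read_exact_diag: read rc %zd, errno %d (%s)\n",
                    len, errno, strerror(errno));
            return READ_ERROR;
        }

        offset += (size_t)len;
    }

    return READ_OK;
}
```

With this shape, a pipe or socket whose writer has gone away is reported
as READ_EOF even if errno happened to hold EAGAIN before the call, while
an invalid descriptor is reported as READ_ERROR with the errno that read
actually set. Note that on a non-blocking descriptor the EAGAIN retry
spins until data arrives.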

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel