WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
 
   
 

xen-users

To: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-users] hellp!! live migration fails after several migrations - stress test
From: Michael Mey <michael.mey@xxxxxx>
Date: Mon, 28 Nov 2005 11:50:26 +0100
Delivery-date: Mon, 28 Nov 2005 10:50:35 +0000
Organization: Thinking Objects Software GmbH
Sender: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: KMail/1.8.3
Hi folks,

I wanted to do a stress test on a domU (Debian 3.1, 2.6.11.12-xenU)
running on dom0 (Debian 3.1, 2.6.11.12-xen0), Xen-2.0.7, IA32.

Similar problems occurred when I ran last week's Xen-devel. Since that is 
under heavy development, I thought testing with the stable Xen-2.0.7 should 
work for sure.

Scenario:
Two of these hosts (IPs: 192.168.112.17 and 192.168.112.18) are running a 
script. 
The script checks whether the domain is on the local host. If it is, the 
domain is migrated to the other host; if not, the script sleeps for a few 
minutes.
It's a kind of domU ping-pong.

The domU has a file-backed VBD that lives on a third host (an NFS server).

It worked for several migrations, but then the domU suddenly died.
I hope someone can help me find out what the problem is.

script pingpong:
----------------------------------------------------------------------------------
#!/bin/bash
# pingpong: migrate a domU back and forth between two Xen hosts.
# bash is required because the sleep interval uses $RANDOM.
SCRIPTNAME=`basename "$0"`
HOSTNAME=`hostname`
LOGFILE="$SCRIPTNAME.$HOSTNAME.log"
XENHOST1="192.168.112.17"
XENHOST2="192.168.112.18"
XENGUEST1="debian1"


# determine the xen target host (the one that is not this host);
# "inet " with a trailing space skips any inet6 line
MYIP=`ifconfig eth0 | grep "inet " | awk '{print $2}' | cut -d: -f 2`
if [ "$MYIP" = "$XENHOST1" ]
then
   TARGETHOST=$XENHOST2
else
   TARGETHOST=$XENHOST1
fi


# append a timestamped message to the logfile
log() {
   echo "`date '+[%y/%m/%d %H:%M:%S]'` - $1" >> "$LOGFILE"
}


while true
do
   # names of all running guests, without the header line and Domain-0
   RUNNINGGUESTS=`xm list | grep -v Name | grep -v Domain-0 | awk '{print $1}'`
   if echo "$RUNNINGGUESTS" | grep "$XENGUEST1" > /dev/null
   then
      XENGUESTID=`xm list | grep "$XENGUEST1" | awk '{print $2}'`
      log "domain $XENGUEST1 (ID=$XENGUESTID) is running here"
      # column 5 of "xm list" is the state; a "p" in it means paused
      if xm list | grep "$XENGUEST1" | awk '{print $5}' | grep p > /dev/null
      then
         log "domain $XENGUEST1 is currently paused and hence cannot be migrated!"
      else
         log "START migration of $XENGUEST1 to $TARGETHOST..."
         # test the exit status of xm migrate directly
         if xm migrate -l "$XENGUEST1" "$TARGETHOST"
         then
            log "END migration of domain $XENGUEST1 successful!"
         else
            log "ERROR while migrating $XENGUEST1!"
         fi
      fi
   else
      log "domain $XENGUEST1 is not here. I'll go back to sleep."
   fi
   # sleep for a random 120..1319 seconds (roughly 2 to 22 minutes)
   sleep `expr $RANDOM % 1200 + 120`
done
-----------------------------------------------------------------------------------------
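
The grep-based checks in the script match substrings, so a guest named debian10 would also match debian1. Below is a minimal sketch of an exact-name lookup, assuming the `xm list` column layout the script already relies on (name in column 1, state in column 5). The helper names domain_state and is_paused are hypothetical, not part of Xen.

```shell
# Print the State column for exactly one domain name.
# Reads "xm list"-style text on stdin (column layout is an assumption).
domain_state() {
   awk -v name="$1" '$1 == name { print $5 }'
}

# Succeed if the given state string contains the "paused" flag.
is_paused() {
   case "$1" in
      *p*) return 0 ;;
      *)   return 1 ;;
   esac
}
```

In the loop this could be used as `STATE=$(xm list | domain_state "$XENGUEST1")` followed by `if is_paused "$STATE"`.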

logfile snippets of the last migration (from 192.168.112.18 to 192.168.112.17):

192.168.112.17:
--------------------
xend.log:
------------
[2005-11-28 01:14:30 xend] DEBUG (blkif:203) Connecting blkif to event channel <BlkifBackendInterface 24 0> ports=16:4
[2005-11-28 01:14:30 xend] DEBUG (XendDomain:244) XendDomain>reap> domain died name=debian1 id=24
[2005-11-28 01:14:30 xend] INFO (XendDomain:568) Destroying domain: name=debian1
[2005-11-28 01:14:30 xend] DEBUG (XendDomainInfo:665) Destroying vifs for domain 24
[2005-11-28 01:14:30 xend] DEBUG (netif:305) Destroying vif domain=24 vif=0
[2005-11-28 01:14:30 xend] DEBUG (XendDomainInfo:674) Destroying vbds for domain 24
[2005-11-28 01:14:30 xend] DEBUG (blkif:552) Destroying blkif domain=24
[2005-11-28 01:14:30 xend] DEBUG (blkif:408) Destroying vbd domain=24 idx=0
[2005-11-28 01:14:30 xend] DEBUG (blkif:408) Destroying vbd domain=24 idx=1
[2005-11-28 01:14:30 xend] DEBUG (XendDomainInfo:634) Closing console, domain 24
[2005-11-28 01:14:30 xend] DEBUG (XendDomainInfo:622) Closing channel to domain 24
[2005-11-28 01:14:30 xend] INFO (XendRoot:113) EVENT> xend.virq 4
[2005-11-28 01:14:30 xend] DEBUG (blkif:363) Unbinding vbd (type file) from /dev/loop0
[2005-11-28 01:14:30 xend] DEBUG (blkif:363) Unbinding vbd (type file) from /dev/loop1
[2005-11-28 01:14:30 xend] INFO (XendRoot:113) EVENT> xend.domain.exit ['debian1', '24', 'crash']
[2005-11-28 01:14:30 xend] INFO (XendRoot:113) EVENT> xend.domain.destroy ['debian1', '24']
[2005-11-28 01:14:31 xend] INFO (XendRoot:113) EVENT> xend.domain.died ['debian1', '24']
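
The line that matters here is the xend.domain.exit event with reason 'crash'. Below is a minimal sketch for pulling such events out of a long xend.log, assuming each event sits on a single line in the logfile itself and keeps the format shown here. The helper name exit_reasons is hypothetical.

```shell
# Print "timestamp name id reason" for every xend.domain.exit event.
# Reads xend.log text on stdin; splits on single quotes to pick out
# the name, id and reason from the ['name', 'id', 'reason'] list.
exit_reasons() {
   awk -F"'" '/EVENT> xend\.domain\.exit / { print substr($0, 2, 19), $2, $4, $6 }'
}
```

Something like `exit_reasons < xend.log` (with the path to the log filled in) would then list every recorded exit, which helps to see whether earlier migrations ended the same way.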
---------------------
xfrd.log:
------------
3488 [INF] XFRD> Forked child pid=5047
3488 [INF] XFRD> Accepted connection from 192.168.112.18:1777 on 2
5069 [INF] XFRD> Xfr service for 192.168.112.18:1777
[DEBUG] Conn_init> flags=1
[DEBUG] Conn_init> write stream...
[DEBUG] stream_init>mode=w flags=1 compress=0
[DEBUG] stream_init> unbuffer...
[DEBUG] stream_init< err=0
[DEBUG] Conn_init> read stream...
[DEBUG] stream_init>mode=r flags=1 compress=0
[DEBUG] stream_init> unbuffer...
[DEBUG] stream_init< err=0
[DEBUG] Conn_sxpr>
(xfr.hello 1 0)[DEBUG] Conn_sxpr< err=0
[DEBUG] Conn_sxpr>
(xfr.xfr 16)[DEBUG] Conn_sxpr< err=0
[1133136756.095291] xc_linux_restore start

xc_linux_restore start
[1133136756.108828] Created domain 24

Created domain 24
(Domain-0 Domain-24)'domain id=24 name=debian1 memory=256 console=9624 image=/boot/vmlinuz-2.6.11.12-xenU'[1133136756.570952] Reloading memory pages:   0%
[1133136870.173691] Received all pages

Received all pages
[1133136870.183801] Memory reloaded.

Memory reloaded.
Decreased reservation by 7 pages
[1133136870.184758] Domain ready to be built.

Domain ready to be built.
[1133136870.184844] Domain ready to be unpaused

Domain ready to be unpaused
[1133136870.185234] DOM=24

DOM=24
[DEBUG] Conn_sxpr>
(xfr.err 0)[DEBUG] Conn_sxpr< err=0
[DEBUG] Conn_sxpr>
(xfr.err 0)[DEBUG] Conn_sxpr< err=0
5069 [INF] XFRD> Transfer complete in 114 seconds
5069 [INF] XFRD> Xfr service err=0

192.168.112.18:
----------------------------------------------
xend.log:
------------
[2005-11-28 01:14:16 xend] DEBUG (XendDomain:487) domain_restart_schedule> 16 suspend 1
[2005-11-28 01:14:16 xend] INFO (XendRoot:113) EVENT> xend.domain.shutdown ['debian1', '16', 'suspend']
[2005-11-28 01:14:16 xend] DEBUG (XendDomain:244) XendDomain>reap> domain died name=debian1 id=16
[2005-11-28 01:14:16 xend] DEBUG (XendDomain:247) XendDomain>reap> shutdown id=16 reason=suspend
[2005-11-28 01:14:16 xend] INFO (XendRoot:113) EVENT> xend.virq 4
[2005-11-28 01:14:16 xend] INFO (XendRoot:113) EVENT> xend.domain.suspended ['debian1', '16']
[2005-11-28 01:14:25 xend] INFO (XendDomain:568) Destroying domain: name=debian1
[2005-11-28 01:14:25 xend] DEBUG (XendDomainInfo:665) Destroying vifs for domain 16
[2005-11-28 01:14:25 xend] DEBUG (netif:305) Destroying vif domain=16 vif=0
[2005-11-28 01:14:26 xend] DEBUG (XendDomainInfo:674) Destroying vbds for domain 16
[2005-11-28 01:14:26 xend] DEBUG (blkif:552) Destroying blkif domain=16
[2005-11-28 01:14:26 xend] DEBUG (blkif:408) Destroying vbd domain=16 idx=0
[2005-11-28 01:14:26 xend] DEBUG (blkif:408) Destroying vbd domain=16 idx=1
[2005-11-28 01:14:26 xend] DEBUG (XendDomainInfo:634) Closing console, domain 16
[2005-11-28 01:14:26 xend] DEBUG (XendDomainInfo:622) Closing channel to domain 16
[2005-11-28 01:14:26 xend] INFO (XendRoot:113) EVENT> xend.domain.destroy ['debian1', '16']
[2005-11-28 01:14:26 xend] DEBUG (blkif:363) Unbinding vbd (type file) from /dev/loop0
[2005-11-28 01:14:26 xend] DEBUG (blkif:363) Unbinding vbd (type file) from /dev/loop1
[2005-11-28 01:14:27 xend] INFO (XendDomain:568) Destroying domain: name=debian1
[2005-11-28 01:14:27 xend] DEBUG (XendDomainInfo:634) Closing console, domain 16
[2005-11-28 01:14:27 xend] ERROR (XendDomainInfo:627) Domain destroy failed: debian1
Traceback (most recent call last):
  File "/usr/src/xen-2.0/dist/install/usr/lib/python/xen/xend/XendDomainInfo.py", line 625, in destroy_domain
    return xc.domain_destroy(dom=self.dom)
error: (3, 'No such process')
[2005-11-28 01:14:27 xend] INFO (XendRoot:113) EVENT> xend.domain.destroy ['debian1', '16']
[2005-11-28 01:14:27 xend] INFO (XendMigrate:345) Migrate OK: ['migrate', ['id', '16'], ['state', 'ok'], ['live', 1], ['resource', 0], ['src', ['host', 'testpc-018'], ['domain', '16']], ['dst', ['host', '192.168.112.17'], ['domain', 24]]]
[2005-11-28 01:14:27 xend] INFO (XendRoot:113) EVENT> xend.domain.died ['debian1', '16']
[2005-11-28 01:14:27 xend] INFO (XendRoot:113) EVENT> xend.domain.migrate ['debian1', '16', 'ok', ['migrate', ['id', '16'], ['state', 'ok'], ['live', 1], ['resource', 0], ['src', ['host', 'testpc-018'], ['domain', '16']], ['dst', ['host', '192.168.112.17'], ['domain', 24]]]]

xfrd.log:
------------
 1: sent 37192, skipped 28339, delta 27582ms, dom0 44%, target 4%, sent 44Mb/s, dirtied 59Mb/s 49812 pages
[1133136779.371602] Saving memory pages: iter 2   0%
 2: sent 38822, skipped 10990, delta 18590ms, dom0 33%, target 5%, sent 68Mb/s, dirtied 50Mb/s 28781 pages
[1133136797.962478] Saving memory pages: iter 3   0%
 3: sent 26495, skipped 2285, delta 9687ms, dom0 36%, target 8%, sent 89Mb/s, dirtied 62Mb/s 18477 pages
[1133136807.650291] Saving memory pages: iter 4   0%
 4: sent 16933, skipped 1544, delta 6007ms, dom0 38%, target 9%, sent 92Mb/s, dirtied 63Mb/s 11599 pages
[1133136813.658026] Saving memory pages: iter 5   0%
 5: sent 10261, skipped 1338, delta 3642ms, dom0 39%, target 9%, sent 92Mb/s, dirtied 72Mb/s 8112 pages
[1133136817.300454] Saving memory pages: iter 6   0%
 6: sent 6907, skipped 1205, delta 2451ms, dom0 36%, target 17%, sent 92Mb/s, dirtied 82Mb/s 6143 pages
[1133136819.752019] Saving memory pages: iter 7   0%
 7: sent 5127, skipped 1011, delta 1849ms, dom0 47%, target 13%, sent 90Mb/s, dirtied 173Mb/s 9795 pages
[1133136821.601429] Saving memory pages: iter 8   0%
 8: sent 8308, skipped 1487, delta 4210ms, dom0 35%, target 7%, sent 64Mb/s, dirtied 86Mb/s 11162 pages
[1133136825.811665] Saving memory pages: iter 9   0%
 9: sent 9356, skipped 1805, delta 4216ms, dom0 38%, target 18%, sent 72Mb/s, dirtied 98Mb/s 12663 pages
[1133136830.028020] Saving memory pages: iter 10   0%
 10: sent 11249, skipped 1410, delta 8518ms, dom0 26%, target 5%, sent 43Mb/s, dirtied 74Mb/s 19253 pages
[1133136838.546941] Saving memory pages: iter 11   0%
 11: sent 17449, skipped 1803, delta 10874ms, dom0 30%, target 4%, sent 52Mb/s, dirtied 50Mb/s 16807 pages
[1133136849.421847] Saving memory pages: iter 12   0%
 12: sent 14596, skipped 2211, [DEBUG] Conn_sxpr>
(debian1 16)[DEBUG] Conn_sxpr< err=0
[1133136856.172376] SUSPEND flags 00020004 shinfo 00000be8 eip c01068fe esi 00035a84

SUSPEND flags 00020004 shinfo 00000be8 eip c01068fe esi 00035a84
delta 6750ms, dom0 39%, target 8%, sent 70Mb/s, dirtied 102Mb/s 21143 pages
[1133136856.172581] Saving memory pages: iter 13   0%
 13: sent 21143, skipped 0, delta 9709ms, dom0 22%, target 0%, sent 71Mb/s, dirtied 71Mb/s 21143 pages
[1133136865.882150] Total pages sent= 223838 (3.42x)

Total pages sent= 223838 (3.42x)
[1133136865.882212] (of which 0 were fixups)

(of which 0 were fixups)
[DEBUG] Conn_sxpr>
(xfr.err 0)[DEBUG] Conn_sxpr< err=0
[DEBUG] Conn_sxpr>
(xfr.xfr.ok 24)[DEBUG] Conn_sxpr< err=0
3555 [INF] XFRD> Transfer complete in 115 seconds
3555 [WRN] XFRD> Transfer OK
3555 [INF] XFRD> Xfr service err=0
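
As a cross-check of the "Total pages sent" figure, the per-iteration "sent" counts can be added up. Below is a minimal sketch, assuming the iteration lines keep the " N: sent X, skipped Y, ..." shape seen in this log. The helper name sum_sent is hypothetical. In case a progress meter repeats the start of a line, only the last value seen per iteration is counted.

```shell
# Sum the "sent" page counts of all pre-copy iterations.
# Reads xfrd.log text on stdin and prints the total number of pages.
sum_sent() {
   awk '
      /^ *[0-9]+: sent [0-9]+,/ {
         iter = $1;  sub(/:/, "", iter)    # "12:"    -> "12"
         pages = $3; sub(/,/, "", pages)   # "14596," -> "14596"
         sent[iter] = pages                # keep the last value per iteration
      }
      END {
         total = 0
         for (i in sent) total += sent[i]
         print total
      }'
}
```

Run as `sum_sent < xfrd.log`. The 13 iterations quoted in this mail add up to 223838 pages, the same as the reported total. With memory=256 (65536 pages of 4 KiB) that is the 3.42x shown in the log, so the guest's memory was sent almost three and a half times over because pages kept being dirtied during the transfer.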



Michael

-- 
----------------------------------------------------------------------------------------
Michael Mey                                  
Thinking Objects Software GmbH    |   mailto: michael.mey@xxxxxx 
Lilienthalstrasse 2/1                         |   phone: +49 711 88770-147
70825 Stuttgart-Korntal, Germany  |   fax: +49 711 88770-449
----------------------------------------------------------------------------------------
