
[Xen-devel] Live migration with MPI


  • To: xen-devel@xxxxxxxxxxxxxxxxxxx
  • From: "Arun Babu" <arunbabu.n@xxxxxxxxx>
  • Date: Fri, 4 Aug 2006 13:36:15 -0400
  • Delivery-date: Fri, 04 Aug 2006 10:36:51 -0700
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Hi,
We have set up a 16-node cluster with Xen 3.0.2-3 / Linux 2.6.16.13, and MPI is set up and running on the cluster. I construct a ring of 4 machines, with 3 real nodes and 1 virtual one, and run an MPI application (the smg2000 benchmark); it completes fine. Very nice.

Now, while the MPI benchmark is running on the ring, I try to live migrate the virtual machine. This produces a 'Kernel BUG' in the virtual machine; the dump is pasted below, along with the error thrown by the MPI benchmark application. (It looks like some kind of memory corruption during migration...)

Has anyone successfully done a live migration while running an MPI application?
Could you please suggest how to approach this? (On seeing the glibc errors, I moved /lib64/tls to /lib64/tls.disabled, but it made no difference.)

Thank you,
Arun

1. Error message on the virtual machine's console while running MPI:

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at mm/mmap.c:1961
invalid opcode: 0000 [3] SMP
CPU 0
Modules linked in: ipv6 autofs4 i2c_dev i2c_core dm_mirror dm_mod lp  parport_pc parport
Pid: 4790, comm: smg2000 Not tainted 2.6.16.13-xen #7
RIP: e030:[<ffffffff8016a42b>] <ffffffff8016a42b>{exit_mmap+235}
RSP: e02b:ffff880012cddcd8  EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff8800021a42c0 RCX: 000000000000011d
RDX: ffffffffff578000 RSI: ffff88002b33e6b8 RDI: ffff88000116da80
RBP: 0000000000000000 R08: ffff8800395445b0 R09: 0000000000000000
R10: 0000000000000537 R11: ffffffff801dac20 R12: ffff880002700880
R13: 0000000000000001 R14: 0000000000000006 R15: ffffffff8010b45d
FS:  00002b1d2333e6f0(0000) GS:ffffffff80514000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process smg2000 (pid: 4790, threadinfo ffff880012cdc000, task ffff88003f9828b0)
Stack: 0000000000002181 ffff8800021a42c0 ffff880002700880 ffff880002700900
       ffff88003f982f1c ffffffff8012ef94 0000000000000006 0000000000000006
       ffff88003f9828b0 ffffffff80135479
Call Trace: <ffffffff8012ef94>{mmput+52} <ffffffff80135479>{do_exit+521}
       <ffffffff8013deae>{__dequeue_signal+478} <ffffffff8010b45d>{sysret_signal+56}
       <ffffffff80135c28>{do_group_exit+264} <ffffffff8014062c>{get_signal_to_deliver+1708}
       <ffffffff8010b45d>{sysret_signal+56} <ffffffff8010a5ed>{do_signal+157}
       <ffffffff801378cb>{current_fs_time+59} <ffffffff803a3c62>{__down_read+18}
       <ffffffff80129eec>{try_to_wake_up+924} <ffffffff80196864>{dput+84}
       <ffffffff8013d62c>{sigprocmask+220} <ffffffff8013ef23>{sys_rt_sigprocmask+99}
       <ffffffff8017b768>{filp_close+104} <ffffffff8013d62c>{sigprocmask+220}
       <ffffffff8010b45d>{sysret_signal+56} <ffffffff8010b735>{ptregscall_common+61}

Code: 0f 0b 68 95 2b 3d 80 c2 a9 07 48 83 c4 10 5b 5d 41 5c c3 66
RIP <ffffffff8016a42b>{exit_mmap+235} RSP <ffff880012cddcd8>
 <1>Fixing recursive fault but reboot is needed!


2. Error thrown by the MPI benchmark application:
*** glibc detected *** smg2000: free(): invalid pointer: 0x00000000017ef1a0 ***
======= Backtrace: =========
/lib64/libc.so.6[0x2b1d23162c43]
/lib64/libc.so.6(__libc_free+0x84)[0x2b1d23162dc4]
smg2000[0x42b632]
smg2000[0x4289ee]
smg2000[0x41d261]
smg2000[0x405dee]
smg2000[0x4056a8]
smg2000[0x408aad]
smg2000[0x403c05]
smg2000[0x403730]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x2b1d23111e54]
smg2000[0x402269]
======= Memory map: ========
00400000-004bd000 r-xp 00000000 00:15 33425271                           /nfsroot/home/abnagara/code/bm/smg2000/test/smg2000
005bc000-005be000 rw-p 000bc000 00:15 33425271                           /nfsroot/home/abnagara/code/bm/smg2000/test/smg2000
005be000-0180f000 rw-p 005be000 00:00 0                                  [heap]
36f8e00000-36f8e0d000 r-xp 00000000 00:0c 37044253                       /nfsroot/lib64/libgcc_s-4.1.0-20060304.so.1
36f8e0d000-36f8f0d000 ---p 0000d000 00:0c 37044253                       /nfsroot/lib64/libgcc_s-4.1.0-20060304.so.1
36f8f0d000-36f8f0e000 rw-p 0000d000 00:0c 37044253                       /nfsroot/lib64/libgcc_s-4.1.0-20060304.so.1
2b1d22c37000-2b1d22c51000 r-xp 00000000 00:0c 37044225                   /nfsroot/lib64/ld-2.4.so
2b1d22c51000-2b1d22c52000 rw-p 2b1d22c51000 00:00 0
2b1d22c73000-2b1d22c74000 rw-p 2b1d22c73000 00:00 0
2b1d22d50000-2b1d22d51000 r--p 00019000 00:0c 37044225                   /nfsroot/lib64/ld-2.4.so
2b1d22d51000-2b1d22d52000 rw-p 0001a000 00:0c 37044225                   /nfsroot/lib64/ld-2.4.so
2b1d22d52000-2b1d22dd2000 r-xp 00000000 00:0c 37044256                   /nfsroot/lib64/libm-2.4.so
2b1d22dd2000-2b1d22ed2000 ---p 00080000 00:0c 37044256                   /nfsroot/lib64/libm-2.4.so
2b1d22ed2000-2b1d22ed3000 r--p 00080000 00:0c 37044256                   /nfsroot/lib64/libm-2.4.so
2b1d22ed3000-2b1d22ed4000 rw-p 00081000 00:0c 37044256                   /nfsroot/lib64/libm-2.4.so
2b1d22ed4000-2b1d22ee6000 r-xp 00000000 00:0c 37044273                   /nfsroot/lib64/libpthread-2.4.so
2b1d22ee6000-2b1d22fe6000 ---p 00012000 00:0c 37044273                   /nfsroot/lib64/libpthread-2.4.so
2b1d22fe6000-2b1d22fe7000 r--p 00012000 00:0c 37044273                   /nfsroot/lib64/libpthread-2.4.so
2b1d22fe7000-2b1d22fe8000 rw-p 00013000 00:0c 37044273                   /nfsroot/lib64/libpthread-2.4.so
2b1d22fe8000-2b1d22fec000 rw-p 2b1d22fe8000 00:00 0
2b1d22fec000-2b1d22ff3000 r-xp 00000000 00:0c 37044275                   /nfsroot/lib64/librt-2.4.so
2b1d22ff3000-2b1d230f2000 ---p 00007000 00:0c 37044275                   /nfsroot/lib64/librt-2.4.so
2b1d230f2000-2b1d230f3000 r--p 00006000 00:0c 37044275                   /nfsroot/lib64/librt-2.4.so
2b1d230f3000-2b1d230f4000 rw-p 00007000 00:0c 37044275                   /nfsroot/lib64/librt-2.4.so
2b1d230f4000-2b1d230f5000 rw-p 2b1d230f4000 00:00 0
2b1d230f5000-2b1d23234000 r-xp 00000000 00:0c 37044234                   /nfsroot/lib64/libc-2.4.so
2b1d23234000-2b1d23334000 ---p 0013f000 00:0c 37044234                   /nfsroot/lib64/libc-2.4.so
2b1d23334000-2b1d23338000 r--p 0013f000 00:0c 37044234                   /nfsroot/lib64/libc-2.4.so
2b1d23338000-2b1d23339000 rw-p 00143000 00:0c 37044234                   /nfsroot/lib64/libc-2.4.so
2b1d23339000-2b1d23458000 rw-p 2b1d23339000 00:00 0
2b1d2348d000-2b1d2350f000 rw-p 2b1d2348d000 00:00 0
2b1d2352b000-2b1d2426d000 rw-p 2b1d2352b000 00:00 0
2b1d24300000-2b1d24321000 rw-p 2b1d24300000 00:00 0
2b1d24321000-2b1d24400000 ---p 2b1d24321000 00:00 0
7fffffd72000-7fffffd87000 rw-p 7fffffd72000 00:00 0                      [stack]
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0                  [vdso]
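
A quick sanity check on the glibc report: the pointer that free() rejected can be looked up against the memory map. The following is only a minimal Python sketch, not something that was run on the cluster; the helper names parse_maps and find_region are made up for illustration, and only the first three lines of the map are included. With those lines, the address 0x00000000017ef1a0 falls inside the [heap] mapping.

```python
# Illustrative sketch only: parse_maps and find_region are hypothetical
# helpers, not part of Xen, MPI, or glibc.

def parse_maps(text):
    """Parse /proc/<pid>/maps-style lines into (start, end, perms, name) tuples."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or "-" not in fields[0]:
            continue  # skip blank or non-mapping lines
        start_s, end_s = fields[0].split("-", 1)
        try:
            start, end = int(start_s, 16), int(end_s, 16)
        except ValueError:
            continue  # first field was not a hex address range
        perms = fields[1] if len(fields) > 1 else ""
        name = fields[5] if len(fields) > 5 else ""  # anonymous mappings have no name
        regions.append((start, end, perms, name))
    return regions


def find_region(regions, addr):
    """Return the first region whose [start, end) range contains addr, else None."""
    for region in regions:
        start, end = region[0], region[1]
        if start <= addr < end:
            return region
    return None


# First three lines of the memory map from the glibc dump (whitespace collapsed).
MAPS = """\
00400000-004bd000 r-xp 00000000 00:15 33425271 /nfsroot/home/abnagara/code/bm/smg2000/test/smg2000
005bc000-005be000 rw-p 000bc000 00:15 33425271 /nfsroot/home/abnagara/code/bm/smg2000/test/smg2000
005be000-0180f000 rw-p 005be000 00:00 0 [heap]
"""

# Address reported in "free(): invalid pointer".
BAD_PTR = 0x00000000017ef1a0

region = find_region(parse_maps(MAPS), BAD_PTR)
if region is None:
    print("pointer is not inside any listed mapping")
else:
    start, end, perms, name = region
    print(f"{BAD_PTR:#x} is in {start:#x}-{end:#x} {perms} {name}")
```

So the pointer is at least a plausible heap address rather than something pointing outside the process image. That would be consistent with heap contents being damaged around the time of migration, though it does not prove it.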
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 

