Re: [Xen-users] Re: Using Xen Virtualization Environment for Development and Testing of Supercomputing and High Performance Computing (HPC) Cluster MPICH2 MPI-2 Applications
What kind of tcpdump reports, obtained on Dom0 or on some other box on the LAN, brings you to this idea?
 
Wrong checksum offloading at the DomU frontend network driver does happen (in my experience with the RTL PCI Gigabit Ethernet 8110SC/8169 on SNV and OSOL, although the RTL PCI-E Ethernet 8111SC works fine), but not necessarily in every case.
 
 
> Virtualization Tip: Always disable checksumming on virtual ethernet devices
Why always?
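For reference, checksum offload can be inspected and toggled from inside the guest with ethtool (assuming ethtool is installed in the DomU); a rough sketch:

ethtool -k eth0           (show the current offload settings)
ethtool -K eth0 tx off    (disable TX checksum offload on the frontend interface)
ethtool -K eth0 rx off    (disable RX checksum offload, if the driver supports it)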
 Boris.
 
 
 --- On Fri, 10/30/09, Mr. Teo En Ming (Zhang Enming) <space.time.universe@xxxxxxxxx> wrote:
 
 From: Mr. Teo En Ming (Zhang Enming) <space.time.universe@xxxxxxxxx>
 Subject: [Xen-users] Re: Using Xen Virtualization Environment for Development and Testing  of Supercomputing and High Performance Computing (HPC) Cluster MPICH2 MPI-2 Applications
 To: xen-devel@xxxxxxxxxxxxxxxxxxx, xen-users@xxxxxxxxxxxxxxxxxxx
 Cc: space.time.universe@xxxxxxxxx
 Date: Friday, October 30, 2009, 4:12 AM
 
 
Dear All,

I have googled something which may help to solve my problem.

[Xen-devel] Network drop on domU (netfront: rx->offset: 0, size: 4294967295)
http://lists.xensource.com/archives/html/xen-devel/2009-05/msg01274.html

Virtualization Tip: Always disable checksumming on virtual ethernet devices
http://hightechsorcery.com/2008/03/virtualization-tip-always-disable-checksumming-virtual-ethernet-devices

Let me try to work on it first.

--
Mr. Teo En Ming (Zhang Enming) Dip(Mechatronics) BEng(Hons)(Mechanical Engineering)
Alma Maters:
(1) Singapore Polytechnic
(2) National University of Singapore
My blog URL: http://teo-en-ming-aka-zhang-enming.blogspot.com
My Youtube videos: http://www.youtube.com/user/enmingteo
Email: space.time.universe@xxxxxxxxx
MSN: teoenming@xxxxxxxxxxx
Mobile Phone (SingTel): +65-9648-9798
Mobile Phone (Starhub Prepaid): +65-8369-2618
Age: 31 (as at 30 Oct 2009)
Height: 1.78 meters
Race: Chinese
Dialect: Hokkien
Street: Bedok Reservoir Road
Country: Singapore

On Fri, Oct 30, 2009 at 3:53 PM, Mr. Teo En Ming (Zhang Enming) <space.time.universe@xxxxxxxxx> wrote:
 Hi,
 I have reverted to the 2-node troubleshooting scenario. I have started node 1 and node 2.
 
 On node 1, I will try to bring up the ring of mpd for the 2 nodes using mpdboot and try to execute mpiexec. On node 2, I will capture the tcpdump messages on virtual network interface eth0.
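For reference, the steps involved would look roughly like this (the hosts file name mpd.hosts is only an example, not the actual file used here):

On node 1:
mpdboot -n 2 -f mpd.hosts    (start the mpd ring across both nodes)
mpdtrace                     (confirm that both nodes have joined the ring)
mpiexec -n 2 hostname        (run a trivial job across the ring)

On node 2:
tcpdump -i eth0 -n           (capture the traffic arriving on the virtual interface)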
 
 Please see attached PNG screenshots. They are numbered in sequence.
 
 Please check if there are any problems.
 
 Thank you.
 
 --
 
 On Fri, Oct 30, 2009 at 2:53 PM, Mr. Teo En Ming (Zhang Enming) <space.time.universe@xxxxxxxxx>  wrote:
 Dear All,
 Here are more virtual network interface eth0 kernel messages. Notice the "net eth0: rx->offset: 0" messages. Are they of significance?
 
 Node 1
 
 Oct 30 22:40:34 enming-f11-pv-hpc-node0001 mountd[1304]: authenticated mount request from 192.168.1.253:1009 for /home/enming/mpich2-install/bin (/home/enming/mpich2-install/bin)
 Oct 30 22:40:56 enming-f11-pv-hpc-node0001 mountd[1304]: authenticated mount request from 192.168.1.252:877 for /home/enming/mpich2-install/bin (/home/enming/mpich2-install/bin)
 Oct 30 22:41:19 enming-f11-pv-hpc-node0001 mountd[1304]: authenticated mount request from 192.168.1.251:1000 for /home/enming/mpich2-install/bin (/home/enming/mpich2-install/bin)
 Oct 30 22:41:41 enming-f11-pv-hpc-node0001 mountd[1304]: authenticated mount request from 192.168.1.250:882 for /home/enming/mpich2-install/bin (/home/enming/mpich2-install/bin)
 Oct 30 22:42:04 enming-f11-pv-hpc-node0001 mountd[1304]: authenticated mount request from 192.168.1.249:953 for /home/enming/mpich2-install/bin (/home/enming/mpich2-install/bin)
 Oct 30 22:42:34 enming-f11-pv-hpc-node0001 mpd: mpd starting; no mpdid yet
 Oct 30 22:42:34 enming-f11-pv-hpc-node0001 mpd: mpd has mpdid=enming-f11-pv-hpc-node0001_48545 (port=48545)
 Oct 30 22:42:37 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
 Oct 30 22:42:37 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
 Oct 30 22:42:37 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
 Oct 30 22:42:38 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
 Oct 30 22:42:38 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
 Oct 30 22:42:38 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
 Oct 30 22:42:39 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
 Oct 30 22:42:39 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
 Oct 30 22:42:39 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
 Oct 30 22:42:40 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
 Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: __ratelimit: 12 callbacks suppressed
 Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
 Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
 Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
 Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
 Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
 Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
 Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
 Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
 Oct 30 22:42:47 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
 Oct 30 22:42:47 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
 
 Node 6
 
 Oct 30 22:42:44 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: 0, size: 4294967295
 Oct 30 22:42:44 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: 0, size: 4294967295
 Oct 30 22:42:44 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: 0, size: 4294967295
 Oct 30 22:42:44 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: 0, size: 4294967295
 Oct 30 22:42:48 enming-f11-pv-hpc-node0006 mpd: mpd starting; no mpdid yet
 Oct 30 22:42:48 enming-f11-pv-hpc-node0006 mpd: mpd has mpdid=enming-f11-pv-hpc-node0006_52805 (port=52805)
 Oct 30 22:46:00 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: 0, size: 4294967295
 Oct 30 22:46:00 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: 0, size: 4294967295
 Oct 30 22:46:00 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: 0, size: 4294967295
 
 Node 1 NFS Server Configuration
 
 [root@enming-f11-pv-hpc-node0001 ~]# cat /etc/exports
 /home/enming/mpich2-install/bin        192.168.1.0/24(ro)
 
 Node 2 /etc/fstab Configuration Entry for NFS Client
 
 192.168.1.254:/home/enming/mpich2-install/bin    /home/enming/mpich2-install/bin    nfs    rsize=8192,wsize=8192,timeo=14,intr
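For reference, the export and the mount can be checked with standard NFS commands along these lines (a sketch, assuming the server address 192.168.1.254 above is node 1):

exportfs -ra                   (on node 1, re-export after editing /etc/exports)
showmount -e 192.168.1.254     (on a client node, confirm the export is visible)
mount -a                       (on a client node, mount everything listed in /etc/fstab)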
 On Fri, Oct 30, 2009 at 2:37 PM, Mr. Teo En Ming (Zhang Enming) <space.time.universe@xxxxxxxxx>  wrote:
 Dear All,
 I have created a virtual high performance computing (HPC) cluster of 6 compute nodes with MPICH2 using Xen-based Fedora 11 Linux 64-bit paravirtualized (PV) domU guests. Dom0 is Fedora 11 Linux 64-bit. My Intel Desktop Board DQ45CB has a single onboard Gigabit LAN network adapter.
 
I am able to bring up the ring of mpd on the set of 6 compute nodes. However, I am consistently encountering the "(mpiexec 392): no msg recvd from mpd when expecting ack of request" error.
 
After much troubleshooting, I have found that there are Receive Errors (RX-ERR) on the virtual network interface eth0 of all six compute nodes. All 6 compute nodes are identical Fedora 11 Linux 64-bit PV virtual machines.
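For reference, the receive-error counters can be read inside each guest with standard tools, e.g.:

netstat -i                 (per-interface table with an RX-ERR column)
ip -s link show eth0       (errors counter in the RX statistics block)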
 
 Here is my PV guest configuration for node 1:
 
 [enming@fedora11-x86-64-host xen]$ cat enming-f11-pv-hpc-node0001
 name="enming-f11-pv-hpc-node0001"
 memory=512
 disk = ['phy:/dev/virtualmachines/f11-pv-hpc-node0001,xvda,w' ]
 vif = [ 'mac=00:16:3E:69:E9:11,bridge=eth0' ]
 vfb = [ 'vnc=1,vncunused=1,vncdisplay=0,vnclisten=127.0.0.1,vncpasswd=' ]
 vncconsole=1
 bootloader = "/usr/bin/pygrub"
 #kernel = "/home/enming/fedora11/vmlinuz"
 #ramdisk = "/home/enming/fedora11/initrd.img"
 vcpus=2
 
 
 
Will there be any problems with Xen networking for MPICH2 applications? Or is it just a fine-tuning exercise for Xen networking? I am using PV guests because PV guests have much higher performance than HVM guests.
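For reference, the offload settings can also be checked on the Dom0 side of each virtual interface. Xen names the backend interfaces vifN.M (N = domain ID, M = device index); vif1.0 below is only an assumed example:

ethtool -k vif1.0          (show the backend interface's offload settings in Dom0)
ethtool -K vif1.0 tx off   (disable TX checksum offload on the backend, mirroring the guest-side change)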
 
 Here are my mpich-discuss mailing list threads:
 
 http://lists.mcs.anl.gov/pipermail/mpich-discuss/2009-October/005883.html
 
 http://lists.mcs.anl.gov/pipermail/mpich-discuss/2009-October/005887.html
 
 http://lists.mcs.anl.gov/pipermail/mpich-discuss/2009-October/005889.html
 
 http://lists.mcs.anl.gov/pipermail/mpich-discuss/2009-October/005890.html
 
 http://lists.mcs.anl.gov/pipermail/mpich-discuss/2009-October/005891.html
 
 Please advise on the RX-ERR.
 
 Thank you very much.
 
 --
 Mr. Teo En Ming (Zhang Enming) Dip(Mechatronics) BEng(Hons)(Mechanical Engineering)
 Alma Maters:
 (1) Singapore Polytechnic
 (2) National University of Singapore
 My blog URL: http://teo-en-ming-aka-zhang-enming.blogspot.com
 My Youtube videos: http://www.youtube.com/user/enmingteo
 Email: space.time.universe@xxxxxxxxx
 MSN: teoenming@xxxxxxxxxxx
 Mobile Phone (SingTel): +65-9648-9798
 Mobile Phone (Starhub Prepaid): +65-8369-2618
 Age: 31 (as at 30 Oct 2009)
 Height: 1.78 meters
 Race: Chinese
 Dialect: Hokkien
 Street: Bedok Reservoir Road
 Country: Singapore
 
-----Inline Attachment Follows-----
 
 
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users