On Fri, Apr 16, 2010 at 01:14:46PM +0200, Trygve Sanne Hardersen wrote:
> I've looked a bit into the Open vSwitch source code and it seems to me
> like MAC addresses can only be 6 bytes, but the IB addresses are 20 bytes.
> I'm also seeing this in the Open vSwitch log:
> |00043|bridge|INFO|created port ib0 on bridge brib0
> |00044|dpif|WARN|dp0: failed to add ib0 as port: Invalid argument
> |00045|bridge|ERR|failed to add ib0 interface to dp0: Invalid argument
> |00046|bridge|ERR|ib0 interface not in dp0, dropping
> |00047|bridge|ERR|ib0 port has no interfaces, dropping
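The size mismatch described above can be checked outside Open vSwitch. This is a minimal Python sketch, assuming the colon-separated form that `ip link` prints. The helper names are made up for illustration, and the two constants only mirror the values of ETH_ALEN and INFINIBAND_ALEN in the Linux headers:

```python
# Illustrative only: these helper names are hypothetical, not part of
# Open vSwitch or XCP.
ETH_ALEN = 6          # octets in an Ethernet MAC address
INFINIBAND_ALEN = 20  # octets in an IPoIB hardware address


def hwaddr_octets(addr: str) -> bytes:
    """Parse a colon-separated hex hardware address into raw bytes."""
    return bytes(int(part, 16) for part in addr.strip().split(":"))


def fits_ethernet(addr: str) -> bool:
    """True if the address has exactly the six octets Ethernet code expects."""
    return len(hwaddr_octets(addr)) == ETH_ALEN


IB0 = "80:00:00:48:fe:80:00:00:00:00:00:00:00:30:48:ff:ff:cc:0b:25"
ETH0 = "00:30:48:cc:5c:a4"

for name, addr in (("ib0", IB0), ("eth0", ETH0)):
    print(name, len(hwaddr_octets(addr)), "octets, fits Ethernet:", fits_ethernet(addr))
```

Anything that stores the address in a six-octet field has to either reject the ib0 address or cut it short.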
> I've tried to report this on the Open vSwitch discuss list, but my
> messages do not seem to get through.
Did you subscribe to the list?
-- Pasi
> Thanks!
> Trygve
> On Wed, Apr 14, 2010 at 6:23 PM, Trygve Sanne Hardersen
> <[1]trygve@xxxxxxxxxxxxx> wrote:
>
> I've finally got to spend some time looking further into this.
> I now believe the underlying problem is that Open vSwitch is unable to
> connect the brib0 bridge interface to the ib0 physical interface. I
> suspect the cause of this to be the long MAC address of the Infiniband
> NICs, but so far I have not found a workaround for the issue.
> These are the relevant devices for my setup:
> [root@hypoxcp1 ~]# ifconfig
> brib0 Link encap:Ethernet HWaddr 80:00:00:48:FE:80
> inet addr:10.1.2.2 Bcast:10.1.2.255 Mask:255.255.255.0
> inet6 addr: fe80::8200:ff:fe48:fe80/64 Scope:Link
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
> RX bytes:0 (0.0 b) TX bytes:720 (720.0 b)
> eth0 Link encap:Ethernet HWaddr 00:30:48:CC:5C:A4
> inet6 addr: fe80::230:48ff:fecc:5ca4/64 Scope:Link
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:31755 errors:0 dropped:0 overruns:0 frame:0
> TX packets:10544 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:4284224 (4.0 MiB) TX bytes:1433336 (1.3 MiB)
> ib0 Link encap:InfiniBand HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
> inet addr:10.1.2.102 Bcast:10.1.2.255 Mask:255.255.255.0
> UP BROADCAST MULTICAST MTU:2044 Metric:1
> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:128
> RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
> xenbr0 Link encap:Ethernet HWaddr 00:30:48:CC:5C:A4
> inet addr:10.1.1.2 Bcast:10.1.1.255 Mask:255.255.255.0
> inet6 addr: fe80::230:48ff:fecc:5ca4/64 Scope:Link
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:25892 errors:0 dropped:0 overruns:0 frame:0
> TX packets:10538 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
> RX bytes:3586527 (3.4 MiB) TX bytes:1432868 (1.3 MiB)
> The ifconfig command reports the wrong (or truncated) MAC address for
> the ib0 device. The real address can be found using other commands:
> [root@hypoxcp1 ~]# cat /sys/class/net/ib0/address
> 80:00:00:48:fe:80:00:00:00:00:00:00:00:30:48:ff:ff:cc:0b:25
> [root@hypoxcp1 ~]# ip link show ib0
> 4: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast qlen 128
> link/infiniband 80:00:00:48:fe:80:00:00:00:00:00:00:00:30:48:ff:ff:cc:0b:25 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
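The same information can be read from sysfs without going through ifconfig. A rough Python sketch, assuming the usual `address`, `addr_len` and `type` files under /sys/class/net/<iface>; the function name and the `sysfs_root` parameter are made up here so it can be pointed at a test directory:

```python
from pathlib import Path


def read_hwaddr(iface: str, sysfs_root: str = "/sys/class/net") -> dict:
    """Return the hardware address, its length and the ARP type of iface.

    Hypothetical helper: it only reads the sysfs text files and does no
    validation beyond converting the numeric fields to int.
    """
    base = Path(sysfs_root) / iface
    return {
        "address": (base / "address").read_text().strip(),
        "addr_len": int((base / "addr_len").read_text().strip()),
        # ARP hardware type; 32 is the value tcpdump reports for ib0.
        "arp_type": int((base / "type").read_text().strip()),
    }

# Example on a real host (not run here):
#   info = read_hwaddr("ib0")
#   print(info["addr_len"], info["address"])
```

On the host above one would expect addr_len to come back as 20 for ib0 and 6 for eth0, though I have not run it there.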
> As mentioned earlier in this thread I've had issues with duplicate MAC
> addresses in /etc/ovs-vswitchd.conf, but a clean install somehow fixed
> that issue, so the proper MAC address is now added to the file:
> [root@hypoxcp1 ~]# cat /etc/ovs-vswitchd.conf
>
> bridge.brib0.mac=80:00:00:48:fe:80:00:00:00:00:00:00:00:30:48:ff:ff:cc:0b:25
> bridge.brib0.port=brib0
> bridge.brib0.port=ib0
> bridge.brib0.port=vif1.2
> bridge.brib0.xs-network-uuids=6455dd7f-4a61-43b8-a49d-656f749c4ac6
> bridge.xenbr0.mac=00:30:48:cc:5c:a4
> bridge.xenbr0.port=eth0
> bridge.xenbr0.port=vif1.1
> bridge.xenbr0.port=xenbr0
> bridge.xenbr0.xs-network-uuids=528d85a4-f582-c181-54eb-acf09ac7dcf4
> bridge.xenbr1.mac=00:30:48:cc:5c:a5
> bridge.xenbr1.port=eth1
> bridge.xenbr1.port=vif1.0
> bridge.xenbr1.port=xenbr1
> bridge.xenbr1.xs-network-uuids=4f033ff5-5a56-629c-1c27-0765ba7c03bb
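A quick way to spot bridge MAC entries that are not plain Ethernet addresses in a file like this. It is only a sketch in Python: it assumes the simple key=value layout shown above with repeatable keys, treats lines starting with '#' as comments (an assumption on my part), and the function names are invented for illustration:

```python
from collections import defaultdict


def parse_vswitchd_conf(text: str) -> dict:
    """Collect key=value lines into {key: [values...]}; keys may repeat."""
    conf = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, assumed comments and malformed lines
        key, value = line.split("=", 1)
        conf[key].append(value)
    return dict(conf)


def non_ethernet_bridge_macs(conf: dict) -> dict:
    """Map bridge name -> MAC for every bridge.<name>.mac that is not 6 octets."""
    found = {}
    for key, values in conf.items():
        parts = key.split(".")
        if len(parts) == 3 and parts[0] == "bridge" and parts[2] == "mac":
            for mac in values:
                if len(mac.split(":")) != 6:
                    found[parts[1]] = mac
    return found


SAMPLE = """\
bridge.brib0.mac=80:00:00:48:fe:80:00:00:00:00:00:00:00:30:48:ff:ff:cc:0b:25
bridge.brib0.port=brib0
bridge.brib0.port=ib0
bridge.xenbr0.mac=00:30:48:cc:5c:a4
bridge.xenbr0.port=eth0
"""

print(non_ethernet_bridge_macs(parse_vswitchd_conf(SAMPLE)))
```

With the sample above only brib0 is reported, since xenbr0 carries an ordinary six-octet MAC.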
> I'm no expert on XCP and Open vSwitch, but I believe it works something
> like this:
>
> 1. XAPI writes /etc/ovs-vswitchd.conf based on the XCP DB
> 2. XAPI starts up Open vSwitch
> 3. Open vSwitch creates the interfaces defined
> in /etc/ovs-vswitchd.conf
>
> To me it seems like the MAC address for the brib0 interface
> is truncated, and I believe this causes Open vSwitch to not bind brib0
> and ib0 together:
> [root@hypoxcp1 ~]# ovs-ofctl show brib0
> Apr 14 15:38:31|00001|ofctl|INFO|connecting to unix:/var/run/brib0.mgmt
> features_reply (xid=0x6bb27f3f): ver:0x97, dpid:32f493d6e290
> n_tables:2, n_buffers:256
> features: capabilities:0x17, actions:0x3ff
> LOCAL(brib0): addr:80:00:00:48:fe:80, config: 0, state:0
> Apr 14 15:38:31|00002|ofctl|INFO|connecting to unix:/var/run/brib0.mgmt
> get_config_reply (xid=0x9b99aaf1): miss_send_len=0
> [root@hypoxcp1 ~]# ovs-ofctl show xenbr0
> Apr 14 15:38:19|00001|ofctl|INFO|connecting to unix:/var/run/xenbr0.mgmt
> features_reply (xid=0x836b0867): ver:0x97, dpid:f68bde598f51
> n_tables:2, n_buffers:256
> features: capabilities:0x17, actions:0x3ff
> 1(eth0): addr:00:30:48:cc:5c:a4, config: 0, state:0
> current: 1GB-FD COPPER AUTO_NEG
> advertised: 10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-FD COPPER
> AUTO_NEG
> supported: 10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-FD COPPER
> AUTO_NEG
> LOCAL(xenbr0): addr:00:30:48:cc:5c:a4, config: 0, state:0
> Apr 14 15:38:19|00002|ofctl|INFO|connecting to unix:/var/run/xenbr0.mgmt
> get_config_reply (xid=0x2b665ea): miss_send_len=0
> As you see the binding to ib0 is missing, and the MAC of brib0 is
> different from that in /etc/ovs-vswitchd.conf.
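The address reported for brib0 looks like the first six octets of the configured one. A small Python check of that, with a made-up helper name; it shows only that one string is a prefix of the other, not where in Open vSwitch the truncation happens:

```python
def is_truncated_prefix(short: str, full: str) -> bool:
    """True if `short` is a proper leading slice of `full`, octet by octet.

    Comparison is case-insensitive because ifconfig prints upper-case hex
    while sysfs and ovs-ofctl print lower-case.
    """
    short_octets = [p.lower() for p in short.split(":")]
    full_octets = [p.lower() for p in full.split(":")]
    return (
        len(short_octets) < len(full_octets)
        and full_octets[: len(short_octets)] == short_octets
    )


CONFIGURED = "80:00:00:48:fe:80:00:00:00:00:00:00:00:30:48:ff:ff:cc:0b:25"
REPORTED = "80:00:00:48:fe:80"  # LOCAL(brib0) addr from ovs-ofctl show

print(is_truncated_prefix(REPORTED, CONFIGURED))
```

Note that the octets which differ per port (the trailing 00:30:48:ff:ff:cc:0b:25 here) are all lost in the six-octet version.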
> As previously stated I can communicate between XCP hosts on both brib0
> and ib0 using this setup. The problem is that VIFs on the brib0 network
> are not reachable. I have the following IB interfaces on a single host:
> ib0 - 10.1.2.102/24
> brib0 - 10.1.2.2/24
> vif1.2 - 10.1.2.202/24
> From within the VM that uses vif1.2 I try to ping brib0 and ib0 and
> watch the traffic on the XCP host:
> [root@hypoxcp1 ~]# tcpdump -i vif1.2
> tcpdump: WARNING: vif1.2: no IPv4 address assigned
> tcpdump: verbose output suppressed, use -v or -vv for full protocol
> decode
> listening on vif1.2, link-type EN10MB (Ethernet), capture size 96 bytes
> 15:54:59.948660 arp who-has 10.1.2.102 tell 10.1.2.202
> 15:55:00.948643 arp who-has 10.1.2.102 tell 10.1.2.202
> 15:55:01.948645 arp who-has 10.1.2.102 tell 10.1.2.202
> [root@hypoxcp1 ~]# tcpdump -i brib0
> tcpdump: verbose output suppressed, use -v or -vv for full protocol
> decode
> listening on brib0, link-type EN10MB (Ethernet), capture size 96 bytes
> 15:54:22.612723 arp who-has 10.1.2.102 tell 10.1.2.202
> 15:54:23.612643 arp who-has 10.1.2.102 tell 10.1.2.202
> 15:54:24.612642 arp who-has 10.1.2.102 tell 10.1.2.202
> [root@hypoxcp1 ~]# tcpdump -i ib0
> tcpdump: WARNING: arptype 32 not supported by libpcap - falling back to
> cooked socket
> tcpdump: verbose output suppressed, use -v or -vv for full protocol
> decode
> listening on ib0, link-type LINUX_SLL (Linux cooked), capture size 96
> bytes
> The packets never reach ib0.
> This setup adds the following IP routes to the XCP host:
> [root@hypoxcp1 ~]# route -n
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
> 10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 xenbr0
> 10.1.2.0        0.0.0.0         255.255.255.0   U     0      0        0 ib0
> 10.1.2.0        0.0.0.0         255.255.255.0   U     0      0        0 brib0
> 169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 brib0
> 0.0.0.0         10.1.1.1        0.0.0.0         UG    0      0        0 xenbr0
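The table has two entries for 10.1.2.0/24 with the same metric, one via ib0 and one via brib0, so which interface carries the traffic depends on which entry wins. A small Python sketch that flags such overlaps; the tuples are typed in by hand from the table above rather than parsed from `route -n`, and the function name is made up:

```python
from collections import defaultdict


def ambiguous_routes(routes):
    """Return {(destination, genmask): [ifaces...]} for destinations that
    are reachable through more than one interface."""
    by_dest = defaultdict(list)
    for destination, genmask, iface in routes:
        by_dest[(destination, genmask)].append(iface)
    return {dest: ifaces for dest, ifaces in by_dest.items() if len(ifaces) > 1}


# (Destination, Genmask, Iface) copied from the routing table above.
ROUTES = [
    ("10.1.1.0", "255.255.255.0", "xenbr0"),
    ("10.1.2.0", "255.255.255.0", "ib0"),
    ("10.1.2.0", "255.255.255.0", "brib0"),
    ("169.254.0.0", "255.255.0.0", "brib0"),
    ("0.0.0.0", "0.0.0.0", "xenbr0"),
]

for (destination, genmask), ifaces in ambiguous_routes(ROUTES).items():
    print(destination, genmask, "->", ", ".join(ifaces))
```

That would fit the behaviour changing when the ib0 route is removed or the order of the two entries is swapped, but I have not verified how the kernel picks between them here.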
> If I remove the ib0 route I can talk to brib0 and ib0 from vif1.2, but
> only on the same physical machine. Inter-host and inter-vm over network
> communication breaks without that route.
> I also tried using "bridge" networking instead of "vswitch", but the
> system behaves the same way AFAICT, though the configuration is of
> course different.
> I'm not sure what to try next. I could use the IB network for the
> management interface and not run any VMs on it, but please let me know
> if you have any idea what's wrong.
> Thanks!
> Trygve
> On Thu, Apr 8, 2010 at 11:40 AM, Trygve Sanne Hardersen
> <[5]trygve@xxxxxxxxxxxxx> wrote:
>
> Hi
> Yes, I believe the packets are lost between brib0 and ib0, so they are
> never sent across the network but it works on a single host.
> I'll do some more testing and let you know what I find.
> Thanks!
> Trygve
>
> On Thu, Apr 8, 2010 at 10:40 AM, Dave Scott
> <[6]Dave.Scott@xxxxxxxxxxxxx> wrote:
>
> Hi,
>
>
>
> Is it true that you have managed to get VM <-> Host connectivity
> working but not VM <-> VM (across host) connectivity working?
>
>
>
> If so then it would be interesting to use something like tcpdump to
> find out where the packets are going missing. If they're entering
> the vswitch and then getting lost then it would be worth talking
> about this on the openvswitch mailing list.
>
>
>
> Another possibility is to revert to non-vswitch based networking in
> dom0: try writing "bridge" to /etc/xensource/network.conf and
> rebooting.
>
>
>
> Cheers,
>
> Dave
>
>
>
> From: [7]xen-users-bounces@xxxxxxxxxxxxxxxxxxx
> [mailto:[8]xen-users-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of
> Trygve Sanne Hardersen
> Sent: 07 April 2010 23:02
> To: Xen
> Subject: [Xen-users] using ipoib with xcp
>
>
>
> Hello,
>
>
>
> I have been playing with the XCP for a while now, and must say I'm
> very excited about the technology. I had no prior experience with Xen
> so it has taken me a while to understand the concepts, but now I
> feel most important issues are solved and I've purchased some
> hardware to build my (tiny) cloud on.
>
>
>
> The box is a Supermicro 1026TT-IBXF, so I have 2 x Ethernet and 1 x
> Infiniband (IB) NICs per node. I want to use the IB NIC to provide
> fast connectivity between the domUs, while the Ethernet NICs will be
> used for the XCP management interface and ISP connectivity.
>
>
>
> I have successfully built OFED 1.5.1 in the XCP DDK VM and
> installed OFED in the XCP 0.1.1 dom0. From there I can bring up the
> IB network, but I'm having problems getting this to work properly
> within XCP virtual machines. This is what happens:
>
>
>
> Starting out I have 2 nodes in a pool; both are clean with only lo,
> eth0/xenbr0 and eth1/xenbr1 configured. I run the following commands
> to add the IB NICs to the pool:
>
>
>
> xe pif-scan host-uuid=NODE1
>
> xe pif-plug uuid=NODE1_IB0
>
> xe pif-scan host-uuid=NODE2
>
> xe pif-plug uuid=NODE2_IB0
>
>
>
> As expected this adds ib0/brib0 on both nodes and a single pool-wide
> network, but there is no connectivity between the hosts after I give
> brib0 an IP:
>
>
>
> xe pif-reconfigure-ip uuid=NODE1_IB0 IP=10.1.2.2
> netmask=255.255.255.0 mode=static
>
> xe pif-reconfigure-ip uuid=NODE2_IB0 IP=10.1.2.3
> netmask=255.255.255.0 mode=static
>
> ping 10.1.2.2 --> reply
>
> ping 10.1.2.3 --> destination host unavailable
>
>
>
> However if I also give ib0 an IP and use this as gateway for brib0,
> connectivity is achieved:
>
>
>
> ifconfig ib0 10.1.2.22 netmask 255.255.255.0
>
> xe pif-reconfigure-ip uuid=NODE1_IB0 IP=10.1.2.2
> netmask=255.255.255.0 gateway=10.1.2.22 mode=static
>
> ifconfig ib0 10.1.2.33 netmask 255.255.255.0
>
> xe pif-reconfigure-ip uuid=NODE2_IB0 IP=10.1.2.3
> netmask=255.255.255.0 gateway=10.1.2.33 mode=static
>
> ping 10.1.2.2 --> reply
>
> ping 10.1.2.3 --> reply
>
> ping 10.1.2.22 --> reply
>
> ping 10.1.2.33 --> reply
>
>
>
> This is very well, but when I add a VIF on the IB network to a VM it
> is not able to communicate through it:
>
>
>
> xe vif-create device=2 mac=random network-uuid=IB_NET
> vm-uuid=NODE1_IBVM
>
> ifconfig eth2 10.1.2.122 netmask 255.255.255.0
>
> xe vif-create device=2 mac=random network-uuid=IB_NET
> vm-uuid=NODE2_IBVM
>
> ifconfig eth2 10.1.2.133 netmask 255.255.255.0
>
> ping 10.1.2.122 --> reply
>
> ping 10.1.2.22 --> destination host unavailable
>
> ping 10.1.2.2 --> destination host unavailable
>
> ping 10.1.2.133 --> destination host unavailable
>
> ping 10.1.2.33 --> destination host unavailable
>
> ping 10.1.2.3 --> destination host unavailable
>
>
>
> I believe that the problem lies somewhere in the routing table
> configuration. This setup gives the following routing table:
>
>
>
> 10.1.2.0        0.0.0.0         255.255.255.0   U     0      0        0 ib0
>
> 10.1.2.0        0.0.0.0         255.255.255.0   U     0      0        0 brib0
>
>
>
> If I delete and then add the brib0 route, the route order is
> changed:
>
>
>
> 10.1.2.0        0.0.0.0         255.255.255.0   U     0      0        0 brib0
>
> 10.1.2.0        0.0.0.0         255.255.255.0   U     0      0        0 ib0
>
>
>
> Using this the VM can talk to the host (and vice versa), but not
> across the network. Connectivity between ib0/brib0 over the network
> is also broken.
>
>
>
> I've also noticed that the same MAC is added to
> /etc/ovs-vswitchd.conf multiple times for brib0:
>
>
>
>
> bridge.brib0.mac=80:00:00:48:fe:80:00:00:00:00:00:00:00:30:48:ff:ff:cc:0b:25
>
>
> bridge.brib0.mac=80:00:00:48:fe:80:00:00:00:00:00:00:00:30:48:ff:ff:cc:0b:25
>
>
> bridge.brib0.mac=80:00:00:48:fe:80:00:00:00:00:00:00:00:30:48:ff:ff:cc:0b:25
>
>
>
> I've tried removing some of these but that does not seem to have any
> effect. My experience with IP routing and especially vswitch is
> limited and I'm not sure what to try from here. I've tried various
> configurations but no luck so far.
>
>
>
> Note that I'm testing with 2 XCP nodes configured in a pool. I've
> also checked that the PIFs are in the same order on both nodes (the
> reference mentions this). The MTU (1500) of brib0 differs from that
> of ib0 (2044), but changing this does not solve the problem.
>
>
>
> Any help is much appreciated. Thanks!
>
>
>
> Trygve
>
> --
> HypoBytes Ltd.
> Trygve Sanne Hardersen
> Akersveien 24F
> 0177 Oslo
> Norway
>
> [9]hypobytes.com
> +47 40 55 30 25
>
> References
>
> Visible links
> 1. mailto:trygve@xxxxxxxxxxxxx
> 2. http://10.1.2.2/24
> 3. http://10.1.2.102/24
> 4. http://10.1.2.202/24
> 5. mailto:trygve@xxxxxxxxxxxxx
> 6. mailto:Dave.Scott@xxxxxxxxxxxxx
> 7. mailto:xen-users-bounces@xxxxxxxxxxxxxxxxxxx
> 8. mailto:xen-users-bounces@xxxxxxxxxxxxxxxxxxx
> 9. http://hypobytes.com/
> 10. http://hypobytes.com/
> 11. http://hypobytes.com/
> 12. http://hypobytes.com/
> _______________________________________________
> Xen-users mailing list
> Xen-users@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-users