Dirk,
I could sure use some help with this, as I've been struggling to get it to work
properly for a few days now.
I'm currently using the shipped network-bridge-bonding script, as it sets
everything up properly, or at least it seems to. The issue I'm going to
describe is exactly the same issue I encountered while trying different manual
setups, including the setup with ifcfgs in my first mail.
I switched from bonding mode 0 (balance-rr) to mode 2 (balance-xor) because
round-robin screws up TCP too much: packets arrive out of order.
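To show why I expect mode 2 to avoid the reordering, here is a minimal sketch of the default transmit hash as I understand it from the bonding documentation: (source MAC XOR destination MAC) modulo slave count. The function names and MAC addresses are made up for illustration, and the real driver may only look at part of the address, so treat this as an approximation.

```python
def mac_to_int(mac: str) -> int:
    """Convert 'aa:bb:cc:dd:ee:ff' to a 48-bit integer."""
    return int(mac.replace(":", ""), 16)


def xor_slave_index(src_mac: str, dst_mac: str, slave_count: int) -> int:
    """Pick a slave index as (source MAC XOR destination MAC) % slave count.

    Simplified model of the documented layer2 policy, not the kernel code.
    """
    if slave_count < 1:
        raise ValueError("need at least one slave")
    return (mac_to_int(src_mac) ^ mac_to_int(dst_mac)) % slave_count


if __name__ == "__main__":
    # Hypothetical DomU MAC addresses, two slaves (eth0 and eth1).
    vm1 = "00:16:3e:00:00:01"
    vm2 = "00:16:3e:00:00:02"
    print(xor_slave_index(vm1, vm2, 2))
```

Because the result depends only on the MAC pair, every frame between the same two hosts leaves through the same slave, so frames of one conversation cannot overtake each other on different links.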
It seems to me that the issue is basically the bridge, which stops forwarding
packets to the DomU virtual interfaces (vifX.0).
For example, VM1 runs on one server (Xen1). VM2 runs on another (Xen2). The xen
servers are in their own subnet, separate from the VMs.
VM1 tries to ping VM2. The ARP request arrives at VM2 and it sends its reply.
I see the ARP reply arriving (MAC src and dst are OK, I checked that) back at
the bridge of Xen1, but it's never forwarded to vif1.0, the virtual interface
connected to the bridge for VM1.
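For reference, this is a toy model of how I understand a learning bridge to decide where a frame goes. It is only a sketch of generic bridge behaviour, not the Linux bridge code. The class name and MAC addresses are made up; the port names are the ones from my setup. I have not established that any of these cases is what actually happens on Xen1.

```python
class LearningBridge:
    """Toy learning bridge: learns source MACs, forwards by destination MAC."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.fdb = {}  # forwarding database: MAC address -> port name

    def receive(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is sent out on."""
        self.fdb[src_mac] = in_port  # learn (or move) the source address
        out_port = self.fdb.get(dst_mac)
        if out_port is None:
            # Unknown or broadcast destination: flood to all other ports.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Destination is believed to live behind the ingress port:
            # the frame is not sent back out, so it is effectively dropped.
            return []
        return [out_port]


if __name__ == "__main__":
    VM1 = "00:16:3e:00:00:01"   # hypothetical MAC of VM1
    VM2 = "00:16:3e:00:00:02"   # hypothetical MAC of VM2
    BCAST = "ff:ff:ff:ff:ff:ff"
    bridge = LearningBridge(["pbond0", "vif1.0", "vif2.0"])
    print(bridge.receive("vif1.0", VM1, BCAST))  # ARP request from VM1
    print(bridge.receive("pbond0", VM2, VM1))    # ARP reply from VM2
```

In this model the reply only reaches vif1.0 as long as the bridge still associates VM1's MAC with vif1.0. If that MAC were ever learned on pbond0 instead, a reply arriving on pbond0 would go nowhere.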
I get the same problem the other way around. And not just that: I have another
DomU, VM3, running on Xen1. VM1 appears to have the same issue when trying to
ping VM3.
When this problem occurs, it may fix itself after a few minutes, but I've seen
the problem persist for over half an hour. Removing the vif interface from the
bridge and re-adding it immediately fixes the issue (though only for a while).
The problem is not limited to ARP replies. When it happens, it happens to all
traffic. Ping requests are no longer forwarded and ssh sessions stall. So it
seems to be a problem at the Ethernet layer, the software bridge to be
precise. Load on the Xen hosts is virtually none. Ifconfig reports some dropped
packets on the vif1.0 interface, but unless that counter lags by several
minutes, the packet drop is not the issue here; the drops seen there were
caused by traffic such as iperf runs. The Dom0s themselves seem to be working
fine, networking-wise.
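To check whether the drop counter actually moves while traffic is stalled, two ifconfig snapshots can be compared. Below is a rough sketch that pulls the TX dropped counter per interface out of ifconfig text in the format shown further down. The function names are my own, and the parsing assumes the old net-tools output layout.

```python
import re


def tx_dropped(ifconfig_text: str) -> dict:
    """Map interface name -> TX dropped counter from ifconfig-style text."""
    counters = {}
    current = None
    for line in ifconfig_text.splitlines():
        # A stanza starts with the interface name followed by "Link encap:".
        head = re.match(r"^(\S+)\s+Link encap:", line)
        if head:
            current = head.group(1)
            continue
        m = re.search(r"TX packets:\d+ errors:\d+ dropped:(\d+)", line)
        if m and current is not None:
            counters[current] = int(m.group(1))
    return counters


def drop_delta(before: str, after: str, iface: str) -> int:
    """Number of TX drops on iface between two ifconfig snapshots."""
    return tx_dropped(after)[iface] - tx_dropped(before)[iface]
```

If `drop_delta` stays at zero for vif1.0 while pings are failing, the drops in the counter belong to earlier traffic and not to the stall itself.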
Do you have ANY idea what the hell could be happening here?
Thanks in advance!
Eric
Some specs/configs:
Uname:
Linux vs1.loc.footsteps.nl 2.6.18-194.17.1.el5xen #1 SMP Wed Sep 29 14:12:56
EDT 2010 i686 i686 i386 GNU/Linux
Ifconfig:
bond0     Link encap:Ethernet HWaddr 00:1E:C9:BB:3B:DE
          inet addr:192.168.1.11 Bcast:192.168.1.255 Mask:255.255.255.0
          inet6 addr: fe80::21e:c9ff:febb:3bde/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:353733 errors:0 dropped:0 overruns:0 frame:0
          TX packets:198680 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:421809697 (402.2 MiB) TX bytes:78359788 (74.7 MiB)

eth0      Link encap:Ethernet HWaddr 00:1E:C9:BB:3B:DE
          UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
          RX packets:4379 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2588 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:481145 (469.8 KiB) TX bytes:406751 (397.2 KiB)
          Interrupt:16 Memory:dfdf0000-dfe00000

eth1      Link encap:Ethernet HWaddr 00:1E:C9:BB:3B:DE
          UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
          RX packets:352606 errors:0 dropped:0 overruns:0 frame:0
          TX packets:227793 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:428009344 (408.1 MiB) TX bytes:81288973 (77.5 MiB)
          Interrupt:17 Memory:dfef0000-dff00000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:722 errors:0 dropped:0 overruns:0 frame:0
          TX packets:722 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:152288 (148.7 KiB) TX bytes:152288 (148.7 KiB)

pbond0    Link encap:Ethernet HWaddr 00:1E:C9:BB:3B:DE
          inet6 addr: fe80::21e:c9ff:febb:3bde/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
          RX packets:356985 errors:0 dropped:0 overruns:0 frame:0
          TX packets:230381 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:428490489 (408.6 MiB) TX bytes:81695724 (77.9 MiB)

vif1.0    Link encap:Ethernet HWaddr FE:FF:FF:FF:FF:FF
          inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
          UP BROADCAST RUNNING NOARP MTU:1500 Metric:1
          RX packets:1754 errors:0 dropped:0 overruns:0 frame:0
          TX packets:100596 errors:0 dropped:18098 overruns:0 carrier:0
          collisions:0 txqueuelen:32
          RX bytes:254517 (248.5 KiB) TX bytes:20254553 (19.3 MiB)

vif2.0    Link encap:Ethernet HWaddr FE:FF:FF:FF:FF:FF
          inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
          UP BROADCAST RUNNING NOARP MTU:1500 Metric:1
          RX packets:1694 errors:0 dropped:0 overruns:0 frame:0
          TX packets:89710 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:32
          RX bytes:231524 (226.0 KiB) TX bytes:18317729 (17.4 MiB)

virbr0    Link encap:Ethernet HWaddr 00:00:00:00:00:00
          inet addr:192.168.122.1 Bcast:192.168.122.255 Mask:255.255.255.0
          inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
From xend-config.sxp:
(network-script 'network-bridge-bonding netdev=bond0')
-----Original message-----
From: Dirk [mailto:dirk.schulz@xxxxxxxxxxxxx]
Sent: Tuesday, 19 October 2010 14:56
To: Eric van Blokland; xen-users
Subject: Re: [Xen-users] High availability Xen with bonding
Eric,
On 18.10.10 08:11, Eric van Blokland wrote:
> Hey Florian,
>
> In my reply to Bart I've explained I'm having some connectivity issues. I'm
> not using VLANs either, nor have access to expensive switches with link
> aggregation support. I'm going to have a look at the manuals you mentioned.
> Perhaps there is some setting I missed which is causing my issues. I'll keep
> you all updated on my progress.
I am using bonding with xen bridges on CentOS and Debian without LACP or
VLANs. If you could describe your connection issues I can help you, maybe.
Dirk
> Regards,
>
> Eric
>
> -----Original message-----
> From: Florian Heigl [mailto:florian.heigl@xxxxxxxxx]
> Sent: Sunday, 17 October 2010 7:19
> To: Bart Coninckx
> CC: xen-users@xxxxxxxxxxxxxxxxxxx; Eric van Blokland
> Subject: Re: [Xen-users] High availability Xen with bonding
>
> Hi both,
>
> I had very good success after some hair-pulling.
> I run LACP + VLAN trunking.
> Key assumptions:
> - All device setup (eth, bond, bridges) is done via normal OS config,
> because that is more reliable.
> - All libvirt stuff is disabled, it just limits Xen's possibilities to
> "home user level" by assuming you'd only have one bridge.
> (chkconfig XXX off ...)
> - No messing with ARP is wanted
> - You have switches current enough to do "real" LACP
>
>
> There's a very good (and I think the only working one) manual in the
> Oracle VM wiki at
> http://wiki.oracle.com/page/Oracle+VM+Server+Configuration-+bonded+and+trunked+network+interfaces
>
> I myself had followed one manual from redhat, which left off somewhere
> in the middle.
> It's called "Xen_Networking.pdf" by Mark Nielsen. It's a good intro,
> but only covers 50% of a good setup.
>
> Notes:
> a) If you look not just at link aggregation but at a VLAN-heavy
> environment, there might be a point (>128 VLANs) where the number of
> virtual bridges might become an issue. Then wait for Open vSwitch to
> mature or email xen-devel and ask for the status of "vnetd". (just
> kidding)
> b) Using Ethernet (n Ethernet links into bond0) and InfiniBand (2
> InfiniBand HCA ports into bond1) bonding on the same host is more
> tricky. It seems the Ethernet bonding driver tries to cover InfiniBand
> too. The setup is completely undocumented. It is possible, but when I
> tried it, it just didn't pass any more traffic.
> For the setup check the following thread in HP itrc:
> http://forums13.itrc.hp.com/service/forums/questionanswer.do?admit=109447627+1287292335750+28353475&threadId=1445752
> c) Added speed is only guaranteed for multiple connections. If you do
> it via bonding, you need an LACP algorithm in your switch that will
> hash based on the IP destination ports, not just MAC address or IP
> address. Current Cisco gear can do that. For plain iSCSI your path
> grouping would decide if you see load balancing with multiple iSCSI
> LANs.
>
>
> Hope you get it to work!
>
>
>
> Florian
>
> 2010/10/16 Bart Coninckx<bart.coninckx@xxxxxxxxxx>:
>> On Friday 15 October 2010 13:44:42 Eric van Blokland wrote:
>>> Hello everyone,
>>>
>>> A few days back I decided to give Ethernet port bonding in Xen another try.
>>> I've never been able to get it to work properly and after a short search I
>>> found the network-bridge-bonding script shipped with CentOS-5 probably
>>> wasn't going to solve my problems. Instead of searching for a tailored
> [...]
>> curious to see your progress in this. Up till now I tackled network
>> redundancy with multipathing, not with bonding. However, this does not
>> provide added speed, though theoretically it should. So I recently decided
>> to switch to bonding for the hypervisors in their connections to iSCSI,
>> using rr and running over separate switches, just like you, but I'm not at
>> the point of installing domU's, so I can't really comment on how and if it
>> works. Will know next week though, so I will return to this post with my
>> findings...
> That should definitely get you increased speed, plus multiple iSCSI
> connections via separate subnets / NICs is the only way you can get
> close to FC reliability on a lower budget.
>
>