xen-users

To: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-users] bonding combined with network-bridge fails heartbeat cluster on dom0
From: Darren Thompson <darrent@xxxxxxxxxxxxx>
Date: Mon, 12 Nov 2007 14:59:46 +1030
Help

I have a problem that I cannot resolve, and I am asking for your help in getting in touch with the appropriate people to assist.

I have successfully used the configuration instructions and the customised XEN bonding script from http://vandelande.com/guides/howto%20setup%20XEN%20using%20network%20bonding%20on%20SLES10.html (the original and modified XEN scripts are attached), and this setup works well on single servers.


Problem Summary:

I can get any two of the three components (XEN, Heartbeat and bonding) working together, but when all three are used at once, Heartbeat fails to communicate and the servers are killed through a "split brain" condition (STONITH).
I initially encountered this problem on HP blade servers but have since reproduced the same issue using VMWARE VMs (VM configs attached), so I am fairly confident that it is not hardware related.
This configuration (without Heartbeat clustering) works on single servers without issue.
If I do not team the NICs, then both XEN and Heartbeat appear to work as expected, so the problem lies in the combination of all three.


Detailed description:

Two servers running SLES10SP1.
Each has two network cards (a physical constraint on HP blades, hence the desire to use bonding to increase availability).
The network cards are bonded to create a virtual bond0 interface (NIC and teaming config files attached).
The two servers are configured to run Heartbeat (config files attached).
When booted to the non-XEN kernel, both the NIC bonding and Heartbeat work without issue.
When booted to the XEN kernel, Heartbeat fails to communicate (the protocol makes no difference: broadcast, multicast or unicast), but the servers can communicate successfully in all other respects.
I found this message thread on XENSOURCE: http://lists.xensource.com/archives/html/xen-users/2006-12/msg00650.html. The 'work-around' described there does not appear to work in my case and is otherwise not suitable, but the thread does indicate that this issue has been identified previously and not resolved.
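
One way to check whether plain UDP traffic survives the bond0 plus bridge path, independently of Heartbeat itself, is a small probe along the following lines. This is only a sketch and not part of the attached scripts: the function names are hypothetical, and the default port of 694 assumes that ha.cf does not set a different udpport.

```python
import socket

# Assumption: Heartbeat is using its default UDP port (udpport in ha.cf).
HEARTBEAT_PORT = 694


def open_listener(bind_ip="", port=HEARTBEAT_PORT):
    """Open a UDP socket bound to bind_ip:port ("" means all interfaces)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((bind_ip, port))
    return sock


def receive_one(sock, timeout=5.0):
    """Wait up to `timeout` seconds for one datagram.

    Returns (payload, sender_ip), or None if nothing arrived in time.
    """
    sock.settimeout(timeout)
    try:
        data, (sender_ip, _sender_port) = sock.recvfrom(4096)
    except socket.timeout:
        return None
    return data, sender_ip


def send_probe(dest_ip, port=HEARTBEAT_PORT, payload=b"hb-probe", broadcast=False):
    """Send a single UDP datagram and return the number of bytes sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        if broadcast:
            # Required before sending to a broadcast address.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        return sock.sendto(payload, (dest_ip, port))
```

On one node, call open_listener() followed by receive_one(); on the other, call send_probe() with the peer's address, or with the subnet broadcast address and broadcast=True. Binding to port 694 needs root and requires Heartbeat to be stopped, so a different unused port may be more convenient. If the probe arrives under the non-XEN kernel but not under the XEN kernel, the loss is happening below Heartbeat.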

Since I now have this problem configuration running under VMWARE, I can provide a wealth of scripts, error logs etc. (I have attached the VMWARE config files for the VM servers to help anyone who wants to recreate my exact configuration.)

Nasty Work Around:

I have found that removing either one of the two network cards from the team and configuring it as a separate interface on the same network allows Heartbeat to work. This completely defeats the purpose of using teaming, but it does indicate that the issue lies in the way XEN modifies the bonding driver at startup.
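
To see what changes at startup, it may help to compare the bonding driver's own status report before and after the XEN network script has run. The following is a minimal sketch that parses the "key: value" text the Linux bonding driver exposes under /proc/net/bonding/; the function names are hypothetical and nothing here is taken from the attached scripts.

```python
def parse_bonding_status(text):
    """Parse the text of /proc/net/bonding/<bond> into a nested dict.

    Fields that appear before the first "Slave Interface" line are stored
    under "bond"; later fields are stored under the slave they follow.
    """
    status = {"bond": {}, "slaves": {}}
    section = status["bond"]
    for raw_line in text.splitlines():
        key, sep, value = raw_line.partition(":")
        if not sep:
            continue  # blank line or a line that is not "key: value"
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            # Start collecting fields for this slave.
            section = status["slaves"].setdefault(value, {})
        else:
            section[key] = value
    return status


def slaves_not_up(status):
    """Return the names of slaves whose MII Status is anything other than 'up'."""
    return sorted(
        name
        for name, fields in status["slaves"].items()
        if fields.get("MII Status") != "up"
    )
```

For example, reading /proc/net/bonding/bond0 under both kernels and passing the contents to parse_bonding_status() shows whether the set of slaves, the currently active slave, or the link status differs once XEN has set up its bridge. If the XEN script renames the bond device, the file name under /proc/net/bonding/ will differ accordingly.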

I can also get the servers to work if I do not attempt any bonding but configure the NICs separately, with eth0 for XEN and eth1 for Heartbeat.

Feel free to pass my contact details on to anyone who you think can assist.

Regards



Darren Thompson
Professional Services Engineer

AkurIT
Level 24, Santos House
91 King William Street
Adelaide SA 5000
Australia

Tel: +61 8 8233 5873
Fax:  +61 8 8233 5911
Mobile: +61 0400 640 414
Mail: darrent@xxxxxxxxxxxxx

Attachment: ifcfg-bond0
Description: Text document

Attachment: ifcfg-eth-id-00:0c:29:e5:82:8a
Description: Text document

Attachment: ifcfg-eth-id-00:0c:29:e5:82:80
Description: Text document

Attachment: network-bridge
Description: application/shellscript

Attachment: network-bridge-bonded
Description: application/shellscript

Attachment: network-bridge-nobond
Description: application/shellscript

Attachment: XEN-HB2-N1.vmx
Description: application/vmware-vm

Attachment: XEN-HB2-N2.vmx
Description: application/vmware-vm

Attachment: ha.cf
Description: Text document
