[Xen-users] Network Block Devices & Redundancy

To:	Xen User-List <xen-users@xxxxxxxxxxxxxxxxxxx>
Subject:	[Xen-users] Network Block Devices & Redundancy
From:	Matthew Richardson <M.Richardson@xxxxxxxx>
Date:	Thu, 06 Aug 2009 11:26:42 +0100
Delivery-date:	Thu, 06 Aug 2009 03:27:30 -0700

I'm currently looking to establish a small pool of Xen hosts, each
running several guests, with the guests' disks held on disk servers
accessible across the network.

I'm hoping to achieve ease of migration, and reliability of service
(particularly in cases of hardware failure), but I'm not pushing for
24/7 guaranteed HA.

I've looked at the different ways of offering network storage to the
guests, and have narrowed it down to what I think are the two best
options:

1) DRBD & Heartbeat (or a similar cluster management tool): Run 2 disk
servers, with one acting as 'master' and mirroring to the other in the
background.  If the master fails, promote the slave to master.

Advantages:

Server-server communication is efficient
Recovery only has to update a small delta


Disadvantages:

Risk of split-brain, if the disk servers stop 'seeing' each other and
both try to become masters.
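
The split-brain case can be illustrated with a toy model.  This is only
a sketch of the failure mode, not how DRBD or Heartbeat actually decide
roles: the Witness class and primaries_after_link_loss() are
hypothetical names made up for illustration, and the "witness" stands in
for whatever third-party tie-breaker or fencing mechanism a real cluster
would use.

```python
class Witness:
    """Hypothetical third-party arbiter: grants the primary role to one node only."""

    def __init__(self):
        self.holder = None

    def request(self, node):
        # The first node to ask wins; every later request from another node is refused.
        if self.holder is None:
            self.holder = node
        return self.holder == node


def primaries_after_link_loss(witness=None):
    """Both disk servers are healthy, but the link between them has dropped.

    Returns the list of nodes that end up acting as primary.
    """
    primaries = []
    for node in ("a", "b"):  # "a" was the master, "b" the slave
        if witness is None:
            # Naive failover: the peer is unreachable, so each node assumes
            # it is dead.  "a" stays master and "b" promotes itself.
            primaries.append(node)
        elif witness.request(node):
            # With an arbiter, only the node that wins the vote may be primary.
            primaries.append(node)
    return primaries


print(primaries_after_link_loss())           # ['a', 'b']  (split brain)
print(primaries_after_link_loss(Witness()))  # ['a']
```

With no arbiter, both servers accept writes and their copies diverge,
which is the situation that needs a human to pick the surviving data.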



2) Software RAID(1) of iSCSI disks: Export an iSCSI disk from each of 2
disk servers and RAID these together on the host, using multipath and
RAID1 to deal with disk failure.

Advantages:

RAID array degrades 'nicely' - no need to switch master/slave roles etc.
No risk of split brain.
No human decision making involved during recovery stages

Disadvantages:

RAID rebuild happens on the host, not between the disk servers
RAID recovery requires a complete rebuild, not just a delta of changes
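
To put rough numbers on the full-rebuild versus delta difference, here
is a back-of-envelope sketch.  All figures are assumptions chosen for
illustration (a 500 GiB volume, 5 GiB written while one side was down,
and about 100 MiB/s of usable network throughput); resync_seconds() is a
hypothetical helper, not part of any tool.  Linux md can reportedly
narrow the gap with a write-intent bitmap, which this sketch does not
model.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def resync_seconds(bytes_to_copy, throughput_bytes_per_s):
    """Time to copy the given amount of data at a steady throughput."""
    return bytes_to_copy / throughput_bytes_per_s


volume = 500 * GIB   # assumed size of the mirrored volume
changed = 5 * GIB    # assumed data written while one mirror was offline
link = 100 * MIB     # assumed usable throughput, roughly gigabit Ethernet

full = resync_seconds(volume, link)    # complete rebuild (option 2 as described)
delta = resync_seconds(changed, link)  # delta-only resync (option 1 as described)

print("full rebuild: %.0f s (%.1f min)" % (full, full / 60))
print("delta resync: %.1f s" % delta)
```

Under these assumptions the full rebuild takes 5120 seconds (about 85
minutes) against roughly 51 seconds for the delta, and in option 2 that
traffic crosses the Xen host's network links rather than going directly
between the disk servers.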


Looking at the above, I am drawn towards the RAID1 option.  While it
might be less efficient in terms of speed and rebuild times, it
completely avoids the risk of split brain, and also there seem to be far
fewer places where manual intervention (i.e. human decisions where a
script can't do the work) might be needed than with DRBD.

However, I'm also aware I might be missing something fundamental here.

Anyone care to agree with me, or to enlighten me?

Thanks,

Matthew

-- 

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
