On 2008-07-31 21:30, Antibozo wrote:
> I've reviewed the list archives, particularly the posts from Zakk, on
> this subject, and found results similar to his. drbd provides a
> block-drbd script, but with full virtualization, at least on RHEL 5,
> this does not work; by the time the block script is run, the qemu-dm has
> already been started.
I've developed a workaround for all of this, in the form of a wrapper
script for qemu-dm. This is trickier than it might seem at first blush,
because of the way that xend uses signals to communicate with qemu-dm.
The wrapper script can be used in the "model =" line of a vm definition,
and will take care of ensuring consistency of the drbd resource(s) for a
vm across reboots, migration, etc.
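To give a concrete (if hypothetical) picture, the relevant lines of an
HVM domain definition using such a wrapper might look like the
following; the wrapper path, domain name, and drbd device are
placeholders, not taken from the actual script:

  # Hypothetical HVM domain config; paths and names are placeholders.
  name         = 'myvm'
  builder      = 'hvm'
  memory       = 1024
  kernel       = '/usr/lib/xen/boot/hvmloader'
  device_model = '/usr/local/sbin/qemu-dm.drbd'  # wrapper in place of the stock qemu-dm
  disk         = [ 'phy:/dev/drbd0,hda,w' ]      # guest disk lives on the drbd device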
The script can be found here:
http://www.antibozo.net/xen/qemu-dm.drbd
The strategy is detailed in the script comments; please review those if
you want specifics. The principal objective is prevention of split brain.
If you want to use Xen on top of drbd for high availability, this is a
decent first cut, as far as I can tell. Feedback is welcome.
> Instead I've been simply musing the possibility of keeping the drbd
> devices in primary/primary state at all times. I'm concerned about a
> race condition, however, and want to ask if others have examined this
> alternative.
I've moved away from this strategy, and am keeping resources secondary
when a vm isn't using them. This enables the remote node to tell if a vm
is already running on a drbd resource by inspecting the peer
primary/secondary status (the wrapper script does this). This makes it
difficult, though not impossible, for you to accidentally fire up a vm
using a resource that is already in use by a vm on the remote node.
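As a rough illustration of that kind of check (not the wrapper's actual
code; the resource name "r0" is a placeholder):

  # Refuse to proceed if the peer already holds the resource primary.
  # "drbdadm role" prints local/peer roles, e.g. "Secondary/Primary".
  peer_role=$(drbdadm role r0 | cut -d/ -f2)
  if [ "$peer_role" = "Primary" ]; then
      echo "r0 appears to be in use on the peer node; not starting vm" >&2
      exit 1
  fi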
I've also discovered that primary/primary mode is not actually needed,
at least for HVM vms using Xen 3.0.3 as shipped on RHEL 5. The
conventional wisdom was that primary/primary was necessary during
migration, but with the appropriate wrapper around qemu-dm, we can wait
for the peer to go secondary before going primary on the local node.
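A minimal sketch of that wait-then-promote step, assuming a single
resource named "r0" and the stock qemu-dm at /usr/lib64/xen/bin/qemu-dm
(both assumptions; the real wrapper at the URL above does considerably
more than this):

  RES=r0
  # Wait until the peer has dropped to Secondary and our copy is UpToDate.
  until [ "$(drbdadm role $RES)" = "Secondary/Secondary" ] &&
        [ "$(drbdadm dstate $RES | cut -d/ -f1)" = "UpToDate" ]; do
      sleep 1
  done
  drbdadm primary "$RES"                  # promote the local copy
  exec /usr/lib64/xen/bin/qemu-dm "$@"    # hand off to the real device model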
One way you can still get yourself pretty hosed (if you're determined to
do so) is the following:
- Start vm on node A. The wrapper makes the drbd resource primary, and
the vm starts running.
- Start vm on node B. This creates the vm instance, but the wrapper
blocks waiting for the drbd resource on node A to be secondary.
- Start a migration from node A to B. This freaks xend out on node B,
since that xend already has a vm defined under the same name (even
though that instance isn't actually running yet).
In this scenario, you may end up having to reboot node B because the xen
store gets crufty. But you still should never end up with a split brain
condition.
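Expressed as commands, the sequence above is roughly the following
(domain name, config path, and hostname are placeholders):

  # On node A: start the vm; the wrapper promotes the drbd resource.
  xm create /etc/xen/myvm.cfg
  # On node B: create the same vm; its wrapper blocks, waiting on the peer.
  xm create /etc/xen/myvm.cfg
  # On node A: attempt a live migration; xend on node B already has "myvm".
  xm migrate --live myvm nodeB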
Obviously you could also get hosed if your nodes can't talk to one
another, and you start the same vm on both nodes. This is classic split
brain. In this case, drbd should refuse to resync when drbd connectivity
is restored, and you'll have to kill one of the vm instances, invalidate
the local drbd resource, and resync, after which things should be fine.
I haven't tested this scenario yet, so YMMV.
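For reference, the manual recovery that drbd 8 documents for this case
looks roughly like the following (resource and domain names are
placeholders; this is the generic documented procedure, not anything
specific to the wrapper):

  # On the node whose vm instance and data you are discarding:
  xm destroy myvm                            # kill the duplicate vm instance
  drbdadm secondary r0                       # the resource must not be primary
  drbdadm -- --discard-my-data connect r0    # drop local changes, reconnect
  # On the surviving node, if it has gone StandAlone:
  drbdadm connect r0                         # reconnect; resync then proceeds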
> I am thinking of a scenario where the vm is running on node A, and has a
> process that is writing to disk at full speed, and consequently the drbd
> device on node B is lagging. If I perform a live migration from node A
> to B under this condition, the local device on node B might not be in
> sync at the time the vm is started on that node. Maybe.
I have done some testing of heavy disk i/o situations during live
migration, and things appear to remain fully consistent. Note that the
i/o stack of a filesystem on top of an LVM volume, on top of xen, on top
of drbd, on top of another LVM volume is not super fast. I see 10-20 MB/s with new
block allocation on a 4-core PowerEdge 1950 using SAS disks (with one
CPU allocated to the vm). So don't plan on that particular architecture
for your heavily used RDBMS.
> If I use drbd protocol C, theoretically at least, a sync on the device
> on node A shouldn't return until node B is fully in sync. So I guess my
> main question is: during migration, does xend force a device sync on
> node A before the vm is started on node B?
By all appearances (empirically), yes. And since this qemu-dm wrapper
also waits for secondary state on the peer, and UpToDate state on the
local copy, before actually invoking the real qemu-dm, I believe we are
covered.
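For completeness, a minimal drbd 8 resource definition of the sort
assumed throughout, using protocol C (hostnames, devices, and addresses
are placeholders):

  resource r0 {
    protocol C;             # a write completes only after the peer has it on stable storage
    on nodeA {
      device    /dev/drbd0;
      disk      /dev/vg0/myvm-disk;
      address   192.168.10.1:7788;
      meta-disk internal;
    }
    on nodeB {
      device    /dev/drbd0;
      disk      /dev/vg0/myvm-disk;
      address   192.168.10.2:7788;
      meta-disk internal;
    }
  }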
--
Jefferson Ogata : Internetworker, Antibozo
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users