To: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-users] Solving the DRBD resync issue (was: Alternatives to a dual-primary DRBD setup)
From: "Fajar A. Nugraha" <list@xxxxxxxxx>
Date: Mon, 23 May 2011 15:28:40 +0700
Delivery-date: Mon, 23 May 2011 01:29:30 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4DDA10EE.20501@xxxxxxxx>
References: <4DD6B842.1040701@xxxxxxxx> <AEC6C66638C05B468B556EA548C1A77D01D57281@trantor> <4DD7AB2D.8070001@xxxxxxxx> <4DDA10EE.20501@xxxxxxxx>
Sender: xen-users-bounces@xxxxxxxxxxxxxxxxxxx

On Mon, May 23, 2011 at 2:46 PM, Daniel Brockmann <meszi@xxxxxxxxxxx> wrote:
> Hello once again,
> the more I think about my DRBD issue and the more I research in the net the
> more I tend to explain the issue with limited CPU time for dom0.

First things first.
By "sync problems" in your previous post, did you mean that both nodes
experience split brain for the DRBD resource?

When set up properly, you should NOT experience it, regardless of how
much CPU resource dom0 has. You should only experience SLOW disk I/O.
Split brain usually occurs if you don't set up fencing properly.

> It will be
> better resolving _this_ instead of possibly reaching the same stage later on
> again but using another replication technique, wouldn't it?
> Reasons why I think it is an I/O and/or CPU time issue:
> 1. It worked properly when I still did not have 8 virtual guest systems
> installed.
> 2. As soon as I start a DRBD resync my virtual guests bring kernel error
> messages like "INFO: task exim4:2336 blocked for more than 120 seconds. ".
> 3. When starting both Xenserver machines and syncing before starting the
> virtual guests a startup that's usually done in <5 minutes takes up to 60
> minutes.

... which is exactly the SLOW I/O I mentioned above.

> I checked the XenWiki accordingly and found two promising entries that I'd
> like to follow, if it's possible to apply them under a Citrix Xenserver 5.6
> system:
> http://wiki.xensource.com/xenwiki/XenCommonProblems#head-413e1d74442772fd5a0a94f0655a009744096627
> 1. How can I limit the number of vcpus my dom0 has?
> 2. Can I dedicate a cpu core (or cores) only for dom0?
> Especially the 2nd one appears to meet what I expect. So I would be going to
> check if I can configure that. How do _you_ think about it?

This thread might be able to help you:

Personally, I suggest you step back and evaluate several things:
- do you REALLY need an active-active setup?
Active-active DRBD mandates protocol C (synchronous replication), which
can GREATLY reduce your throughput. If you can afford a small amount of
downtime, you are better off sticking with async replication.

- do you KNOW how many IOPS you need?
Disk IOPS is especially important since it's usually the bottleneck in
a virtualized environment. For example, a big virtualization provider
that I know of uses 30 IOPS per VM for sizing purposes (the assumption
is that not all VMs will be I/O-intensive, so they use a low number
like 30 for simplification). Then they multiply it by the number of
VMs, and use a sizing tool from the storage appliance vendor to
calculate the number and type of disks required. Of course, if you know
that your VMs will be I/O-intensive (e.g. a busy mail server), the
assumption above will not be valid for you, and you need to adjust it
to something higher.

- do you HAVE the necessary resources to support that IOPS and
replication?
For example, let's say you use the 30 IOPS per VM number above, and you
have 20 VMs per host. So you need 30 * 20 = 600 IOPS. Let's assume one
7200 rpm disk can support 100 IOPS, so you need a MINIMUM of 6 disks
(if you use raid0) or 12 disks (if you use raid10). Then assume
active-active DRBD will make performance drop by 75%, so only a quarter
of the raw IOPS remains and you'll need 12 * 4 = 48 disks in raid10. Do
you have that?
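
If you want to redo that arithmetic with your own numbers, here is a
rough sketch in Python. The function name and parameters are mine and
purely illustrative; the 100 IOPS per disk, the doubling of disk count
for raid10, and the 75% drop are the same assumptions as in the example
above, not measured values.

```python
import math

def disks_needed(iops_per_vm, vm_count, iops_per_disk,
                 raid_disk_factor=1, replication_drop=0.0):
    """Rough minimum disk count for a host (illustrative helper).

    raid_disk_factor: 1 for raid0, 2 for raid10 (assumption from the
        example above: raid10 needs twice the disks of raid0).
    replication_drop: fraction of performance assumed lost to
        replication, e.g. 0.75 for the active-active DRBD guess above.
    """
    required_iops = iops_per_vm * vm_count
    # Disks needed to deliver the raw IOPS with no redundancy.
    base_disks = math.ceil(required_iops / iops_per_disk)
    # Account for the RAID layout.
    raid_disks = base_disks * raid_disk_factor
    # If only (1 - drop) of the performance remains, scale up to match.
    return math.ceil(raid_disks / (1.0 - replication_drop))

print(disks_needed(30, 20, 100))                       # raid0
print(disks_needed(30, 20, 100, raid_disk_factor=2))   # raid10
print(disks_needed(30, 20, 100, raid_disk_factor=2,
                   replication_drop=0.75))             # raid10 + DRBD
```

With the numbers above this gives 6, 12 and 48 disks. Replace the
per-VM IOPS with a measured value if you know your VMs are
I/O-intensive.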

All things considered, it might be that your best option would be
something like:
- get a separate server with lots of disks, set up raid10, install a
storage appliance OS on top (e.g.
http://www.napp-it.org/index_en.html), then export it to your XenServer
either as NFS or iSCSI. While NFS/iSCSI induces some overhead, it
should be lower compared to using DRBD, OR
- drop the active-active requirement, OR
- beef up your XenServer (e.g. use fast storage like SSD), upgrade
XenServer/XCP to a version that lets dom0 use multiple CPU cores,
upgrade DRBD to the latest version, and set up proper fencing.

