Re: AW: AW: [Xen-API] SG_IO for iscsi targets in XCP
Well...
I don't quite understand the reasoning. Are you talking about online or
offline split brain? As I said earlier, online split brain can be
prevented by using the same network adapter for replication and client
traffic (if the link is lost there is no replication, no new write
operations and no 'inconsistent' read operations).
Offline split brain can be prevented by manual startup (the host boots
without the DRBD and iSCSI services active). If only one server has been
rebooted, clients are served by the second server. If both of them go
down, you need to find the most recent node (manually, with help from the
DRBD sync process) and bring them up after a resync (you are already down,
so a little more time will not make a drastic difference).
The main reason I want primary/primary DRBD is the doubled number of
devices serving reads - this will really reduce the load. I expect a very
significant difference... And one more small point: in primary/primary
mode some XCP hosts go to one target and others to the second. If one of
the nodes fails, only half of the customers will see the fairly long lag
before switching over.
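For reference, this is roughly the configuration I have in mind - only a sketch,
in DRBD 8.3 syntax; the resource name, hosts, devices and the after-sb policies
are placeholders on my side, nothing tested yet:

  resource r0 {
    protocol C;
    device    /dev/drbd0;
    disk      /dev/sdb1;
    meta-disk internal;

    net {
      allow-two-primaries;                 # needed for primary/primary
      after-sb-0pri discard-zero-changes;  # auto-resolve only the trivial case
      after-sb-1pri discard-secondary;
      after-sb-2pri disconnect;            # never auto-resolve when both were primary
    }

    startup {
      become-primary-on both;
    }

    on node1 { address 10.0.0.1:7788; }
    on node2 { address 10.0.0.2:7788; }
  }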
On 19.07.2011 19:10, Uli Stärk wrote:
SAN-level replication is not good enough because of the giant RAID sets. There is so much
(random) workload on the disks that a resync won't exceed 100 MB/s. We usually get
about 50 MB/s if we don't want to affect the running applications. Our RAID sets
would take more than a week to synchronize/verify :( We must have the possibility to
replicate smaller sets of data. So we use DRBD for replicating data, like you suggested
for SANs.
In our experience there are several service interruptions on redundant
WAN connections. You can't avoid this! Usually the interruptions are very short
(less than 5 minutes). With a master-master setup each interruption would trigger a
failover and end up in split brain. In that case you will lose data
if you discard the changes on one node, and losing data is usually the worst thing that
can happen. A merge is usually not possible at a reasonable cost (duplicate database key
entries, etc.). A short service interruption is within the SLA and we don't lose
data. If we can predict that a service interruption will take more than a few minutes,
we fail over to the second site. Usually that happens if the datacenter burns to the
ground or a redundant server or networking component fails, which happens
less than once a year ;)
IMHO a master-master setup can only be recommended if there is no real
network between the nodes and you use it to get more performance than a single node
can offer. In all other cases use it for backup, and a backup should be a
master-slave setup.
-----Original Message-----
From: George Shuklin [mailto:george.shuklin@xxxxxxxxx]
Sent: Tuesday, 19 July 2011 16:09
To: Uli Stärk
Subject: Re: AW: [Xen-API] SG_IO for iscsi targets in XCP
There are two types of split brain: online and offline.
Offline split brain:
both primary/primary (p/p) nodes are online
the first goes down, the second primary operates alone for some time
the second goes down
the first comes back up [stage 1]
the second comes back up and finds that it conflicts with the first [stage 2]
This situation is rather bad. At stage 2 we will have to discard one node's data,
and the problem actually starts at stage 1, when we 'go into the past' by
bringing up the older machine.
In this situation we can either go down again and replicate all data from the second to
the first (we lose the 'time fork' that the first node created while running StandAlone),
OR
simply replicate the second from the first and continue to operate in the 'past fork',
rolling the state back to the moment 'the first went down' and forgetting everything the
second node did.
All of those problems can be solved by manual disaster recovery. If one of the
servers goes down, it must be started manually when it comes back. In a normal
datacenter, downtime is usually handled with staff assisting anyway.
The second case is 'online' split brain.
DRBD requires a link between the 'heads'. If this link goes down, both heads
start to think that the remote node is down and continue to operate independently.
(If we say 'go down when the remote is disconnected', that kills any fault
tolerance in DRBD - there would be no reason to run p/p DRBD at all.)
In this case we get a horrible, complete data loss - some data goes to one
node, some to the other, and if we are load balancing we can only shut the storage
down and say 'oops, sorry guys, no more data'.
Even a dedicated cable between the DRBD hosts does not free you from the constant
fear of online split brain.
What if some asshole unplugs it?
...or it simply gets yanked while moving equipment (someone drops something heavy)?
What if a network card or cable dies?
What if someone brings ethX down by mistake on one of the servers?
None of those cases is a 'sorry, we have 36 hours of downtime'; they are all
'sh.t, everything is lost'.
And there is a simple and elegant solution to all these fears: use the SAN network for
replication (the same interface for replication and for serving iSCSI).
If you have enough bandwidth (10G usually does), this solves everything:
if a link, cable, network card and so on goes down, that host also stops serving
clients. No I/O, no new data, no data corruption problems.
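(To illustrate the point about state: each head's view of the link is visible in
/proc/drbd, so it is easy to check before serving anything. A minimal sketch below -
the resource number and the 'refuse to serve' reaction are just my example, not
anything XCP does:)

  /* Sketch: check whether DRBD resource 0 is Connected by parsing /proc/drbd. */
  #include <stdio.h>
  #include <string.h>

  int main(void) {
      char line[256];
      FILE *f = fopen("/proc/drbd", "r");
      if (!f) { perror("/proc/drbd"); return 2; }

      while (fgets(line, sizeof(line), f)) {
          /* Status lines look like:
           *  0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r----- */
          if (strncmp(line, " 0:", 3) != 0)
              continue;
          fclose(f);
          if (strstr(line, "cs:Connected")) {
              printf("resource 0 connected, safe to serve\n");
              return 0;
          }
          printf("resource 0 not connected, refusing to serve: %s", line);
          return 1;  /* e.g. cs:StandAlone or cs:WFConnection */
      }
      fclose(f);
      return 2;
  }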
So I think a dual-head setup is possible in the case of XCP; its specific
architecture allows it. (I hope so - I'll test and report later.)
On Tue, 19/07/2011 at 12:50 +0000, Uli Stärk wrote:
My 5 cents: In real-world applications a split-brain will cause so
much work/trouble (and even service-interruption) that most admins
here will not consider using a dual-primary configuration ;)
-----Original Message-----
From: xen-api-bounces@xxxxxxxxxxxxxxxxxxx
[mailto:xen-api-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of George
Shuklin
Sent: Tuesday, 19 July 2011 14:34
To: Dave Scott
Cc: xen-api@xxxxxxxxxxxxxxxxxxx
Subject: RE: [Xen-API] SG_IO for iscsi targets in XCP
Thank you very much.
I feel safer now about the dual-primary DRBD configuration. I'll report the results
of a practical deployment with real-life load later.
On Tue, 19/07/2011 at 12:21 +0100, Dave Scott wrote:
Hi George,
XCP just uses shared LVM over iSCSI as a generic block device. This is only safe because
(i) we modified LVM to run in a "read-only" mode on slaves; and (ii) we
co-ordinate all LVM metadata updates across the pool in the XCP storage layer.
I'm researching whether XCP issues any SCSI commands such as
reservations or persistent reservations. I grepped the
source code for the SG_IO ioctl() and found just a few innocent INQUIRY/identification requests.
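(For illustration, this is roughly the kind of call I mean - a plain INQUIRY sent
through the SG_IO ioctl. The device path and buffer sizes below are just an example
of mine, not code taken from XCP:)

  /* Example only: issue a standard SCSI INQUIRY through the SG_IO ioctl. */
  #include <fcntl.h>
  #include <scsi/sg.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <unistd.h>

  int main(void) {
      unsigned char cdb[6] = { 0x12, 0, 0, 0, 96, 0 };  /* INQUIRY, 96-byte allocation length */
      unsigned char buf[96];
      unsigned char sense[32];
      struct sg_io_hdr hdr;

      int fd = open("/dev/sda", O_RDONLY);              /* example device path */
      if (fd < 0) { perror("open"); return 1; }

      memset(&hdr, 0, sizeof(hdr));
      hdr.interface_id    = 'S';
      hdr.cmdp            = cdb;
      hdr.cmd_len         = sizeof(cdb);
      hdr.dxfer_direction = SG_DXFER_FROM_DEV;
      hdr.dxferp          = buf;
      hdr.dxfer_len       = sizeof(buf);
      hdr.sbp             = sense;
      hdr.mx_sb_len       = sizeof(sense);
      hdr.timeout         = 5000;                       /* milliseconds */

      if (ioctl(fd, SG_IO, &hdr) < 0) { perror("SG_IO"); close(fd); return 1; }

      /* Standard INQUIRY data: bytes 8-15 vendor id, 16-31 product id. */
      printf("vendor: %.8s  product: %.16s\n", buf + 8, buf + 16);
      close(fd);
      return 0;
  }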
Just to be sure: are any SCSI-specific features used in XCP for
cluster management or resource locking, or is iSCSI used only as a
generic block device with LVM?
_______________________________________________
xen-api mailing list
xen-api@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/mailman/listinfo/xen-api