
Re: AW: AW: [Xen-API] SG_IO for iscsi targets in XCP



Well..

I don't quite understand the reasoning. Are you talking about online or offline split brain? As I said earlier, online split brain can be prevented by using the same network adapter for replication and for serving clients (if the link is lost, there is no replication, no new write operations, and no 'inconsistent' read operations).

Offline split brain can be prevented by manual startup (the host boots without the DRBD and iSCSI services active). If only one server has been rebooted, clients are served by the second server. If both of them go down, you need to find the most recent node (manually, with help from the DRBD sync state) and bring the other one up only after a resync (you are already down, so a little more time will not make a drastic difference).
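For illustration, such a manual startup might look roughly like this on a Debian-style storage head (the service names and the resource name 'r0' are assumptions for the example, not something from this thread):

    # keep DRBD and the iSCSI target from starting automatically at boot
    update-rc.d drbd disable
    update-rc.d iscsitarget disable

    # after a reboot, an operator brings the node back by hand:
    service drbd start          # connect to the peer and let it resync
    cat /proc/drbd              # wait until the resource shows UpToDate/UpToDate
    drbdadm primary r0          # promote only after the resync has finished
    service iscsitarget start   # start serving initiators again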

The main reason I want primary/primary DRBD is the doubled number of reading devices - this will really reduce the load. I expect a very significant difference... And one more small point: in primary/primary mode some XCP hosts go to one target and the others to the second. If one of the nodes fails, only half of the customers will see a fairly long lag before switching.
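For reference, the primary/primary mode mentioned here has to be enabled explicitly in the DRBD resource definition; a minimal sketch for DRBD 8.x, with the resource name and layout invented for the example:

    resource r0 {
        protocol C;                  # synchronous replication
        net {
            allow-two-primaries;     # required for primary/primary operation
        }
        # ... plus the usual 'on <hostname>' sections (device, disk, address, meta-disk)
    }

Both storage heads would then run 'drbdadm primary r0' and export the DRBD device over iSCSI, and the XCP hosts could be pointed at either portal.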

On 19.07.2011 19:10, Uli Stärk wrote:
A SAN-level replication is not good enough because of the giant RAID sets. There is so 
much (random) workload on the disks that a resync won't exceed 100 MB/s. We usually 
get about 50 MB/s if we don't want to affect the running applications. Our 
RAID sets would take more than a week to synchronize/verify :( We must have the 
possibility to replicate smaller sets of data, so we use DRBD for replicating data, 
like you suggested for SANs.

In our experience, there are several service interruptions on redundant 
WAN connections. You can't avoid this! Usually the service interruptions are very 
short (less than 5 minutes). Each interruption would trigger a failover in 
a master-master setup and end in split brain. In that case you will lose 
data if you discard the changes on one node. Losing data is usually the worst thing 
that can happen. A merge is usually not possible at reasonable cost (duplicate 
database keys, etc.). A short service interruption is within the SLA and we don't 
lose data. If we can predict that a service interruption will take more than a few 
minutes, we fail over to the second site. Usually this happens only if the datacenter 
burns to the ground or a redundant server or networking component fails. This 
happens less than once a year ;)

IMHO a master-master setup can only be recommended if you have no real 
networking between the nodes and use it for higher performance than a single node 
can offer. In all other cases, use it for backup, and a backup should be a 
master-slave setup.


-----Original Message-----
From: George Shuklin [mailto:george.shuklin@xxxxxxxxx]
Sent: Tuesday, 19 July 2011 16:09
To: Uli Stärk
Subject: Re: AW: [Xen-API] SG_IO for iscsi targets in XCP

There are two types of split-brain: online and offline.

Offline split-brain:

both primary/primary (p/p) nodes are online
the first goes down; the second primary operates alone for some time
the second goes down
the first comes back up [stage 1]
the second comes back up and finds that its data conflicts with the first [stage 2]

This situation is rather bad. At stage 2 we will have to discard one side's data, 
and the problem actually started at stage 1, when we 'went back in time' by 
bringing up the older machine.

In this situation we can: go down again and replicate all data from the second node to 
the first (losing the 'time fork' we created during the second period of StandAlone operation),
OR
simply replicate the second node from the first and continue to operate in the 'past fork', 
rolling the state back to the moment the first node went down and forgetting everything the second node did.

All these problems can be solved by manual disaster recovery. If one of the 
servers goes down, it must be brought back manually when it returns. In a normal 
datacenter, downtime is usually attended by staff anyway.
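As a concrete illustration of such manual recovery with the DRBD 8.x tools (the resource name 'r0' is assumed; deciding which node is the 'victim' remains a human decision):

    # on the node whose fork we agree to throw away:
    drbdadm secondary r0
    drbdadm -- --discard-my-data connect r0

    # on the surviving node, if it has already dropped the connection:
    drbdadm connect r0

    # then watch the resync
    cat /proc/drbd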

The second case is 'online' split-brain.

DRBD does require a link between the 'heads'. If this link goes down, both heads 
start to think that the remote node is down and continue to operate independently. 
(If we say 'go down when the remote is disconnected', that kills any fault 
tolerance in DRBD - there would be no reason to run p/p DRBD at all.)
In this case we meet a horrible, complete data loss - some data goes to 
one node, some to the second, and if we are using load balancing we can only shut down 
the storage and say 'oops, sorry guys, no more data'.

Even a dedicated cable between the DRBD hosts does not save you from the constant fear of 
online split brain.
What if some asshole unplugs it?
...or it simply gets pulled while someone moves equipment (drops something heavy)? What if the network card 
or the cable dies?
What if someone brings 'ethX' down by mistake on one of the servers?

All these cases are not 'sorry, we have 36 hours of downtime'; they are all 'sh.t, 
everything is lost'.
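DRBD cannot prevent this scenario by itself, but it can at least detect the split brain on reconnect and refuse to resync silently; a sketch of the relevant options (the policies and the notification address are only an example, not a recommendation from this thread):

    resource r0 {
        net {
            allow-two-primaries;
            # what DRBD does automatically when a split brain is detected on reconnect
            after-sb-0pri discard-zero-changes;   # if one side made no changes, sync it from the other
            after-sb-1pri discard-secondary;      # one primary left: drop the secondary's fork
            after-sb-2pri disconnect;             # two primaries: never auto-resolve, wait for a human
        }
        handlers {
            # mail the admins as soon as DRBD notices the split brain
            split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        }
    }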

And there is a simple and elegant solution to all these fears: use the SAN itself for replication 
(the same interface for replication and for iSCSI serving).

If you have enough bandwidth (10G usually does), this solves everything:

If some link, cord, network card and so on goes down, that host simply stops serving 
clients. No I/O, no new data, no problems with data corruption.
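In DRBD terms, 'same interface' simply means binding the resource's replication address to the very IP the iSCSI target portal listens on; a fragment of the per-host section, with addresses invented for the example:

    on host-a {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   192.168.10.1:7789;   # the same NIC/IP the iSCSI target listens on
        meta-disk internal;
    }

If that NIC or cable dies, host-a stops replicating and stops serving initiators at the same moment.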


So I think a dual-head setup is possible in the case of XCP. Its specific architecture allows 
this. (I hope so; I'll test and report later.)

On Tue, 19/07/2011 at 12:50 +0000, Uli Stärk wrote:
My 5 cents: In real-world applications a split-brain will cause so
much work/trouble (and even service-interruption) that most admins
here will not consider using a dual-primary configuration ;)

-----Original Message-----
From: xen-api-bounces@xxxxxxxxxxxxxxxxxxx
[mailto:xen-api-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of George Shuklin
Sent: Tuesday, 19 July 2011 14:34
To: Dave Scott
Cc: xen-api@xxxxxxxxxxxxxxxxxxx
Subject: RE: [Xen-API] SG_IO for iscsi targets in XCP

Thank you very much.

I feel safer now about the dual-primary DRBD configuration. I'll report the results 
of a practical deployment under real-life load later.

On Tue, 19/07/2011 at 12:21 +0100, Dave Scott wrote:
Hi George,

XCP just uses shared LVM over iSCSI as a generic block device. This is only safe because 
(i) we modified LVM to run in a "read-only" mode on slaves; and (ii) we 
co-ordinate all LVM metadata updates across the pool in the XCP storage layer.

I'm researching whether XCP issues any SCSI commands
such as reservation or persistent reservation. I grepped the
source code for the SG_IO ioctl() and found just a few innocent inquiry/ID requests.
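The check was essentially of this form (the path is an assumption about where the XCP storage-manager scripts live on an installed host; a source checkout works just as well):

    # look for raw SCSI pass-through via the SG_IO ioctl in the storage manager scripts
    grep -rn "SG_IO" /opt/xensource/sm/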

Just to be sure: are any SCSI-specific features used in XCP for
cluster management or resource locking? Or is iSCSI used only as a
generic block device with LVM?


_______________________________________________
xen-api mailing list
xen-api@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/mailman/listinfo/xen-api

