Published on September 25, 2014
Building Synchronous MySQL clusters in Cloud and WAN Alexey Yurchenko Codership Oy
A Very Dirrrty Word Sssssssssss... www.codership.com 3
A Very Dirrrty Word Synchronous.
A Very Dirrrty Word Synchronous. What is it good for???
Data Safety Asynchronous Replication: Client → Master: COMMIT. Master → Client: OK (without waiting for the slave). Master → Slave: Replicate. Slave: COMMIT. Potential data loss.
Data Safety Synchronous Replication: Client → Master: COMMIT. Master → Slave: Replicate. Slave → Master: ACK. Master → Client: OK. Slave: COMMIT. Additional latency.
Data Safety Disaster Recovery (#1): replication between DC1 and DC2.
Multi-Master Client1 ↔ Master1 ↔ Master2 ↔ Client2. Both clients issue COMMIT; each master replicates the transaction to the other and runs CONFLICT DETECTION and CONFLICT RESOLUTION. One transaction ends with COMMIT and OK, the other with ROLLBACK and DEADLOCK.
Access Latency Elimination
Access Latency Elimination #2
Benchmark Setup (Amazon EC2) us-east ↔ eu-west: ~6000 km, ~90 ms RTT
Access Latency Elimination
client location   us-east server   US-EU cluster   change
us-east           28.03 ms         119.80 ms       ~4.3x
eu-west           1953.89 ms       122.92 ms       ~0.06x
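What Happened? With a single us-east server, all SQL traffic (reads, writes, etc.) from eu-west crosses the ~6000 km, ~90 ms RTT link. With the US-EU cluster, SQL traffic stays local and only replication traffic (commits only) crosses the link.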
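A rough way to see the effect, as a minimal sketch: assume every SQL statement costs one round trip to wherever the server is, and a cluster commit costs about one extra RTT. The statement count (20) and the 28 ms of local work are illustrative assumptions, not parameters of the benchmark.

```python
def remote_server_latency_ms(statements: int, rtt_ms: float, local_ms: float) -> float:
    """Client talks to a server across the WAN: every statement pays one RTT."""
    return local_ms + statements * rtt_ms


def local_cluster_latency_ms(rtt_ms: float, local_ms: float) -> float:
    """Client talks to a local cluster node: only the commit pays (about) one RTT."""
    return local_ms + rtt_ms


# Hypothetical workload: 20 statements per transaction, 28 ms of local work.
print(remote_server_latency_ms(statements=20, rtt_ms=90, local_ms=28))
print(local_cluster_latency_ms(rtt_ms=90, local_ms=28))
```

Under these assumptions the remote client pays the RTT once per statement, while the cluster client pays it once per transaction, which is the same order of difference the table shows.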
To Sync or Semi-sync?
Look, Ma! No 2-phase commit! Client → Master: COMMIT. Master → Slave: Replicate. Slave → Master: ACK. Master → Client: OK. Slave: COMMIT, which can still fail. Slave didn't commit!
To Sync or Semi-sync? Master → Slave: Replicate
Synchronous (master rolls back and stops):
● Data redundancy preserved (sort of: slave is dead)
● Availability compromised (!!!)
Semi-synchronous (master continues):
● Data redundancy compromised
● Availability preserved
To Sync or Semi-sync? In production, replication exists to increase service availability by protecting against master loss, not slave loss (slave loss is mitigated by adding more slaves). Ironically, fully synchronous replication is not only impractically slow, it also works against the availability goal.
Synchronous Replication in WAN The Latency And How To Deal With It.
The Latency And How to Deal With It Latency: 1 RTT – 1.5 RTT (100 – 500 ms; <200 ms should be practically possible). Trx rate per session <= 1/Latency (2 – 10 transactions per second? Blast!)
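The bound above is plain arithmetic. A minimal sketch, assuming a single session that commits serially and a commit latency that dominates everything else (the function name is made up for illustration):

```python
def max_trx_rate(latency_ms: float) -> float:
    """Upper bound on commits per second for one serially committing session."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


for latency_ms in (100, 200, 500):
    print(f"{latency_ms} ms -> at most {max_trx_rate(latency_ms):.0f} trx/s per session")
```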
The Latency And How to Deal With It The way they deal with any latency:
1) Buffering: AUTOCOMMIT UPDATEs → multi-statement transactions
2) Parallelization: 1 client session → 10 client sessions
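Both techniques multiply what one commit round trip delivers. A minimal sketch, assuming commit latency dominates, sessions do not conflict with each other, and the numbers are purely illustrative:

```python
def updates_per_second(latency_ms: float, statements_per_trx: int = 1, sessions: int = 1) -> float:
    """Estimated UPDATE throughput when each commit costs `latency_ms`.

    statements_per_trx models buffering (statements batched into one transaction),
    sessions models parallelization (independent client sessions).
    """
    commits_per_session = 1000.0 / latency_ms
    return commits_per_session * statements_per_trx * sessions


print(updates_per_second(100))                                     # autocommit, 1 session
print(updates_per_second(100, statements_per_trx=10))              # buffering only
print(updates_per_second(100, statements_per_trx=10, sessions=10)) # buffering + parallelization
```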
Synchronous Replication in WAN Galera Cluster for MySQL variants
Galera Cluster for MySQL variants Architecture: mysqld (MySQL + wsrep patch) ↔ wsrep API ↔ Galera (dynamic library) ↔ synchronous communication ↔ Cluster (other nodes)
Galera Cluster for MySQL variants: MySQL-wsrep, MariaDB Galera Cluster, and Percona XtraDB Cluster, each running on top of Galera.
Galera Cluster and CAP Theorem Consistency: fixed. Availability vs. Partition Tolerance: timeouts.
Synchronous Replication in WAN Goals:
● Disaster Recovery
● Performance
● Service Availability
DO's and DON'Ts
Synchronous Replication in WAN: DO's Invest in a good WAN link. (You invest in nodes; the link is as much a part of the cluster as the nodes are.)
Synchronous Replication in WAN: DO's Categorize your data:
1) Rare, small writes, frequent reads, global data – good.
2) Heavy writes, few reads, local data – bad.
Synchronous Replication in WAN: DO's Categorize your data (OpenStack):
1) Keystone identity data, Glance image metadata: mostly reads, small writes, data of global interest.
2) Ceilometer monitoring data: almost write-only, no need to share globally – store in MongoDB.
Jay Pipes, “Tales from the Field: Backend Data Storage in OpenStack Clouds”
Synchronous Replication in WAN: DO's Configure timeouts:
● All Galera timeouts and periods should be no less than the WAN round-trip time.
● Defaults should be suitable for networks with RTTs of up to 500 ms.
● The higher the timeouts, the more partition tolerant and the less available the cluster is (CAP theorem).
● Timeouts relation: RTT <= evs.suspect_timeout <= evs.inactive_timeout <= evs.install_timeout
● evs.suspect_timeout is the timeout for detecting a single node partition/failure.
● Further info: http://galeracluster.com/documentation-webpages/configurationtips.html#wan-replication
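The timeouts relation can be checked mechanically before a configuration is rolled out. A minimal sketch: the helper below is hypothetical (not part of Galera), takes all values in seconds, and the sample numbers are illustrative rather than Galera defaults.

```python
def check_timeouts(rtt_s: float, suspect_s: float, inactive_s: float, install_s: float) -> list[str]:
    """Return the violated links of the chain
    RTT <= evs.suspect_timeout <= evs.inactive_timeout <= evs.install_timeout
    (an empty list means the chain holds)."""
    chain = [
        ("RTT", rtt_s),
        ("evs.suspect_timeout", suspect_s),
        ("evs.inactive_timeout", inactive_s),
        ("evs.install_timeout", install_s),
    ]
    problems = []
    # Compare each value with its right-hand neighbour in the chain.
    for (name_a, a), (name_b, b) in zip(chain, chain[1:]):
        if a > b:
            problems.append(f"{name_a} ({a}s) > {name_b} ({b}s)")
    return problems


print(check_timeouts(rtt_s=0.09, suspect_s=5, inactive_s=15, install_s=15))
print(check_timeouts(rtt_s=0.5, suspect_s=0.3, inactive_s=15, install_s=15))
```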
Synchronous Replication in WAN: DO's Configure cluster segments: [Diagram: three datacenters; the nodes in DC1 are in segment 1, the nodes in DC2 in segment 2, the nodes in DC3 in segment 3]
Synchronous Replication in WAN: DO's Choose an odd number of nodes and an odd number of datacenters:
● Most popular choice: 3x3
● Also observed in the field: 5x3 and 3x5
Synchronous Replication in WAN: DO's 3 is better than 2! DC1 DC2 DC3
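Why three datacenters beat two can be sketched with a simple majority rule. This is a simplification under stated assumptions: all nodes carry equal weight and the surviving part stays primary only with a strict majority. Galera's actual quorum calculation has more detail than this.

```python
def survives_dc_loss(nodes_per_dc: list[int], lost_dc: int) -> bool:
    """True if the remaining nodes still form a strict majority of the cluster."""
    total = sum(nodes_per_dc)
    remaining = total - nodes_per_dc[lost_dc]
    return remaining * 2 > total


print(survives_dc_loss([3, 3], lost_dc=0))     # 2 datacenters, 3 nodes each
print(survives_dc_loss([3, 3, 3], lost_dc=0))  # 3 datacenters, 3 nodes each (3x3)
```

With two equal datacenters, losing either one leaves exactly half of the nodes, which is not a majority; with three, any single datacenter can be lost.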
Synchronous Replication in WAN: DON'Ts 1) Hot Spots
Synchronous Replication in WAN: DON'Ts hotspot: 1 RTT
Synchronous Replication in WAN: DON'Ts 1) Hot Spots 2) Poor Links
Synchronous Replication in WAN: DON'Ts Synchronous – with who? Full packet loss → the node is not with us. No packet loss → the node is with us. In between: ???
Synchronous Replication in WAN: DON'Ts 1) Hot Spots 2) Poor Links 3) Huge Transactions
Synchronous Replication in WAN: DON'Ts Huge transactions kill concurrency:
a) Long to replicate
b) Long to certify
c) Long to apply on slave → SLAVE LAG
Synchronous Replication in WAN: DON'Ts 1) Hot Spots 2) Poor Links 3) Huge Transactions 4) No Primary Keys
Synchronous Replication in WAN: DON'Ts No PRIMARY KEY: mysql> DELETE FROM 10M_rows_no_PK_table; => 50 000 000 000 000 rows scanned.
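One way to arrive at a figure of that size, as a sketch: assume that without a primary key each of the 10M replicated row deletions has to be located by a table scan that reads, on average, half of the original table. That assumption is a reading of the slide, not something it states.

```python
def rows_scanned_without_pk(table_rows: int) -> int:
    """Estimated rows read when every row change needs its own table scan.

    Assumes each scan reads, on average, half of the original table.
    """
    average_scan = table_rows // 2
    return table_rows * average_scan


print(f"{rows_scanned_without_pk(10_000_000):,}")
```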
If Synchronous Doesn't Work Out Native MySQL Asynchronous Replication Between Galera Clusters (log_slave_updates = ON). [Diagram: Galera1 (nodes 1, 2, 3) → async → Galera2 (nodes A, B, C); a Galera1 node is the Master, a Galera2 node is the Slave]
If Synchronous Doesn't Work Out Native MySQL Asynchronous Replication Between Galera Clusters. [Diagram: Galera1 (nodes 1, 2, 3) → async → Galera2 (nodes A, B, C); a Galera1 node is the Master, a Galera2 node is the Slave]
If Synchronous Doesn't Work Out Native MySQL Asynchronous Replication Between Galera Clusters. [Diagram: Galera1 (nodes 1 and 3; node 2 is gone) → async → Galera2 (nodes A, B, C); Master on the Galera1 side, Slave on the Galera2 side]
If Synchronous Doesn't Work Out Native MySQL Asynchronous Replication Between MariaDB Galera Clusters (log_slave_updates = OFF). [Diagram: Galera1 (nodes 1, 2, 3) → async → Galera2 (nodes A, B, C); a Galera1 node is the Master, a Galera2 node is the Slave]
Synchronous Replication in WAN Q & A